Overview
The TensorOne API enforces rate limits to ensure fair usage and maintain service quality for all users. Limits vary by endpoint type, subscription plan, and API key permissions.
Rate Limit Tiers
Free Tier
- Read Operations: 100 requests/hour
- Write Operations: 20 requests/hour
- AI Services: 50 requests/hour
- Training Jobs: 5 jobs/day
Pro Tier
- Read Operations: 1,000 requests/hour
- Write Operations: 200 requests/hour
- AI Services: 500 requests/hour
- Training Jobs: 20 jobs/day
Enterprise Tier
- Read Operations: 10,000 requests/hour
- Write Operations: 2,000 requests/hour
- AI Services: 5,000 requests/hour
- Training Jobs: Unlimited
Every API response includes rate limit information:
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1640998800
X-RateLimit-Window: 3600
- X-RateLimit-Limit: Maximum requests allowed in the time window
- X-RateLimit-Remaining: Requests remaining in the current window
- X-RateLimit-Reset: Unix timestamp when the rate limit resets
- X-RateLimit-Window: Rate limit window in seconds
- Retry-After: Seconds to wait before making another request (sent only on 429 responses)
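For example, you can read these headers from any response to track your remaining budget (a minimal sketch using the requests library):

import requests

response = requests.get(
    "https://api.tensorone.ai/v2/clusters",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)

# Header values arrive as strings; convert before comparing
limit = int(response.headers.get("X-RateLimit-Limit", 0))
remaining = int(response.headers.get("X-RateLimit-Remaining", 0))
reset_at = int(response.headers.get("X-RateLimit-Reset", 0))

if limit and remaining < limit * 0.1:
    print(f"Warning: only {remaining} requests left until {reset_at}")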
Endpoint-Specific Limits
Account Management
GET /accounts/* - 500/hour
POST /accounts/* - 50/hour
PUT /accounts/* - 50/hour
DELETE /accounts/* - 10/hour
GPU Clusters
GET /clusters/* - 1000/hour
POST /clusters - 20/hour (creation)
PUT /clusters/* - 100/hour
DELETE /clusters/* - 20/hour
Serverless Endpoints
GET /endpoints/* - 1000/hour
POST /endpoints - 50/hour (creation)
POST /endpoints/*/execute - 500/hour (execution)
AI Services
POST /ai/chat - 200/hour
POST /ai/text-to-image - 100/hour
POST /ai/text-to-video - 20/hour
POST /ai/text-to-speech - 150/hour
Training Jobs
GET /training/* - 500/hour
POST /training/jobs - 10/hour
PUT /training/* - 50/hour
Rate Limit Strategies
1. Exponential Backoff
Implement exponential backoff when rate limited:
import random
import time

def make_request_with_backoff(func, max_retries=5):
    """Call func() and retry with exponential backoff on HTTP 429."""
    for attempt in range(max_retries):
        try:
            response = func()
            if response.status_code != 429:
                return response
            # Honor the server's Retry-After hint when it is present
            retry_after = response.headers.get("Retry-After")
            if retry_after is not None:
                time.sleep(int(retry_after))
                continue
        except Exception:
            # Re-raise on the final attempt; otherwise back off and retry
            if attempt == max_retries - 1:
                raise
        # Exponential backoff with jitter to avoid synchronized retries
        delay = (2 ** attempt) + random.uniform(0, 1)
        time.sleep(delay)
    raise Exception("Max retries exceeded")
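For example, wrapping a GET call (assuming the requests library):

import requests

response = make_request_with_backoff(
    lambda: requests.get(
        "https://api.tensorone.ai/v2/clusters",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
    )
)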
2. Request Batching
Batch multiple operations when possible:
# Instead of multiple single requests
curl -X POST "https://api.tensorone.ai/v2/endpoints/ep1/execute" -d '...'
curl -X POST "https://api.tensorone.ai/v2/endpoints/ep2/execute" -d '...'
# Use batch execution
curl -X POST "https://api.tensorone.ai/v2/endpoints/batch-execute" \
-d '{
"requests": [
{"endpoint_id": "ep1", "input": {...}},
{"endpoint_id": "ep2", "input": {...}}
]
}'
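The same batch call from Python, so one batch request replaces several individual executions against your request budget (a sketch; the example inputs are hypothetical, so substitute your endpoints' actual input schema):

import requests

payload = {
    "requests": [
        {"endpoint_id": "ep1", "input": {"prompt": "hello"}},  # hypothetical input
        {"endpoint_id": "ep2", "input": {"prompt": "world"}},  # hypothetical input
    ]
}
response = requests.post(
    "https://api.tensorone.ai/v2/endpoints/batch-execute",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
)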
3. Caching
Cache responses when appropriate:
import time
from functools import lru_cache

import requests

@lru_cache(maxsize=128)
def get_cluster_info(cluster_id, cache_time):
    # The cache_time argument changes each hour, so stale entries stop hitting
    response = requests.get(
        f"https://api.tensorone.ai/v2/clusters/{cluster_id}",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
    )
    return response.json()

# Pass the current hour so each entry is reused for at most 1 hour
current_hour = int(time.time() // 3600)
cluster_info = get_cluster_info("cluster_123", current_hour)
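Note that the hour-bucket argument expires entries at fixed clock-hour boundaries rather than one hour after each fetch, and old buckets linger in memory until lru_cache evicts them; if you need precise expiry, a TTL cache such as cachetools.TTLCache may be a better fit.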
Monitoring Rate Limits
Check Current Usage
curl -X GET "https://api.tensorone.ai/v2/auth/rate-limits" \
-H "Authorization: Bearer YOUR_API_KEY"
Response:
{
"limits": {
"read_operations": {
"limit": 1000,
"remaining": 847,
"reset_at": "2024-01-15T11:00:00Z"
},
"write_operations": {
"limit": 200,
"remaining": 195,
"reset_at": "2024-01-15T11:00:00Z"
},
"ai_services": {
"limit": 500,
"remaining": 423,
"reset_at": "2024-01-15T11:00:00Z"
}
}
}
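You can use this endpoint to throttle yourself before hitting a hard limit. A minimal sketch in Python, assuming the response shape above:

import requests

response = requests.get(
    "https://api.tensorone.ai/v2/auth/rate-limits",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
limits = response.json()["limits"]

# Slow down before the write budget is exhausted
writes = limits["write_operations"]
if writes["remaining"] < writes["limit"] * 0.05:
    print(f"Write budget nearly spent; resets at {writes['reset_at']}")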
Usage Analytics
curl -X GET "https://api.tensorone.ai/v2/analytics/api-usage?period=24h" \
-H "Authorization: Bearer YOUR_API_KEY"
Rate Limit Errors
429 Too Many Requests
{
"error": "RATE_LIMIT_EXCEEDED",
"message": "Rate limit exceeded for endpoint",
"code": 429,
"details": {
"limit": 100,
"remaining": 0,
"reset_at": "2024-01-15T11:00:00Z",
"retry_after": 1800
}
}
Handling in Code
async function makeAPIRequest(url, options, retriesLeft = 3) {
  try {
    const response = await fetch(url, options);
    if (response.status === 429 && retriesLeft > 0) {
      // Retry-After arrives as a string of seconds; fall back to 60 if absent
      const retryAfter = parseInt(response.headers.get("Retry-After") ?? "60", 10);
      console.log(`Rate limited. Retrying in ${retryAfter} seconds`);
      // Wait, then retry with a bounded retry count to avoid infinite recursion
      await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
      return makeAPIRequest(url, options, retriesLeft - 1);
    }
    return response;
  } catch (error) {
    console.error("API request failed:", error);
    throw error;
  }
}
Optimization Tips
1. Use Appropriate HTTP Methods
- Use HEAD requests to check resource existence
- Use PATCH instead of PUT for partial updates
- Implement conditional requests with If-Modified-Since (see the sketch below)
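For example, a conditional GET in Python (a sketch; If-Modified-Since handling is standard HTTP, but confirm the endpoint supports it):

import requests

response = requests.get(
    "https://api.tensorone.ai/v2/clusters/cluster_123",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        # Ask the server to send the body only if the resource changed since this time
        "If-Modified-Since": "Mon, 15 Jan 2024 10:00:00 GMT",
    },
)

if response.status_code == 304:
    print("Not modified; reuse the locally cached copy")
else:
    cluster = response.json()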
2. Optimize Polling
import time

# Instead of constant polling at a fixed interval
while True:
    status = get_job_status(job_id)  # placeholder for GET /training/jobs/{id}
    if status == 'completed':
        break
    time.sleep(5)  # Fixed 5s interval wastes rate limit

# Use progressive intervals
def wait_for_completion(job_id):
    intervals = [5, 10, 30, 60]  # Progressive backoff, capped at 60s
    interval_index = 0
    while True:
        status = get_job_status(job_id)
        if status == 'completed':
            return
        interval = intervals[min(interval_index, len(intervals) - 1)]
        time.sleep(interval)
        interval_index += 1
3. Request Prioritization
Use priority headers for critical requests:
curl -X POST "https://api.tensorone.ai/v2/endpoints/critical/execute" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "X-Priority: high" \
-d '...'
Increasing Rate Limits
Upgrade Your Plan
Higher tier plans come with increased rate limits:
- Pro Plan: roughly 10x the Free tier limits for read, write, and AI operations (training jobs go from 5/day to 20/day)
- Enterprise Plan: roughly 10x the Pro tier limits, with unlimited training jobs and custom limits available
Request Limit Increase
For specific use cases, contact support with:
- Expected request volume
- Use case description
- Timeline requirements
- Current plan tier
Temporary Limit Boosts
For events or migrations:
curl -X POST "https://api.tensorone.ai/v2/auth/temporary-boost" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"multiplier": 2,
"duration": "24h",
"reason": "Data migration"
}'
Consider using webhooks instead of polling to reduce API calls and stay within rate limits.
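For example, a minimal webhook receiver (a sketch using Flask; the event payload shape shown here is hypothetical, so check the webhooks documentation for the actual contract):

from flask import Flask, request

app = Flask(__name__)

@app.route("/webhooks/tensorone", methods=["POST"])
def handle_event():
    event = request.get_json()
    # Hypothetical event fields; verify against the webhook docs
    if event.get("type") == "training_job.completed":
        print(f"Job {event.get('job_id')} finished")
    return "", 204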