ByteSpike enforces three rolling-window spend caps per API key, plus a concurrency cap. All four are independent — the tightest one wins.

The four caps

| Cap | Default | Set where |
| --- | --- | --- |
| rate_limit_5h (USD spend, rolling 5 hours) | unlimited | Per key, in Console → API keys |
| rate_limit_1d (USD spend, rolling 24 hours) | unlimited | Per key |
| rate_limit_7d (USD spend, rolling 7 days) | unlimited | Per key |
| Concurrency (in-flight requests) | tier-defined | Per subscription tier; see pricing |
Setting any spend cap to 0 makes it unlimited.

How they interact

On every request the gateway computes:
remaining_5h  = rate_limit_5h - spend_in_last_5h
remaining_1d  = rate_limit_1d - spend_in_last_1d
remaining_7d  = rate_limit_7d - spend_in_last_7d
The response carries the tightest of the three:
X-RateLimit-Limit: 50.00            # the limit closest to constraining you
X-RateLimit-Remaining: 4.18         # remaining in that bucket
X-RateLimit-Reset: 1716705600       # Unix ts when that bucket resets
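The selection of the tightest bucket can be mirrored client-side. The following is a minimal sketch, not the gateway's own code: it assumes "tightest" means the bucket with the smallest remaining dollar amount, and that a cap of 0 (unlimited) never constrains. The function name `tightest_bucket` and the dictionary layout are hypothetical.

```python
import math


def tightest_bucket(caps, spend):
    """Return (name, remaining) for the bucket with the least headroom.

    caps and spend map bucket names to USD amounts. A cap of 0 means
    unlimited, so its remaining headroom is treated as infinite.
    """
    remaining = {}
    for name, cap in caps.items():
        if cap == 0:
            remaining[name] = math.inf
        else:
            # Clamp at zero so an overspent bucket reports 0, not a negative.
            remaining[name] = max(0.0, cap - spend.get(name, 0.0))
    name = min(remaining, key=remaining.get)
    return name, remaining[name]


# Illustrative numbers only: a 50.00 five-hour cap with 45.82 already spent.
caps = {"rate_limit_5h": 50.0, "rate_limit_1d": 200.0, "rate_limit_7d": 0}
spend = {"rate_limit_5h": 45.82, "rate_limit_1d": 120.0, "rate_limit_7d": 300.0}
name, left = tightest_bucket(caps, spend)
print(name, round(left, 2))
```

With these inputs the 5-hour bucket is the constraining one, which corresponds to the header example above (limit 50.00, remaining 4.18).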
When Remaining hits 0, the next request returns 429:
{
  "error": {
    "type": "rate_limit_error",
    "message": "rate_limit_5h exceeded: 50.00 / 50.00 used; resets at 2026-05-25 14:00:00 UTC"
  }
}
The error body uses the Anthropic shape for /v1/messages and the OpenAI shape for /v1/chat/completions and /v1/responses.
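A client usually wants to know which kind of 429 it received before deciding how long to wait. The sketch below classifies a parsed error body. It assumes that both shapes expose an `error` object with a `type` field, and that the `code: "concurrency_limit"` value described in the Concurrency section sits inside that same object; neither detail is confirmed here, and `classify_429` is a hypothetical helper name.

```python
def classify_429(body):
    """Return 'concurrency', 'spend', or 'unknown' for a parsed 429 JSON body."""
    error = body.get("error") if isinstance(body, dict) else None
    if not isinstance(error, dict):
        return "unknown"
    if error.get("type") != "rate_limit_error":
        return "unknown"
    # Assumption: `code` lives next to `type` inside the `error` object.
    if error.get("code") == "concurrency_limit":
        return "concurrency"
    # Any other rate_limit_error is treated as a spend-cap rejection.
    return "spend"


example = {
    "error": {
        "type": "rate_limit_error",
        "message": "rate_limit_5h exceeded: 50.00 / 50.00 used; "
                   "resets at 2026-05-25 14:00:00 UTC",
    }
}
print(classify_429(example))
```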

Picking values

| Use case | Recommended caps |
| --- | --- |
| Dev / local laptop | rate_limit_5h = 5, _1d = 20, _7d = 50. Caps an off-screen runaway loop without blocking ordinary work |
| Production API key | _5h = 100, _1d = 500, _7d = 2000, sized to ~10× expected usage. Lets you absorb traffic spikes but contains runaway bugs |
| Per-customer key (multi-tenant) | All three set to that customer’s allowance. Issue one key per customer with their billing window’s cap |
| Long-running batch job | _5h high (let the batch burn), _1d and _7d lower (prevent a runaway batch from looping for days) |
quota (lifetime cap) and expires_in_days are separate from the rate-limit buckets — they don’t interact. See Authentication.

Concurrency

Concurrency is the count of in-flight requests against your account at any one moment (across all your keys). It’s set per subscription tier:
| Tier | Concurrency cap |
| --- | --- |
| Free | 5 |
| Pro | 25 |
| Max | 100 |
| Enterprise | Custom (typically 500–2000) |
When you hit the cap, new requests return 429 immediately with type: "rate_limit_error", code: "concurrency_limit". Handle it like any other 429: back off and retry. If you hit the concurrency cap on Free or Pro while your spend caps are nowhere near exhausted, upgrade your tier rather than creating more keys. The cap is account-level, not key-level, so extra keys add no capacity.
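Rather than relying on 429s alone, a client can keep its own in-flight count under the tier cap. The following is a minimal sketch using a semaphore from the Python standard library; `ConcurrencyGate` and `fake_request` are hypothetical names, and the limit of 5 is simply the Free tier value from the table above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyGate:
    """Caps the number of calls running at once, shared by all threads."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def run(self, fn, *args, **kwargs):
        with self._slots:  # blocks until one of the `limit` slots is free
            return fn(*args, **kwargs)


gate = ConcurrencyGate(limit=5)


def fake_request(i):
    return i * 2  # stand-in for a real API call


# 20 worker threads, but at most 5 calls pass through the gate at a time.
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(lambda i: gate.run(fake_request, i), range(10)))
print(results)
```

A gate like this only sees the process it runs in. Because the cap counts requests across all keys on the account, several processes or machines would need to share the budget between them, for example by giving each a fraction of the tier limit.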

Backoff strategy

The gateway’s reset timestamps are precise — use them rather than exponential backoff guesswork:
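import time

import requests

# Placeholders (assumptions): point these at the endpoint and key you actually use.
URL = "https://llm.bytespike.ai/v1/messages"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def call_with_backoff(payload):
    """POST payload, sleeping until the reported reset whenever the gateway returns 429."""
    while True:
        r = requests.post(URL, json=payload, headers=HEADERS)
        if r.status_code != 429:
            return r
        # Unix timestamp at which the constraining bucket resets (0 if the header is absent)
        reset = int(r.headers.get("X-RateLimit-Reset", 0))
        wait = max(1, reset - int(time.time()))
        time.sleep(min(wait, 300))   # cap at 5 min so a stuck reset can't deadlock us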
For concurrency 429s (no X-RateLimit-Reset), use a short jittered backoff (e.g. 1 + random()*2 seconds) — the cap clears as in-flight requests finish, which can be sub-second.
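Both cases can be folded into one wait-time calculation. This is a sketch under the rules stated above: use X-RateLimit-Reset when it is present, otherwise fall back to a 1 to 3 second jittered wait. The helper name `backoff_seconds`, the injectable `now` and `rng` parameters, and the fallback for an unparseable header are illustrative choices rather than documented behavior.

```python
import random
import time

MAX_WAIT_SECONDS = 300  # same 5-minute ceiling as the loop above


def backoff_seconds(headers, now=None, rng=random.random):
    """Seconds to sleep after a 429, based on its response headers."""
    now = time.time() if now is None else now
    reset = headers.get("X-RateLimit-Reset")
    if reset is None:
        # Concurrency 429: no reset timestamp, so wait 1 to 3 s with jitter.
        return 1 + rng() * 2
    try:
        wait = int(reset) - int(now)
    except ValueError:
        # Unparseable header: fall back to the jittered wait.
        return 1 + rng() * 2
    # Never sleep less than 1 s or more than the ceiling.
    return float(min(max(1, wait), MAX_WAIT_SECONDS))


print(backoff_seconds({"X-RateLimit-Reset": "1060"}, now=1000))
print(backoff_seconds({}, rng=lambda: 0.5))
```

Keeping the calculation separate from the request loop makes it easy to unit-test with a fixed clock and a fixed random source.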

What’s not rate-limited

  • GET /api/v1/me/* management calls — free, never throttled
  • GET /api/v1/me/usage — free
  • GET /api/v1/me/account — free
  • POST /v1/tasks/query — free, doesn’t count against concurrency
  • POST /v1/tasks/cancel — free, doesn’t count against concurrency
  • The dial-test in console — uses cookie auth, not a key, never billed
Anything against /v1/messages, /v1/chat/completions, /v1/responses, /v1beta/..., /v1/images/*, /v1/tasks/submit counts.

Reading the usage log

To debug a 429, check what has been spending:
curl 'https://llm.bytespike.ai/api/v1/me/usage?limit=100&api_key_id=42' \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY"
Sum the credits column over the relevant window. The tightest bucket from the response headers tells you which window to look at.
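The summation can be sketched as below. The response layout of the usage endpoint is not documented here, so this assumes the rows have already been extracted into a list of dictionaries, each with the `credits` value and a hypothetical `created_at` field holding an ISO 8601 timestamp with a UTC offset. Adjust the field names to whatever the endpoint actually returns.

```python
from datetime import datetime, timedelta, timezone


def spend_in_window(rows, hours, now=None):
    """Sum `credits` for rows whose timestamp falls within the last `hours`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    total = 0.0
    for row in rows:
        # Assumption: `created_at` is an ISO 8601 string with a UTC offset.
        ts = datetime.fromisoformat(row["created_at"])
        if cutoff <= ts <= now:
            total += float(row["credits"])
    return total


# Made-up rows for illustration only.
rows = [
    {"created_at": "2026-05-25T13:00:00+00:00", "credits": 1.5},
    {"created_at": "2026-05-25T10:00:00+00:00", "credits": 2.0},
    {"created_at": "2026-05-25T08:00:00+00:00", "credits": 4.0},
    {"created_at": "2026-05-20T12:00:00+00:00", "credits": 10.0},
]
now = datetime(2026, 5, 25, 14, 0, tzinfo=timezone.utc)
for hours in (5, 24, 7 * 24):
    print(hours, spend_in_window(rows, hours, now=now))
```

Comparing each total with the matching cap shows which of the three windows is the one rejecting requests.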

Raising the limits

| You want | Action |
| --- | --- |
| Higher per-key spend cap | Edit the key in Console → API keys |
| Higher account concurrency | Upgrade tier in Console → Subscriptions |
| Custom limits beyond Max tier | Email enterprise@bytespike.ai |