The four caps
| Cap | Default | Set where |
|---|---|---|
| rate_limit_5h (USD spend, rolling 5 hours) | unlimited | Per key, in Console → API keys |
| rate_limit_1d (USD spend, rolling 24 hours) | unlimited | Per key |
| rate_limit_7d (USD spend, rolling 7 days) | unlimited | Per key |
| Concurrency (in-flight requests) | tier-defined | Per subscription tier; see pricing |
Setting any of the spend caps to 0 means unlimited.
How they interact
On every request the gateway computes the remaining budget for each cap. When the remaining amount hits 0, the next request returns 429.
(The error body follows the endpoint’s own shape for /v1/messages; OpenAI shape for /chat/completions and /responses.)
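The interaction between the three spend windows can be illustrated with a small client-side model. This is a sketch under stated assumptions, not the gateway's actual algorithm: the function names, the `(timestamp, usd)` event format, and the choice of "smallest remaining USD" as the tightest bucket are all hypothetical. Only the cap names, the window lengths, and the "0 = unlimited" rule come from the text above.

```python
# Hypothetical model of the three rolling spend windows (not the gateway's code).
WINDOW_HOURS = {"rate_limit_5h": 5, "rate_limit_1d": 24, "rate_limit_7d": 168}


def remaining_by_window(caps, spend_events, now):
    """Return remaining USD per cap; None means the cap is unlimited.

    caps:         dict of cap name -> USD limit (0 or missing = unlimited)
    spend_events: iterable of (unix_seconds, usd_spent) tuples (assumed format)
    now:          current time in unix seconds
    """
    remaining = {}
    for name, hours in WINDOW_HOURS.items():
        cap = caps.get(name, 0)
        if cap == 0:
            remaining[name] = None  # 0 on a spend cap = unlimited
            continue
        window_start = now - hours * 3600
        spent = sum(usd for ts, usd in spend_events if ts > window_start)
        remaining[name] = max(cap - spent, 0.0)
    return remaining


def tightest_bucket(remaining):
    """Name of the limited cap with the least USD left, or None if all unlimited."""
    limited = {k: v for k, v in remaining.items() if v is not None}
    return min(limited, key=limited.get) if limited else None


def would_reject(remaining):
    """True once any limited cap has no budget left (the next request gets 429)."""
    return any(v is not None and v <= 0 for v in remaining.values())
```

With caps of 5 / 20 / unlimited and 3 USD spent in the last hour, the 5-hour bucket has 2 USD left and is the tightest; one more 2 USD request would exhaust it even though the daily bucket still has room.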
Picking values
| Use case | Recommended caps |
|---|---|
| Dev / local laptop | rate_limit_5h = 5, _1d = 20, _7d = 50 — caps an off-screen runaway loop without blocking ordinary work |
| Production API key | _5h = 100, _1d = 500, _7d = 2000 — sized to ~10× expected usage. Lets you absorb traffic spikes but contains runaway bugs |
| Per-customer key (multi-tenant) | All three set to that customer’s allowance — issue one key per customer with their billing window’s cap |
| Long-running batch job | _5h high (let the batch burn), _1d and _7d lower (prevent a runaway batch from looping for days) |
quota (lifetime cap) and expires_in_days are separate from the
rate-limit buckets — they don’t interact. See
Authentication.
Concurrency
Concurrency is the number of in-flight requests against your account at any one moment, across all your keys. It’s set per subscription tier:

| Tier | Concurrency cap |
|---|---|
| Free | 5 |
| Pro | 25 |
| Max | 100 |
| Enterprise | Custom (typically 500–2000) |
Requests above the cap return 429 immediately with
type: "rate_limit_error", code: "concurrency_limit". The
recommended response is the same as for a normal 429: back off and retry.
If you’re hitting concurrency on Free / Pro and the spend caps are
nowhere near, upgrade tier rather than spawning more keys. The cap
is account-level, not key-level.
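Because the concurrency cap is account-level, one way to stay under it is to share a single in-flight limiter across every key in a process. The following is a minimal sketch using `asyncio.Semaphore`; the `InFlightLimiter` class and the `fake_request` stand-in are hypothetical names introduced for illustration, and a real client would replace `fake_request` with its own HTTP call.

```python
import asyncio


class InFlightLimiter:
    """Caps the number of concurrently running coroutines (client-side only)."""

    def __init__(self, max_in_flight):
        self._sem = asyncio.Semaphore(max_in_flight)
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, useful for checking the cap

    async def run(self, coro_fn, *args):
        async with self._sem:  # waits here while the cap is reached
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await coro_fn(*args)
            finally:
                self.in_flight -= 1


async def fake_request(i):
    """Stand-in for a real API call; sleeps briefly and echoes its argument."""
    await asyncio.sleep(0.01)
    return i


async def demo(n_requests=20, cap=5):
    limiter = InFlightLimiter(cap)
    results = await asyncio.gather(
        *(limiter.run(fake_request, i) for i in range(n_requests))
    )
    return results, limiter.peak
```

Running `asyncio.run(demo(20, 5))` issues 20 requests but never lets more than 5 run at once. This only limits one process; several processes sharing an account would need to split the tier's cap between them.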
Backoff strategy
The gateway’s reset timestamps are precise, so use them rather than exponential-backoff guesswork.
For concurrency 429s (which are not tied to X-RateLimit-Reset), use a short jittered
backoff (e.g. 1 + random()*2 seconds): the cap clears as
in-flight requests finish, which can be sub-second.
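The two retry paths described in this section can be combined into one helper that picks a wait time. This is a sketch, and it rests on an assumption the text does not confirm: that `X-RateLimit-Reset` carries a Unix timestamp in seconds. If the gateway sends a different format (for example an ISO 8601 string or a delta in seconds), the parsing line needs to change. The `retry_delay` name and its parameters are hypothetical.

```python
import random
import time


def retry_delay(status, headers, error_code=None, now=None, rng=random.random):
    """Seconds to wait before retrying, or None if the response is not a 429.

    headers:    dict of response headers (plain dict, exact-case keys assumed)
    error_code: the error body's `code` field, e.g. "concurrency_limit"
    now, rng:   injectable clock and random source, mainly for testing
    """
    if status != 429:
        return None
    now = time.time() if now is None else now
    reset = headers.get("X-RateLimit-Reset")
    if error_code != "concurrency_limit" and reset is not None:
        try:
            # ASSUMPTION: the header value is a Unix timestamp in seconds.
            return max(float(reset) - now, 0.0)
        except ValueError:
            pass  # unparseable header: fall back to the jittered delay
    # Concurrency 429 or no usable reset time: short jittered backoff, 1 to 3 s.
    return 1 + rng() * 2
```

A caller would sleep for the returned number of seconds and then retry; capping the number of attempts is left to the caller.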
What’s not rate-limited
- GET /api/v1/me/* management calls: free, never throttled
- GET /api/v1/me/usage: free
- GET /api/v1/me/account: free
- POST /v1/tasks/query: free, doesn’t count against concurrency
- POST /v1/tasks/cancel: free, doesn’t count against concurrency
- The dial-test in console: uses cookie auth, not a key, never billed
Anything sent to /v1/messages, /v1/chat/completions, /v1/responses,
/v1beta/..., /v1/images/*, or /v1/tasks/submit counts.
Reading the usage log
To debug a 429, see what’s been spending: check the credits column over the relevant window. The tightest
bucket from the response headers tells you which window to look at.
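Once usage rows are in hand, totalling the credits column per window is simple arithmetic. The sketch below assumes each row is a dict with a timezone-aware `created_at` datetime, a numeric `credits` value, and a `model` label; those field names are hypothetical, since the usage log's actual schema is not shown here, and would need to be mapped from whatever the log returns.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOWS = {
    "5h": timedelta(hours=5),
    "1d": timedelta(days=1),
    "7d": timedelta(days=7),
}


def credits_in_window(rows, window, now):
    """Total credits spent in the rolling window ending at `now`."""
    start = now - WINDOWS[window]
    return sum(r["credits"] for r in rows if start < r["created_at"] <= now)


def top_spenders(rows, window, now, group_by="model"):
    """Credits per group within the window, largest first."""
    start = now - WINDOWS[window]
    totals = defaultdict(float)
    for r in rows:
        if start < r["created_at"] <= now:
            totals[r[group_by]] += r["credits"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

If the tightest bucket is the 5-hour one, calling `top_spenders(rows, "5h", datetime.now(timezone.utc))` shows which group consumed that window's budget.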
Raising the limits
| You want | Action |
|---|---|
| Higher per-key spend cap | Edit the key in Console → API keys |
| Higher account concurrency | Upgrade tier in Console → Subscriptions |
| Custom limits beyond Max tier | Email enterprise@bytespike.ai |
Related
- Authentication — per-key control list
- Credits & billing — how spend rolls up
- Error handling — error envelope + retry semantics