ByteSpike bills in credits where 1 USD = 1,000,000 credits (micro-USD precision). Per-token / per-call rates are quoted in dollars in the pricing table and refreshed nightly from the production gateway.
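Credits are integer micro-USD, so converting with decimal arithmetic avoids float rounding drift. Below is a minimal Python sketch. The helper names are hypothetical. Rounding half-up to a whole credit is an assumption, because the gateway's own rounding rule is not documented here.

```python
from __future__ import annotations

from decimal import ROUND_HALF_UP, Decimal

CREDITS_PER_USD = 1_000_000  # 1 USD = 1,000,000 credits (micro-USD precision)


def usd_to_credits(amount_usd: str | Decimal) -> int:
    """Convert a dollar amount to whole credits.

    ASSUMPTION: half-up rounding to the nearest credit; the gateway's rule may differ.
    """
    scaled = Decimal(amount_usd) * CREDITS_PER_USD
    return int(scaled.quantize(Decimal(1), rounding=ROUND_HALF_UP))


def credits_to_usd(credits: int) -> Decimal:
    """Convert whole credits back to an exact dollar amount."""
    return Decimal(credits) / CREDITS_PER_USD


print(usd_to_credits("0.0025"))  # 2500
print(credits_to_usd(2_500))     # 0.0025
```

Passing dollar amounts as strings (e.g. `"0.0025"`) keeps `Decimal` from inheriting binary float error.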

What costs what

| Surface | Pricing model |
| --- | --- |
| Text endpoints (/v1/messages, /v1/chat/completions, /v1/responses, /v1beta/.../generateContent) | Per 1M input tokens + per 1M output tokens. Cache reads are discounted; cache writes bill at 1.0× (most models) or 1.25× (Claude) the input rate. |
| Image endpoints (/v1/images/generations, /v1/tasks/submit for batched/async) | Per image. Discrete per-call cost. |
| Video endpoints (/v1/tasks/submit for Sora / Veo / Seedance) | Per second of output. Failed renders are free. |
| Utility (/v1/models, /v1/balance, /v1/usage, /v1/tasks/{query,cancel}) | Free. |
The authoritative live rates are always at bytespike.ai/pricing.
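For the text endpoints, per-request cost follows directly from the per-1M rates. Below is a minimal sketch with a hypothetical helper, not a ByteSpike SDK function. It assumes `input_tokens` counts only uncached input and prices cache reads and writes as multiples of the input rate. The 0.1 cache-read multiplier is a placeholder, because the actual discount is model-specific and not stated on this page. The $3 / $15 rates in the example are also placeholders.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)  # pricing-table rates are quoted per 1M tokens


def text_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate: Decimal,
    output_rate: Decimal,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
    cache_read_multiplier: Decimal = Decimal("0.1"),
    cache_write_multiplier: Decimal = Decimal("1.0"),
) -> Decimal:
    """Dollar cost of one text request.

    ASSUMPTION: input_tokens counts uncached input only. cache_read_multiplier is a
    PLACEHOLDER; look up the real discount for your model. cache_write_multiplier
    is 1.0 for most models and 1.25 for Claude.
    """
    uncached = Decimal(input_tokens) * input_rate
    reads = Decimal(cache_read_tokens) * input_rate * cache_read_multiplier
    writes = Decimal(cache_write_tokens) * input_rate * cache_write_multiplier
    output = Decimal(output_tokens) * output_rate
    return (uncached + reads + writes + output) / PER_MILLION


# Hypothetical rates: $3 / 1M input, $15 / 1M output, Claude-style 1.25x cache writes.
cost = text_cost_usd(10_000, 2_000, Decimal("3"), Decimal("15"),
                     cache_write_tokens=50_000, cache_write_multiplier=Decimal("1.25"))
print(cost)  # 0.2475
```

Here the 50,000 cache-write tokens cost more than the uncached input and output combined, which is why the write multiplier matters when budgeting for cached Claude prompts.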

Failures don’t bill

Any non-2xx response is free, regardless of which model failed or how far into the request the failure occurred. This is a hard contract — the X-Quota-Remaining-Credits header doesn’t move on a non-2xx. The narrow exception: if you cancel a video task after rendering has started (status running), the partial seconds rendered may bill depending on the model’s own refund policy. The credits_used field on the cancel response is authoritative. See tasks/cancel.

Accounting headers

Every response carries the quota envelope (success and failure both):
| Header | What it means |
| --- | --- |
| X-RateLimit-Limit | USD cap of the rate-limit bucket closest to constraining you (the tightest of rate_limit_5h / _1d / _7d). |
| X-RateLimit-Remaining | Budget left in that bucket. |
| X-RateLimit-Reset | Unix timestamp when the bucket resets. |
| X-Quota-Remaining-Credits | Lifetime credits remaining on this key (USD). 0.00 = key’s quota cap reached. |
| X-Org-Quota-Remaining-Credits | Org wallet remaining, on org-owned keys. |
The actual cost of an individual request isn’t in the response headers — it’s available via GET /api/v1/usage, which returns one row per call with prompt_tokens, completion_tokens, and the final billed credits.
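The envelope is easier to act on once it is parsed into typed values. Below is a minimal sketch. It assumes each header carries a plain decimal string (for example `"12.50"`) and that X-RateLimit-Reset is whole or fractional Unix seconds. `parse_quota_headers` and `QuotaEnvelope` are hypothetical names. The function accepts any mapping of header names to strings.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class QuotaEnvelope:
    rate_limit_usd: Decimal            # X-RateLimit-Limit
    rate_limit_remaining: Decimal      # X-RateLimit-Remaining
    rate_limit_reset: int              # X-RateLimit-Reset, Unix seconds
    key_remaining_usd: Decimal         # X-Quota-Remaining-Credits (USD)
    org_remaining_usd: Decimal | None  # X-Org-Quota-Remaining-Credits, org-owned keys only


def parse_quota_headers(headers: Mapping[str, str]) -> QuotaEnvelope:
    """Turn the quota envelope of any response (2xx or not) into typed values."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    org = h.get("x-org-quota-remaining-credits")
    return QuotaEnvelope(
        rate_limit_usd=Decimal(h["x-ratelimit-limit"]),
        rate_limit_remaining=Decimal(h["x-ratelimit-remaining"]),
        rate_limit_reset=int(Decimal(h["x-ratelimit-reset"])),
        key_remaining_usd=Decimal(h["x-quota-remaining-credits"]),
        org_remaining_usd=Decimal(org) if org is not None else None,
    )


env = parse_quota_headers({
    "X-RateLimit-Limit": "5.00",
    "X-RateLimit-Remaining": "3.25",
    "X-RateLimit-Reset": "1700000000",
    "X-Quota-Remaining-Credits": "42.10",
})
print(env.key_remaining_usd, env.org_remaining_usd)  # 42.10 None
```

Per-request cost still has to come from the usage endpoint, because none of these headers carries it.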

Pre-flight budgeting

For “this will cost ~$X, confirm?” flows:
  1. Compute a worst-case cost from the pricing table: (prompt tokens × input rate + max_tokens × output rate) / 1,000,000, since rates are quoted per 1M tokens.
  2. Compare it against X-Quota-Remaining-Credits from a prior call (or call /v1/balance, which is free).
  3. If the estimate exceeds your budget, don’t send the request.
  4. After the request, reconcile actual cost via /api/v1/usage.
ByteSpike does not currently emit a pre-flight estimate header — the quota headers are post-billing only.
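The four steps above can be sketched as three small client-side functions. These are hypothetical helpers, not a ByteSpike API. `prompt_tokens` must come from your own tokenizer count, and the rates are the per-1M dollar figures from the pricing table. The `"credits"` field name used when reconciling /api/v1/usage rows is an assumption, since this page only says each row carries the final billed credits.

```python
from __future__ import annotations

from decimal import Decimal

PER_MILLION = Decimal(1_000_000)  # pricing-table rates are USD per 1M tokens


def worst_case_usd(prompt_tokens: int, max_tokens: int,
                   input_rate: Decimal, output_rate: Decimal,
                   input_multiplier: Decimal = Decimal(1)) -> Decimal:
    """Step 1: upper bound assuming the model emits all max_tokens output tokens.

    Pass input_multiplier=Decimal("1.25") if the prompt may be written to a
    Claude cache, since cache writes bill above the plain input rate.
    """
    prompt_cost = Decimal(prompt_tokens) * input_rate * input_multiplier
    output_cost = Decimal(max_tokens) * output_rate
    return (prompt_cost + output_cost) / PER_MILLION


def should_send(estimate_usd: Decimal, remaining_usd: Decimal,
                budget_usd: Decimal | None = None) -> bool:
    """Steps 2-3: send only if the estimate fits the key's remaining quota
    (X-Quota-Remaining-Credits or /v1/balance) and any caller-side budget."""
    limit = remaining_usd if budget_usd is None else min(remaining_usd, budget_usd)
    return estimate_usd <= limit


def reconcile_usd(usage_rows: list[dict]) -> Decimal:
    """Step 4: total actual cost from /api/v1/usage rows.

    ASSUMPTION: each row holds its final billed amount as whole credits under a
    "credits" key (1 USD = 1,000,000 credits); check the real field name.
    """
    total_credits = sum(int(row["credits"]) for row in usage_rows)
    return Decimal(total_credits) / PER_MILLION


estimate = worst_case_usd(20_000, 4_000, Decimal("3"), Decimal("15"))  # hypothetical rates
print(estimate, should_send(estimate, remaining_usd=Decimal("0.10")))  # 0.12 False
```

The estimate is a ceiling, not a prediction. The reconciled figure is usually lower because few responses use the full max_tokens.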

Quota cliffs

A key stops serving requests when any of the following is true:
  • X-Quota-Remaining-Credits = 0 (key’s quota cap reached)
  • X-RateLimit-Remaining = 0 (key’s tightest rate-limit bucket exhausted)
  • Org wallet is empty (org-owned keys)
The gateway returns 402 insufficient_balance (OpenAI envelope) or permission_error (Anthropic envelope). To raise the cap, edit the key in Console → API keys or top up the org wallet.
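A client can detect these cliffs from the quota headers of its last response and recognise the refusal shapes. Below is a minimal sketch with hypothetical function names. It assumes the error name sits under `error.type` or `error.code`, as in the usual OpenAI and Anthropic error envelopes, and does not rely on anything this page states about the body layout.

```python
from __future__ import annotations

from collections.abc import Mapping
from decimal import Decimal
from typing import Any

# Header name -> cliff description, following the conditions listed above.
CLIFF_HEADERS = {
    "x-quota-remaining-credits": "key quota cap reached",
    "x-ratelimit-remaining": "tightest rate-limit bucket exhausted",
    "x-org-quota-remaining-credits": "org wallet empty",
}


def quota_cliffs(headers: Mapping[str, str]) -> list[str]:
    """List every cliff condition visible in a response's quota headers.

    Absent headers (e.g. the org header on a personal key) are skipped.
    """
    h = {name.lower(): value for name, value in headers.items()}
    return [desc for name, desc in CLIFF_HEADERS.items()
            if name in h and Decimal(h[name]) <= 0]


def looks_like_quota_refusal(status: int, body: Mapping[str, Any]) -> bool:
    """Heuristic match for the documented refusal shapes.

    ASSUMPTION: the error name is in error.type or error.code.
    """
    err = body.get("error")
    if not isinstance(err, Mapping):
        return False
    kinds = {err.get("type"), err.get("code")}
    if status == 402 and "insufficient_balance" in kinds:
        return True  # OpenAI envelope
    # Anthropic envelope; permission_error may also mean a non-quota access
    # problem, so confirm with quota_cliffs() before prompting a top-up.
    return "permission_error" in kinds


print(quota_cliffs({"X-Quota-Remaining-Credits": "0.00", "X-RateLimit-Remaining": "1.50"}))
# ['key quota cap reached']
```

The headers show the state as of the last response, so a key can hit a cliff between checks. Treat this check as an early warning rather than a guarantee.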

Top-up and subscriptions

Top up the org wallet at Console → Billing. The minimum top-up is $5; bonuses scale up to +11.2% on larger packs. Subscription tiers (Pro / Max / Enterprise) bundle credits with higher concurrency and priority. See bytespike.ai/pricing for the live tier table.