1 USD = 1,000,000 credits (micro-USD precision). Per-token and per-call rates are quoted in dollars in the pricing table, which is refreshed nightly from the production gateway.
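A credit is one micro-USD, so converting between dollars and credits is a scale by 10^6. The following is a minimal Python sketch. It uses `Decimal` rather than `float` so the arithmetic stays exact, and it rounds half-up to the nearest credit. That rounding mode is an assumption made for illustration, not documented gateway behavior. `usd_to_credits` and `credits_to_usd` are hypothetical helper names.

```python
from decimal import Decimal, ROUND_HALF_UP

CREDITS_PER_USD = 1_000_000  # 1 USD = 1,000,000 credits (micro-USD precision)


def usd_to_credits(usd: Decimal) -> int:
    """Convert dollars to whole credits, rounding half-up (assumed rounding mode)."""
    return int((usd * CREDITS_PER_USD).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def credits_to_usd(credits: int) -> Decimal:
    """Convert whole credits back to dollars, kept at 6 decimal places."""
    return (Decimal(credits) / CREDITS_PER_USD).quantize(Decimal("0.000001"))
```

Pass dollar amounts as strings (`Decimal("3.00")`, not `Decimal(3.00)` from a float) so no binary rounding error slips in before the conversion.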
What costs what
| Surface | Pricing model |
|---|---|
| Text endpoints (`/v1/messages`, `/v1/chat/completions`, `/v1/responses`, `/v1beta/.../generateContent`) | Per 1M input + per 1M output tokens. Cache reads discounted; cache writes at 1.0× (most models) or 1.25× (Claude). |
| Image endpoints (`/v1/images/generations`, `/v1/tasks/submit` for batched/async) | Per image. Discrete per-call cost. |
| Video endpoints (`/v1/tasks/submit` for Sora / Veo / Seedance) | Per second of output. Failed renders are free. |
| Utility (`/v1/models`, `/v1/balance`, `/v1/usage`, `/v1/tasks/{query,cancel}`) | Free. |
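You can estimate the three billable surfaces on the client side roughly as shown below. This is a sketch, not the gateway's billing code, and it rests on several assumptions:

- The rate values are placeholders that you would copy from the pricing table.
- The cache-read rate is a per-model input.
- The cache-write multiplier applies to the model's base input rate.
- The plain-input, cache-read, and cache-write token counts do not overlap.

`TextRates`, `text_cost_usd`, `image_cost_usd`, and `video_cost_usd` are hypothetical names.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # text rates are quoted per 1M tokens


@dataclass(frozen=True)
class TextRates:
    """USD per 1M tokens for one model (placeholder values from the pricing table)."""
    input_per_m: float
    output_per_m: float
    cache_read_per_m: float               # discounted cache-read rate
    cache_write_multiplier: float = 1.0   # 1.0 for most models, 1.25 for Claude


def text_cost_usd(rates: TextRates, input_tokens: int, output_tokens: int,
                  cache_read_tokens: int = 0, cache_write_tokens: int = 0) -> float:
    """Estimate one text call; assumes the input-side token counts are disjoint."""
    usd = (
        input_tokens * rates.input_per_m
        + cache_write_tokens * rates.input_per_m * rates.cache_write_multiplier
        + cache_read_tokens * rates.cache_read_per_m
        + output_tokens * rates.output_per_m
    )
    return usd / TOKENS_PER_UNIT


def image_cost_usd(per_image_usd: float, n_images: int) -> float:
    """Images bill a discrete amount per generated image."""
    return per_image_usd * n_images


def video_cost_usd(per_second_usd: float, seconds: float, succeeded: bool) -> float:
    """Video bills per second of output; failed renders are free."""
    return per_second_usd * seconds if succeeded else 0.0
```

Take, for example, placeholder rates of $3 per 1M input tokens and $15 per 1M output tokens. At those rates, `text_cost_usd(TextRates(3.0, 15.0, 0.3), 2_000, 1_000)` comes to about $0.021.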
Failures don’t bill
Any non-2xx response is free, regardless of which model failed or how far into the request the failure occurred. This is a hard contract: the `X-Quota-Remaining-Credits` header doesn’t move on a non-2xx response.
The narrow exception: if you cancel a video task after rendering has started (status `running`), the partial seconds rendered may bill, depending on the model’s own refund policy. The `credits_used` field on the cancel response is authoritative. See tasks/cancel.
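A client that keeps its own spend ledger should follow the same two rules. First, a non-2xx response adds nothing. Second, for a cancelled video task, the `credits_used` value from the cancel response takes precedence over any estimate. The following is a minimal Python sketch. `SpendLedger` and its methods are hypothetical, and the sketch assumes `credits_used` is reported in the same USD units as the `X-Quota-*` headers. Confirm that assumption in the tasks/cancel reference before relying on it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SpendLedger:
    """Client-side running total that mirrors the gateway's billing rules (a sketch)."""

    total_usd: float = 0.0
    entries: List[Tuple[str, float]] = field(default_factory=list)

    def record_call(self, status: int, estimated_cost_usd: float) -> float:
        """Record an ordinary call. Non-2xx responses are free, so they add 0."""
        cost = estimated_cost_usd if 200 <= status < 300 else 0.0
        self.total_usd += cost
        self.entries.append((f"call:{status}", cost))
        return cost

    def record_cancel(self, cancel_response: dict) -> float:
        """Record a cancelled video task using the authoritative credits_used field.

        Assumption: credits_used is in the same USD units as the X-Quota-* headers.
        A missing or null value is treated as 0.
        """
        cost = float(cancel_response.get("credits_used") or 0.0)
        self.total_usd += cost
        self.entries.append(("cancel", cost))
        return cost
```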
Accounting headers
Every response, success or failure, carries the quota envelope:

| Header | What it means |
|---|---|
| `X-RateLimit-Limit` | USD cap of the rate-limit bucket closest to constraining you (the tightest of `rate_limit_5h` / `_1d` / `_7d`). |
| `X-RateLimit-Remaining` | Budget left in that bucket. |
| `X-RateLimit-Reset` | Unix timestamp when the bucket resets. |
| `X-Quota-Remaining-Credits` | Lifetime credits remaining on this key (USD). `0.00` means the key’s quota cap has been reached. |
| `X-Org-Quota-Remaining-Credits` | Org wallet remaining, on org-owned keys. |
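Every response carries the envelope, so a client can refresh its view of the remaining budget after each call. The following is a minimal Python sketch. `QuotaEnvelope` and `parse_quota_headers` are hypothetical names. The sketch assumes the values arrive as plain decimal strings (for example `0.00`) and that the reset value is in Unix seconds.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping, Optional


@dataclass(frozen=True)
class QuotaEnvelope:
    rate_limit_usd: float               # X-RateLimit-Limit: cap of the tightest bucket
    rate_limit_remaining_usd: float     # X-RateLimit-Remaining
    rate_limit_reset: datetime          # X-RateLimit-Reset, converted from Unix seconds (UTC)
    key_remaining_usd: float            # X-Quota-Remaining-Credits
    org_remaining_usd: Optional[float]  # X-Org-Quota-Remaining-Credits (org-owned keys only)


def parse_quota_headers(headers: Mapping[str, str]) -> QuotaEnvelope:
    """Parse the quota envelope; header names are matched case-insensitively."""
    h = {name.lower(): value for name, value in headers.items()}
    org = h.get("x-org-quota-remaining-credits")
    return QuotaEnvelope(
        rate_limit_usd=float(h["x-ratelimit-limit"]),
        rate_limit_remaining_usd=float(h["x-ratelimit-remaining"]),
        rate_limit_reset=datetime.fromtimestamp(float(h["x-ratelimit-reset"]), tz=timezone.utc),
        key_remaining_usd=float(h["x-quota-remaining-credits"]),
        org_remaining_usd=float(org) if org is not None else None,
    )
```

With the `requests` library, for instance, you can pass `response.headers` directly, because it is already a mapping of header names to strings.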
For per-call detail, query `GET /api/v1/usage`, which returns one row per call with `prompt_tokens`, `completion_tokens`, and the final billed credits.
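To reconcile spend, you can total the rows the usage endpoint returns. The following Python sketch works on already-decoded rows. It leaves out the HTTP call because the response envelope is not specified here. The `credits` key for the billed amount is a hypothetical name.

```python
from typing import Dict, Iterable, Mapping


def summarize_usage(rows: Iterable[Mapping]) -> Dict[str, float]:
    """Total calls, tokens, and billed amount across /api/v1/usage rows.

    Assumption: each row exposes prompt_tokens, completion_tokens, and the final
    billed amount under a key named "credits" (hypothetical; check the real field name).
    Missing or null values count as 0.
    """
    totals = {"calls": 0, "prompt_tokens": 0, "completion_tokens": 0, "credits": 0.0}
    for row in rows:
        totals["calls"] += 1
        totals["prompt_tokens"] += int(row.get("prompt_tokens") or 0)
        totals["completion_tokens"] += int(row.get("completion_tokens") or 0)
        totals["credits"] += float(row.get("credits") or 0.0)
    return totals
```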
Pre-flight budgeting
For “this will cost ~$X, confirm?” flows:

- Compute a worst case from `max_tokens` × output rate + prompt tokens × input rate, using the pricing table (text rates are per 1M tokens).
- Compare it against `X-Quota-Remaining-Credits` from a prior call, or call `/v1/balance` (free).
- If the estimate exceeds your budget, don’t send the request.
- After the request, reconcile the actual cost via `/api/v1/usage`.
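The worst-case check can be sketched in Python as follows. The function names are hypothetical, and the rates you pass in would come from the pricing table. The estimate prices every prompt token at the plain input rate. Cache reads therefore make it pessimistic, while Claude cache writes (1.25×) can push the real cost above it.

```python
from typing import Optional


def worst_case_cost_usd(prompt_tokens: int, max_tokens: int,
                        input_per_m: float, output_per_m: float) -> float:
    """Upper bound for one text call, assuming the model emits all max_tokens.

    Rates are USD per 1M tokens, as quoted in the pricing table.
    """
    return (prompt_tokens * input_per_m + max_tokens * output_per_m) / 1_000_000


def preflight_ok(estimate_usd: float, remaining_usd: float,
                 budget_usd: Optional[float] = None) -> bool:
    """True if the estimate fits the key's remaining quota and an optional caller budget."""
    if estimate_usd > remaining_usd:
        return False
    if budget_usd is not None and estimate_usd > budget_usd:
        return False
    return True
```

As an example, take a 2,000-token prompt with `max_tokens=1000` at illustrative rates of $3 and $15 per 1M tokens. `worst_case_cost_usd(2000, 1000, 3.0, 15.0)` returns about 0.021, which `preflight_ok(0.021, remaining_usd=5.0)` accepts.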
Quota cliffs
A key stops serving requests when any of the following is true:

- `X-Quota-Remaining-Credits` = 0 (the key’s `quota` cap is reached)
- `X-RateLimit-Remaining` = 0 (the key’s tightest rate-limit bucket is exhausted)
- The org wallet is empty (org-owned keys)

Requests then fail with `402 insufficient_balance` (OpenAI envelope) or `permission_error` (Anthropic envelope). To raise the cap, edit the key in Console → API keys or top up the org wallet.
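A client can use the last response's headers to detect a cliff before it sends the next request. It can also recognize the failure if it sends a request anyway. The following is a Python sketch, and the function names are hypothetical. The check order is arbitrary. The text does not say whether `insufficient_balance` appears in the OpenAI envelope's `type` or `code` field, so the sketch accepts either.

```python
from typing import Mapping, Optional


def _header_amount(headers: Mapping[str, str], name: str) -> Optional[float]:
    """Read a numeric header case-insensitively; None if it is absent."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return float(value)
    return None


def cliff_reason(headers: Mapping[str, str]) -> Optional[str]:
    """Name the first exhausted limit in the quota envelope, or None if none is at 0."""
    checks = [
        ("X-Quota-Remaining-Credits", "key quota cap reached"),
        ("X-RateLimit-Remaining", "rate-limit bucket exhausted"),
        ("X-Org-Quota-Remaining-Credits", "org wallet empty"),
    ]
    for header, reason in checks:
        amount = _header_amount(headers, header)
        if amount is not None and amount <= 0:
            return reason
    return None


def is_quota_error(status: int, body: Mapping) -> bool:
    """Recognize the quota failure in either error envelope."""
    err = body.get("error")
    if not isinstance(err, dict):
        return False
    # OpenAI-style: HTTP 402 with {"error": {"type" or "code": "insufficient_balance", ...}}
    if status == 402 and "insufficient_balance" in (err.get("type"), err.get("code")):
        return True
    # Anthropic-style: {"type": "error", "error": {"type": "permission_error", ...}}.
    # permission_error is broader than quota, so confirm with cliff_reason() when possible.
    return body.get("type") == "error" and err.get("type") == "permission_error"
```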