ByteSpike publishes one per-model rate per surface. No tiers, no markup
paragraphs. Failed requests don’t bill.
1 USD = 1,000,000 credits (micro-USD precision). Token rates are
quoted per million tokens. Per-call rates are quoted in dollars.
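As a rough illustration of the unit conversion above, the sketch below turns a dollar amount or a token count into credits. The function names are hypothetical (not part of any ByteSpike SDK), and rounding to a whole credit is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

CREDITS_PER_USD = 1_000_000  # 1 USD = 1,000,000 credits (from this page)


def usd_to_credits(usd: str) -> int:
    """Convert a dollar amount to credits; pass a string to avoid float error."""
    return int(Decimal(usd) * CREDITS_PER_USD)


def token_cost_credits(tokens: int, rate_usd_per_1m: str) -> int:
    """Credits charged for `tokens` tokens at a rate quoted in USD per 1M tokens.

    (rate / 1M tokens) * (1M credits / USD) cancels, so credits = tokens * rate.
    Rounding half-up to a whole credit is an assumption of this sketch.
    """
    credits = Decimal(tokens) * Decimal(rate_usd_per_1m)
    return int(credits.quantize(Decimal(1), rounding=ROUND_HALF_UP))
```

For example, 1,200 input tokens on gpt-5-4 ($2.50 / 1M) work out to `token_cost_credits(1200, "2.50")`, which is 3,000 credits, or $0.003.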
How rates work
Every row in this document is the public ByteSpike rate for that model,
refreshed nightly. If you need programmatic access to the same data, hit
GET /api/pricing directly (returns a JSON array matching the table rows
below).
Cache pricing convention:
- Cache write: most models bill at the input rate; Claude models bill at 1.25× input.
- Cache read: a separate, lower per-model rate — it varies by model, so see each row (it is not a flat 10% across the board).
- Web search tool (where supported): per-1k-uses surcharge, billed separately.
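To make the convention concrete, here is a minimal sketch of a per-request cost estimate. It assumes that uncached input, cache-write, and cache-read tokens are counted separately and never overlap, and that the web-search surcharge is pro-rated per use. Both are assumptions rather than documented behavior, and the function name is hypothetical. The rates are copied from the gpt-5-5 and claude-opus-4-8 rows on this page.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the gpt-5-5 and claude-opus-4-8 rows.
RATES = {
    "gpt-5-5": {"input": "5.00", "cache_write": "5.00",
                "cache_read": "0.50", "output": "30.00"},
    "claude-opus-4-8": {"input": "5.00", "cache_write": "6.25",
                        "cache_read": "0.50", "output": "25.00"},
}
WEB_SEARCH_USD_PER_1K_USES = Decimal("10")  # listed for both models above


def request_cost_usd(model, input_tokens=0, cache_write_tokens=0,
                     cache_read_tokens=0, output_tokens=0, web_search_uses=0):
    """Estimate the cost of one request in USD (hypothetical helper)."""
    r = RATES[model]
    token_total = (
        input_tokens * Decimal(r["input"])
        + cache_write_tokens * Decimal(r["cache_write"])
        + cache_read_tokens * Decimal(r["cache_read"])
        + output_tokens * Decimal(r["output"])
    ) / Decimal(1_000_000)
    # Assumption: the per-1k-uses surcharge is charged pro rata per use.
    search_total = web_search_uses * WEB_SEARCH_USD_PER_1K_USES / Decimal(1000)
    return token_total + search_total
```

Under these assumptions, a claude-opus-4-8 call that writes a 10,000-token prefix to cache, sends 1,000 uncached input tokens, returns 500 output tokens, and uses web search twice comes to $0.10.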
For subscription / top-up pricing, see bytespike.ai/pricing.
Text models (25)
OpenAI (7)
| Model | Input / 1M | Cache write | Cache read | Output / 1M |
|---|---|---|---|---|
| gpt-5-5 | $5.00 | $5.00 | $0.50 | $30.00 |
| gpt-5-5-instant | $5.00 | $5.00 | $0.50 | $30.00 |
| gpt-5-4-pro | $30.00 | $30.00 | $3.00 | $180.00 |
| gpt-5-4 | $2.50 | $2.50 | $0.25 | $15.00 |
| gpt-5-4-mini | $0.75 | $0.75 | $0.075 | $4.50 |
| gpt-5-4-nano | $0.20 | $0.20 | $0.02 | $1.25 |
| gpt-5-2 | $1.75 | $1.75 | $0.175 | $14.00 |
GPT-5.5 web-search tool: $10 / 1k uses (billed separately).
Anthropic (7)
| Model | Input / 1M | Cache write (1.25×) | Cache read | Output / 1M |
|---|---|---|---|---|
| claude-opus-4-8 | $5.00 | $6.25 | $0.50 | $25.00 |
| claude-opus-4-7 | $5.00 | $6.25 | $0.50 | $25.00 |
| claude-opus-4-6 | $5.00 | $6.25 | $0.50 | $25.00 |
| claude-opus-4-5 | $5.00 | $6.25 | $0.50 | $25.00 |
| claude-sonnet-4-6 | $3.00 | $3.75 | $0.30 | $15.00 |
| claude-sonnet-4-5 | $3.00 | $3.75 | $0.30 | $15.00 |
| claude-haiku-4-5 | $1.00 | $1.25 | $0.10 | $5.00 |
Claude Opus 4.8 web-search tool: $10 / 1k uses (billed separately).
Google (5)
| Model | Input / 1M | Cache write | Cache read | Output / 1M |
|---|---|---|---|---|
| gemini-3-1-pro | $2.00 | $2.00 | $0.20 | $12.00 |
| gemini-3-5-flash | $1.50 | $1.50 | $0.15 | $9.00 |
| gemini-3-flash | $0.50 | $0.50 | $0.05 | $3.00 |
| gemini-3-flash-lite | $0.25 | $0.25 | $0.025 | $1.50 |
| gemini-2-5-flash | $0.50 | $0.50 | $0.05 | $3.00 |
National LLMs (6)
DeepSeek (3)
| Model | Input / 1M | Cache read | Output / 1M |
|---|---|---|---|
| deepseek-v3-2 | $0.14 | $0.003 | $0.28 |
| deepseek-v4-flash | $0.14 | $0.003 | $0.28 |
| deepseek-v4-pro | $0.435 | $0.004 | $0.87 |
Moonshot (1)
| Model | Input / 1M | Cache read | Output / 1M |
|---|---|---|---|
| kimi-k2-6 | $0.95 | $0.16 | $4.00 |
Zhipu (1)
| Model | Input / 1M | Cache read | Output / 1M |
|---|---|---|---|
| glm-5-1 | $1.40 | $0.26 | $4.40 |
MiniMax (1)
| Model | Input / 1M | Cache read | Output / 1M |
|---|---|---|---|
| minimax-m2-7 | $0.26 | $0.06 | $1.20 |
Image models (6)
Sync endpoints. Pricing is per generated image. n>1 bills each image
separately.
OpenAI (1)
| Model | Per image |
|---|---|
| gpt-image-2 | $0.08 |
Google (2)
| Model | Per image |
|---|---|
| nano-banana | $0.018 |
| nano-banana-v2 | $0.022 |
ByteDance (3)
| Model | Per image |
|---|---|
| seedream-4 | $0.025 |
| seedream-4-5 | $0.030 |
| seedream-v5lite | $0.012 |
Any image model not listed above falls back to the **default per-image
rate of $0.134**. For reference, the higher-tier Gemini image models bill
at `gemini-3-pro-image` $0.134 and `gemini-3.1-flash-image` $0.335 per
image.
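The per-image rule and the fallback rate can be sketched as a simple lookup. The helper name is hypothetical; the rates are copied from the image tables and the fallback note on this page.

```python
from decimal import Decimal

# USD per image, copied from the tables and the fallback note above.
IMAGE_RATES = {
    "gpt-image-2": "0.08",
    "nano-banana": "0.018",
    "nano-banana-v2": "0.022",
    "seedream-4": "0.025",
    "seedream-4-5": "0.030",
    "seedream-v5lite": "0.012",
    "gemini-3-pro-image": "0.134",
    "gemini-3.1-flash-image": "0.335",
}
DEFAULT_IMAGE_RATE = "0.134"  # applied to any model not listed


def image_cost_usd(model: str, n: int = 1) -> Decimal:
    """Cost of generating n images; each image is billed separately."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rate = Decimal(IMAGE_RATES.get(model, DEFAULT_IMAGE_RATE))
    return rate * n
```

For instance, a gpt-image-2 request with n=4 comes to 4 × $0.08 = $0.32.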
Video models (9)
Video generation is asynchronous: submit via /tasks/submit, then poll
/tasks/query. Pricing is purely per second of output, with no submit fee.
Cancelling a task while it is queued is free; cancelling after it starts
running bills only the seconds already rendered.
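The billing rules for video tasks can be sketched as follows. The `outcome` labels and the function name are hypothetical and are not API status values. The rates are copied from three rows of the video tables on this page. Whether partial seconds are billed fractionally is not stated here, so the sketch simply multiplies whatever number of seconds it is given.

```python
from decimal import Decimal

# USD per second of output, copied from three rows of the video tables.
VIDEO_RATES = {"sora2-pro": "0.30", "veo3-1": "0.40", "seedance2": "0.08"}


def video_cost_usd(model, seconds_requested, outcome="completed", seconds_rendered=0):
    """Estimate the charge for one video task.

    outcome is a hypothetical label for this sketch:
      "completed"         -> bill every requested second
      "cancelled_queued"  -> free
      "cancelled_running" -> bill only the seconds already rendered
      "failed"            -> free (task failures are refunded in full)
    """
    rate = Decimal(VIDEO_RATES[model])
    if outcome == "completed":
        return rate * Decimal(str(seconds_requested))
    if outcome == "cancelled_running":
        return rate * Decimal(str(seconds_rendered))
    if outcome in ("cancelled_queued", "failed"):
        return Decimal("0")
    raise ValueError(f"unknown outcome: {outcome}")
```

An 8-second veo3-1 clip that completes costs 8 × $0.40 = $3.20; the same task cancelled while queued costs nothing.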
OpenAI (2)
| Model | Resolution | Per second |
|---|---|---|
| sora2 | 1080p | $0.10 |
| sora2-pro | 1080p | $0.30 |
Google (2)
| Model | Resolution | Per second |
|---|---|---|
| veo3-1 | 1080p | $0.40 |
| veo3-1-fast | 720p | $0.20 |
ByteDance (5)
| Model | Resolution | Per second |
|---|---|---|
| seedance-1-5-pro | 1080p | $0.05 |
| seedance-pro | 1080p | $0.06 |
| seedance-pro-fast | 1080p | $0.04 |
| seedance2 | 1080p | $0.08 |
| seedance2-fast | 720p | $0.05 |
Utility endpoints
| Endpoint | Cost |
|---|---|
| GET /balance | Free |
| POST /tasks/submit | Cost = underlying model rate × duration |
| GET /tasks/query | Free |
| POST /tasks/cancel | Free if status=queued; partial-billed if running |
Pricing notes
- Failures don’t bill. Any non-2xx response is free. The narrow
exception is a video task cancelled after it starts running: the partial
GPU seconds are charged.
- Cache write rate: most models bill cache writes at the input
rate. Claude models bill cache writes at 1.25× input.
- Cache read rate: a separate, lower per-model rate — it varies by model, so see each row (it is not a flat 10% across the board).
- Web search / grounding tools (where supported): per-1k-uses
surcharge, billed separately from token usage.
- Image / video task failures: 100% refund at the task level.
- Chat 5xx: not billed; auto-retried at the gateway envelope.
Programmatic access
curl https://llm.bytespike.ai/api/pricing \
-H "x-api-key: $BYTESPIKE_API_KEY"
Returns a JSON array with one entry per model:
{
  "model": "gpt-5-5",
  "category": "text",
  "vendor": "openai",
  "rates": {
    "input_per_1m": 5.00,
    "cache_write_per_1m": 5.00,
    "cache_read_per_1m": 0.50,
    "output_per_1m": 30.00,
    "currency": "USD"
  },
  "updated_at": "2026-05-08T04:30:00Z"
}
Refresh cadence is daily at 04:30 UTC. Cache the response client-side
for at least 24 hours; the updated_at field tells you when the rate
last refreshed.
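A minimal client-side cache along these lines might look like the sketch below. It uses only the Python standard library; the `PricingCache` class and its method names are hypothetical, and the fetch function and clock are injectable so the cache can be exercised without network access. Field names follow the sample entry above.

```python
import json
import os
import time
import urllib.request

PRICING_URL = "https://llm.bytespike.ai/api/pricing"
CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, per the guidance above


def fetch_pricing() -> list:
    """Fetch the pricing array over HTTP (same request as the curl example)."""
    req = urllib.request.Request(
        PRICING_URL,
        headers={"x-api-key": os.environ["BYTESPIKE_API_KEY"]},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


class PricingCache:
    """Hypothetical helper: keeps the pricing array in memory for `ttl` seconds."""

    def __init__(self, fetch=fetch_pricing, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._by_model = None
        self._fetched_at = None

    def _entries(self) -> dict:
        now = self._clock()
        if self._by_model is None or now - self._fetched_at >= self._ttl:
            # Index the array by model name for direct lookup.
            self._by_model = {entry["model"]: entry for entry in self._fetch()}
            self._fetched_at = now
        return self._by_model

    def rates_for(self, model: str) -> dict:
        """Return the "rates" object for a model; raises KeyError if unknown."""
        return self._entries()[model]["rates"]
```

Typical use would be `PricingCache().rates_for("gpt-5-5")["input_per_1m"]`, with `BYTESPIKE_API_KEY` set in the environment.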
See also