ByteSpike publishes one per-model rate per surface. No tiers, no markup paragraphs. Failed requests don’t bill.
1 USD = 1,000,000 credits (micro-USD precision). Token rates are quoted per million tokens. Per-call rates are quoted in dollars.
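The unit conversion above can be sketched as follows. This is an illustrative sketch only: the helper names `usd_to_credits` and `token_cost_credits` are hypothetical, and because the text does not say how fractional credits are rounded, token costs are left unrounded here.

```python
from decimal import Decimal

CREDITS_PER_USD = 1_000_000       # from the text: 1 USD = 1,000,000 credits
TOKENS_PER_RATE_UNIT = 1_000_000  # token rates are quoted per million tokens


def usd_to_credits(usd: str) -> int:
    """Convert a dollar amount (passed as a string to avoid float error) to whole credits."""
    return int(Decimal(usd) * CREDITS_PER_USD)


def token_cost_credits(tokens: int, rate_usd_per_1m: str) -> Decimal:
    """Credits charged for `tokens` tokens at a per-million-token dollar rate."""
    usd = Decimal(rate_usd_per_1m) * tokens / TOKENS_PER_RATE_UNIT
    return usd * CREDITS_PER_USD


# A $0.08 per-call rate corresponds to 80,000 credits.
print(usd_to_credits("0.08"))
# 1,200 input tokens at $2.50 / 1M tokens corresponds to 3,000 credits ($0.003).
print(token_cost_credits(1_200, "2.50"))
```

Because both conversion factors are one million, a rate of $X per 1M tokens works out to X credits per token.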

How rates work

Every row in this document is the public ByteSpike rate for that model, refreshed nightly. For programmatic access to the same data, call GET /api/pricing, which returns a JSON array matching the table rows below.

Cache pricing convention:
  • Cache write: most models bill at the input rate; Claude models bill at 1.25× input.
  • Cache read: a separate, lower per-model rate — it varies by model, so see each row (it is not a flat 10% across the board).
  • Web search tool (where supported): per-1k-uses surcharge, billed separately.
For subscription / top-up pricing, see bytespike.ai/pricing.
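As a rough illustration of the cache pricing convention, the sketch below totals one request from its token counts. The function names are hypothetical, the rates are copied from the tables in this document, and it assumes the four token counts are disjoint (uncached input, cache write, cache read, output), which the text does not state explicitly. The web search surcharge is left out.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the rate tables in this document.
RATES = {
    "claude-sonnet-4-6": {"input": "3.00", "cache_read": "0.30", "output": "15.00"},
    "gpt-5-4": {"input": "2.50", "cache_read": "0.25", "output": "15.00"},
}


def cache_write_rate(model: str) -> Decimal:
    """Cache writes bill at the input rate, or at 1.25x input for Claude models."""
    base = Decimal(RATES[model]["input"])
    return base * Decimal("1.25") if model.startswith("claude-") else base


def request_cost_usd(model: str, input_tokens: int, cache_write_tokens: int,
                     cache_read_tokens: int, output_tokens: int) -> Decimal:
    """Total USD cost of one request (assumes the token counts do not overlap)."""
    r = RATES[model]
    rate_times_tokens = (
        Decimal(r["input"]) * input_tokens
        + cache_write_rate(model) * cache_write_tokens
        + Decimal(r["cache_read"]) * cache_read_tokens
        + Decimal(r["output"]) * output_tokens
    )
    return rate_times_tokens / 1_000_000


# First call writes a 10,000-token prefix to the cache; a later call reads it back.
first = request_cost_usd("claude-sonnet-4-6", 1_000, 10_000, 0, 500)
later = request_cost_usd("claude-sonnet-4-6", 1_000, 0, 10_000, 500)
print(first, later)
```

With these numbers the first call costs $0.048 and the later call $0.0135, so the 1.25× write premium is recovered on the first cache read.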

Text models (25)

OpenAI (7)

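| Model | Input / 1M | Cache write | Cache read | Output / 1M |
| --- | --- | --- | --- | --- |
| gpt-5-5 | $5.00 | $5.00 | $0.50 | $30.00 |
| gpt-5-5-instant | $5.00 | $5.00 | $0.50 | $30.00 |
| gpt-5-4-pro | $30.00 | $30.00 | $3.00 | $180.00 |
| gpt-5-4 | $2.50 | $2.50 | $0.25 | $15.00 |
| gpt-5-4-mini | $0.75 | $0.75 | $0.075 | $4.50 |
| gpt-5-4-nano | $0.20 | $0.20 | $0.02 | $1.25 |
| gpt-5-2 | $1.75 | $1.75 | $0.175 | $14.00 |
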
GPT-5.5 web-search tool: $10 / 1k uses (billed separately).

Anthropic (7)

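| Model | Input / 1M | Cache write (1.25×) | Cache read | Output / 1M |
| --- | --- | --- | --- | --- |
| claude-opus-4-8 | $5.00 | $6.25 | $0.50 | $25.00 |
| claude-opus-4-7 | $5.00 | $6.25 | $0.50 | $25.00 |
| claude-opus-4-6 | $5.00 | $6.25 | $0.50 | $25.00 |
| claude-opus-4-5 | $5.00 | $6.25 | $0.50 | $25.00 |
| claude-sonnet-4-6 | $3.00 | $3.75 | $0.30 | $15.00 |
| claude-sonnet-4-5 | $3.00 | $3.75 | $0.30 | $15.00 |
| claude-haiku-4-5 | $1.00 | $1.25 | $0.10 | $5.00 |
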
Claude Opus 4.8 web-search tool: $10 / 1k uses (billed separately).

Google (5)

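| Model | Input / 1M | Cache write | Cache read | Output / 1M |
| --- | --- | --- | --- | --- |
| gemini-3-1-pro | $2.00 | $2.00 | $0.20 | $12.00 |
| gemini-3-5-flash | $1.50 | $1.50 | $0.15 | $9.00 |
| gemini-3-flash | $0.50 | $0.50 | $0.05 | $3.00 |
| gemini-3-flash-lite | $0.25 | $0.25 | $0.025 | $1.50 |
| gemini-2-5-flash | $0.50 | $0.50 | $0.05 | $3.00 |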

Chinese LLMs (6)

DeepSeek (3)

| Model | Input / 1M | Cache read | Output / 1M |
| --- | --- | --- | --- |
| deepseek-v3-2 | $0.14 | $0.003 | $0.28 |
| deepseek-v4-flash | $0.14 | $0.003 | $0.28 |
| deepseek-v4-pro | $0.435 | $0.004 | $0.87 |

Moonshot (1)

| Model | Input / 1M | Cache read | Output / 1M |
| --- | --- | --- | --- |
| kimi-k2-6 | $0.95 | $0.16 | $4.00 |

Zhipu (1)

| Model | Input / 1M | Cache read | Output / 1M |
| --- | --- | --- | --- |
| glm-5-1 | $1.40 | $0.26 | $4.40 |

MiniMax (1)

| Model | Input / 1M | Cache read | Output / 1M |
| --- | --- | --- | --- |
| minimax-m2-7 | $0.26 | $0.06 | $1.20 |

Image models (6)

Sync endpoints. Pricing is per generated image. n>1 bills each image separately.

OpenAI (1)

| Model | Per image |
| --- | --- |
| gpt-image-2 | $0.08 |

Google (2)

| Model | Per image |
| --- | --- |
| nano-banana | $0.018 |
| nano-banana-v2 | $0.022 |

ByteDance (3)

| Model | Per image |
| --- | --- |
| seedream-4 | $0.025 |
| seedream-4-5 | $0.030 |
| seedream-v5lite | $0.012 |

Any image model not listed above falls back to the **default per-image rate of $0.134**. For reference, the higher-tier Gemini image models bill at `gemini-3-pro-image` $0.134 and `gemini-3.1-flash-image` $0.335 per image.
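The per-image billing and the default fallback can be sketched as below. The function name `image_cost_usd` is hypothetical; the rates are copied from the image tables and the 0.134 default from the fallback note.

```python
from decimal import Decimal

# USD per generated image, copied from the image tables in this document.
IMAGE_RATES = {
    "gpt-image-2": "0.08",
    "nano-banana": "0.018",
    "nano-banana-v2": "0.022",
    "seedream-4": "0.025",
    "seedream-4-5": "0.030",
    "seedream-v5lite": "0.012",
}
DEFAULT_IMAGE_RATE = "0.134"  # fallback for image models not listed above


def image_cost_usd(model: str, n: int = 1) -> Decimal:
    """Cost of one request that generates `n` images; each image bills separately."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rate = Decimal(IMAGE_RATES.get(model, DEFAULT_IMAGE_RATE))
    return rate * n


print(image_cost_usd("seedream-4", n=4))       # four images at $0.025 each
print(image_cost_usd("some-unlisted-model"))   # falls back to the default rate
```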

Video models (9)

Async via /tasks/submit and /tasks/query. Pricing is purely per second of output; there is no submit fee. Cancelling a task while it is queued is free; cancelling after it starts running bills only the seconds already rendered.
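A minimal sketch of the per-second billing rule, including the two cancellation cases. The function name and its parameters are hypothetical and are not part of the ByteSpike API; the rates are a few of the per-second values from the video tables in this document.

```python
from decimal import Decimal
from typing import Optional

# USD per second of output, a subset of the video tables in this document.
VIDEO_RATES = {"sora2": "0.10", "veo3-1": "0.40", "seedance2-fast": "0.05"}


def video_charge_usd(model: str, requested_seconds: int,
                     cancelled_while: Optional[str] = None,
                     seconds_rendered: int = 0) -> Decimal:
    """Charge for one video task.

    cancelled_while is None for a completed task, or "queued" / "running"
    for a task cancelled in that state.
    """
    rate = Decimal(VIDEO_RATES[model])
    if cancelled_while is None:
        return rate * requested_seconds   # completed: bill the full output length
    if cancelled_while == "queued":
        return Decimal("0")               # cancelled before rendering started: free
    if cancelled_while == "running":
        return rate * seconds_rendered    # partial bill for seconds already rendered
    raise ValueError(f"unknown status: {cancelled_while!r}")


print(video_charge_usd("veo3-1", 8))                                    # full 8 seconds
print(video_charge_usd("veo3-1", 8, "queued"))                          # free
print(video_charge_usd("veo3-1", 8, "running", seconds_rendered=3))     # 3 seconds billed
```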

OpenAI (2)

| Model | Resolution | Per second |
| --- | --- | --- |
| sora2 | 1080p | $0.10 |
| sora2-pro | 1080p | $0.30 |

Google (2)

| Model | Resolution | Per second |
| --- | --- | --- |
| veo3-1 | 1080p | $0.40 |
| veo3-1-fast | 720p | $0.20 |

ByteDance (5)

| Model | Resolution | Per second |
| --- | --- | --- |
| seedance-1-5-pro | 1080p | $0.05 |
| seedance-pro | 1080p | $0.06 |
| seedance-pro-fast | 1080p | $0.04 |
| seedance2 | 1080p | $0.08 |
| seedance2-fast | 720p | $0.05 |

Utility endpoints

| Endpoint | Cost |
| --- | --- |
| GET /balance | Free |
| POST /tasks/submit | Underlying model rate × duration |
| GET /tasks/query | Free |
| POST /tasks/cancel | Free if status=queued; partial-billed if running |

Pricing notes

  • Failures don’t bill. Any non-2xx response is free. The narrow exception: video tasks cancelled after running — partial GPU seconds are charged.
  • Cache write rate: most models bill cache writes at the input rate. Claude models bill cache writes at 1.25× input.
  • Cache read rate: a separate, lower per-model rate — it varies by model, so see each row (it is not a flat 10% across the board).
  • Web search / grounding tools (where supported): per-1k-uses surcharge, billed separately from token usage.
  • Image / video task failures: 100% refund at the task level.
  • Chat 5xx: not billed; auto-retried at the gateway envelope.

Programmatic access

curl https://llm.bytespike.ai/api/pricing \
  -H "x-api-key: $BYTESPIKE_API_KEY"
Returns a JSON array with one entry per model:
{
  "model": "gpt-5-5",
  "category": "text",
  "vendor": "openai",
  "rates": {
    "input_per_1m": 5.00,
    "cache_write_per_1m": 5.00,
    "cache_read_per_1m": 0.50,
    "output_per_1m": 30.00,
    "currency": "USD"
  },
  "updated_at": "2026-05-08T04:30:00Z"
}
Refresh cadence is daily at 04:30 UTC. Cache the response client-side for at least 24 hours; the updated_at field tells you when the rate last refreshed.
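One way to follow the caching advice is a small client-side wrapper like the sketch below. `fetch_pricing` and `PricingCache` are hypothetical names; the URL and the `x-api-key` header come from the curl example above, and the 24-hour TTL reflects the stated refresh cadence. The fetch function and clock are injected so the cache logic can be exercised without network access.

```python
import json
import time
import urllib.request

PRICING_URL = "https://llm.bytespike.ai/api/pricing"
CACHE_TTL_SECONDS = 24 * 60 * 60  # rates refresh once a day


def fetch_pricing(api_key: str) -> list:
    """Download the pricing array (one entry per model)."""
    req = urllib.request.Request(PRICING_URL, headers={"x-api-key": api_key})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


class PricingCache:
    """Holds the pricing rows in memory and refetches once the TTL has elapsed."""

    def __init__(self, fetch, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._fetch = fetch      # zero-argument callable returning the row list
        self._ttl = ttl
        self._clock = clock
        self._rows = None
        self._fetched_at = 0.0

    def rows(self) -> list:
        now = self._clock()
        if self._rows is None or now - self._fetched_at >= self._ttl:
            self._rows = self._fetch()
            self._fetched_at = now
        return self._rows

    def rates_for(self, model: str) -> dict:
        """Return the "rates" object for one model, or raise KeyError."""
        for row in self.rows():
            if row["model"] == model:
                return row["rates"]
        raise KeyError(model)
```

A caller might construct it as `PricingCache(lambda: fetch_pricing(os.environ["BYTESPIKE_API_KEY"]))` and then read `cache.rates_for("gpt-5-5")["input_per_1m"]`.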

See also