Pricing - ByteSpike

ByteSpike publishes one per-model rate per surface. No tiers, no markup paragraphs. Failed requests don’t bill.

1 USD = 1,000,000 credits (micro-USD precision). Token rates are quoted per million tokens. Per-call rates are quoted in dollars.

How rates work

Every row in this document is the public ByteSpike rate for that model, refreshed nightly. If you need programmatic access to the same data, hit GET /api/pricing directly (returns a JSON array matching the table rows below). Cache pricing convention:

Cache write: most models bill at the input rate; Claude models bill at 1.25× input.
Cache read: a separate, lower per-model rate — it varies by model, so see each row (it is not a flat 10% across the board).
Web search tool (where supported): per-1k-uses surcharge, billed separately.

For subscription / top-up pricing, see bytespike.ai/pricing.

Text models (25)

OpenAI (7)

Model	Input / 1M	Cache write	Cache read	Output / 1M
`gpt-5-5`	$5.00	$5.00	$0.50	$30.00
`gpt-5-5-instant`	$5.00	$5.00	$0.50	$30.00
`gpt-5-4-pro`	$30.00	$30.00	$3.00	$180.00
`gpt-5-4`	$2.50	$2.50	$0.25	$15.00
`gpt-5-4-mini`	$0.75	$0.75	$0.075	$4.50
`gpt-5-4-nano`	$0.20	$0.20	$0.02	$1.25
`gpt-5-2`	$1.75	$1.75	$0.175	$14.00

GPT-5.5 web-search tool: $10 / 1k uses (billed separately).

Anthropic (7)

Model	Input / 1M	Cache write (1.25×)	Cache read	Output / 1M
`claude-opus-4-8`	$5.00	$6.25	$0.50	$25.00
`claude-opus-4-7`	$5.00	$6.25	$0.50	$25.00
`claude-opus-4-6`	$5.00	$6.25	$0.50	$25.00
`claude-opus-4-5`	$5.00	$6.25	$0.50	$25.00
`claude-sonnet-4-6`	$3.00	$3.75	$0.30	$15.00
`claude-sonnet-4-5`	$3.00	$3.75	$0.30	$15.00
`claude-haiku-4-5`	$1.00	$1.25	$0.10	$5.00

Claude Opus 4.8 web-search tool: $10 / 1k uses (billed separately).

Google (5)

Model	Input / 1M	Cache write	Cache read	Output / 1M
`gemini-3-1-pro`	$2.00	$2.00	$0.20	$12.00
`gemini-3-5-flash`	$1.50	$1.50	$0.15	$9.00
`gemini-3-flash`	$0.50	$0.50	$0.05	$3.00
`gemini-3-flash-lite`	$0.25	$0.25	$0.025	$1.50
`gemini-2-5-flash`	$0.50	$0.50	$0.05	$3.00

National LLMs (6)

DeepSeek (3)

Model	Input / 1M	Cache read	Output / 1M
`deepseek-v3-2`	$0.14	$0.003	$0.28
`deepseek-v4-flash`	$0.14	$0.003	$0.28
`deepseek-v4-pro`	$0.435	$0.004	$0.87

Moonshot (1)

Model	Input / 1M	Cache read	Output / 1M
`kimi-k2-6`	$0.95	$0.16	$4.00

Zhipu (1)

Model	Input / 1M	Cache read	Output / 1M
`glm-5-1`	$1.40	$0.26	$4.40

MiniMax (1)

Model	Input / 1M	Cache read	Output / 1M
`minimax-m2-7`	$0.26	$0.06	$1.20

Image models (6)

Sync endpoints. Pricing is per generated image. n>1 bills each image separately.

OpenAI (1)

Model	Per image
`gpt-image-2`	$0.08

Google (2)

Model	Per image
`nano-banana`	$0.018
`nano-banana-v2`	$0.022

ByteDance (3)

Model	Per image
`seedream-4`	$0.025
`seedream-4-5`	$0.030
`seedream-v5lite`	$0.012

Any image model not listed above falls back to the **default per-image rate of

0.134**. For reference, the higher-tier Gemini image models bill at `gemini-3-pro-image`

0.134 and gemini-3.1-flash-image $0.335 per image.

Video models (9)

Async via /tasks/submit → /tasks/query. Pricing is purely per-second of output — there is no submit fee. Cancellations during queued are free; cancellations after running partial-bill the seconds rendered.

OpenAI (2)

Model	Resolution	Per second
`sora2`	1080p	$0.10
`sora2-pro`	1080p	$0.30

Google (2)

Model	Resolution	Per second
`veo3-1`	1080p	$0.40
`veo3-1-fast`	720p	$0.20

ByteDance (5)

Model	Resolution	Per second
`seedance-1-5-pro`	1080p	$0.05
`seedance-pro`	1080p	$0.06
`seedance-pro-fast`	1080p	$0.04
`seedance2`	1080p	$0.08
`seedance2-fast`	720p	$0.05

Utility endpoints

Endpoint	Cost
`GET /balance`	Free
`POST /tasks/submit`	Cost = underlying model rate × duration
`GET /tasks/query`	Free
`POST /tasks/cancel`	Free if status=queued; partial-billed if running

Pricing notes

Failures don’t bill. Any non-2xx response is free. The narrow exception: video tasks cancelled after running — partial GPU seconds are charged.
Cache write rate: most models bill cache writes at the input rate. Claude models bill cache writes at 1.25× input.
Cache read rate: a separate, lower per-model rate — it varies by model, so see each row (it is not a flat 10% across the board).
Web search / grounding tools (where supported): per-1k-uses surcharge, billed separately from token usage.
Image / video task failures: 100% refund at the task level.
Chat 5xx: not billed; auto-retried at the gateway envelope.

Programmatic access

curl https://llm.bytespike.ai/api/pricing \
  -H "x-api-key: $BYTESPIKE_API_KEY"

Returns a JSON array with one entry per model:

{
  "model": "gpt-5-5",
  "category": "text",
  "vendor": "openai",
  "rates": {
    "input_per_1m": 5.00,
    "cache_write_per_1m": 5.00,
    "cache_read_per_1m": 0.50,
    "output_per_1m": 30.00,
    "currency": "USD"
  },
  "updated_at": "2026-05-08T04:30:00Z"
}

Refresh cadence is daily at 04:30 UTC. Cache the response client-side for at least 24 hours; the updated_at field tells you when the rate last refreshed.

​How rates work

​Text models (25)

​OpenAI (7)

​Anthropic (7)

​Google (5)

​National LLMs (6)

​DeepSeek (3)

​Moonshot (1)

​Zhipu (1)

​MiniMax (1)

​Image models (6)

​OpenAI (1)

​Google (2)

​ByteDance (3)

​Video models (9)

​OpenAI (2)

​Google (2)

​ByteDance (5)

​Utility endpoints

​Pricing notes

​Programmatic access

​See also

How rates work

Text models (25)

OpenAI (7)

Anthropic (7)

Google (5)

National LLMs (6)

DeepSeek (3)

Moonshot (1)

Zhipu (1)

MiniMax (1)

Image models (6)

OpenAI (1)

Google (2)

ByteDance (3)

Video models (9)

OpenAI (2)

Google (2)

ByteDance (5)

Utility endpoints

Pricing notes

Programmatic access

See also