deepseek-v4-flash - ByteSpike

deepseek-v4-flash is the small-mid of the DeepSeek V4 family. At $0.14 / 1M input it undercuts Haiku and speaks both OpenAI Chat Completions and the Anthropic Messages shape — making it a popular cost-savings substitute for gpt-5-4-mini or claude-haiku-4-5 in volume-sensitive pipelines. Pricing:

0.14 / 1M input,

0.28 / 1M output, $0.003 / 1M cache read — see the rate card.

Protocols

Protocol	Path
Anthropic Messages	`POST https://llm.bytespike.ai/v1/messages`
OpenAI Chat Completions	`POST https://llm.bytespike.ai/v1/chat/completions`

Quickstart

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      { "role": "user", "content": "Classify: refund / billing / technical / other.\n\nMy invoice charged twice." }
    ]
  }'

Capabilities

Capability	Supported
Chat Completions	✅
Anthropic Messages	✅
Streaming (SSE)	✅
Tool use	✅
Parallel tools	—
JSON mode	✅
Reasoning chain	— (use `deepseek-v4-pro`)
Vision (HTTP API)	❌
Context window	64K tokens

When to use

High-volume classification, routing, structured extraction — cheapest Chinese-LLM tier with both OpenAI + Anthropic protocol coverage.
Agent loops on a tight budget — tool_use works via the Anthropic endpoint at Haiku-comparable pricing.
Cost-optimized fallback — when your task fits Flash, prefer this id directly over deepseek-v4-pro for ~3× cost savings.

When not to use:

Tasks that need reasoning chain — go to deepseek-v4-pro.
Vision input — not on HTTP API today.

deepseek-v4-pro — reasoning flagship
deepseek-v3.2 — older V3 series

deepseek-v3-anthropic deepseek-v4-pro

​Protocols

​Quickstart

​Capabilities

​When to use

​Next

Protocols

Quickstart

Capabilities

When to use

Next