deepseek-v4-flash is the small-mid of the DeepSeek V4 family. At $0.14 / 1M input it undercuts Haiku and speaks both OpenAI Chat Completions and the Anthropic Messages shape — making it a popular cost-savings substitute for gpt-5-4-mini or claude-haiku-4-5 in volume-sensitive pipelines.
Pricing: 0.28 / 1M output, $0.003 / 1M cache read — see the rate card.
Protocols
| Protocol | Path |
|---|---|
| Anthropic Messages | POST https://llm.bytespike.ai/v1/messages |
| OpenAI Chat Completions | POST https://llm.bytespike.ai/v1/chat/completions |
Quickstart
Capabilities
| Capability | Supported |
|---|---|
| Chat Completions | ✅ |
| Anthropic Messages | ✅ |
| Streaming (SSE) | ✅ |
| Tool use | ✅ |
| Parallel tools | — |
| JSON mode | ✅ |
| Reasoning chain | — (use deepseek-v4-pro) |
| Vision (HTTP API) | ❌ |
| Context window | 64K tokens |
When to use
- High-volume classification, routing, structured extraction — cheapest Chinese-LLM tier with both OpenAI + Anthropic protocol coverage.
- Agent loops on a tight budget —
tool_useworks via the Anthropic endpoint at Haiku-comparable pricing. - Cost-optimized fallback — when your task fits Flash, prefer this id directly over
deepseek-v4-profor ~3× cost savings.
- Tasks that need reasoning chain — go to
deepseek-v4-pro. - Vision input — not on HTTP API today.
Next
- deepseek-v4-pro — reasoning flagship
- deepseek-v3.2 — older V3 series