Skip to main content
deepseek-v4-flash is the small-mid of the DeepSeek V4 family. At $0.14 / 1M input it undercuts Haiku and speaks both OpenAI Chat Completions and the Anthropic Messages shape — making it a popular cost-savings substitute for gpt-5-4-mini or claude-haiku-4-5 in volume-sensitive pipelines. Pricing: 0.14/1Minput,0.14 / 1M input, 0.28 / 1M output, $0.003 / 1M cache read — see the rate card.

Protocols

ProtocolPath
Anthropic MessagesPOST https://llm.bytespike.ai/v1/messages
OpenAI Chat CompletionsPOST https://llm.bytespike.ai/v1/chat/completions

Quickstart

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      { "role": "user", "content": "Classify: refund / billing / technical / other.\n\nMy invoice charged twice." }
    ]
  }'

Capabilities

CapabilitySupported
Chat Completions
Anthropic Messages
Streaming (SSE)
Tool use
Parallel tools
JSON mode
Reasoning chain— (use deepseek-v4-pro)
Vision (HTTP API)
Context window64K tokens

When to use

  • High-volume classification, routing, structured extraction — cheapest Chinese-LLM tier with both OpenAI + Anthropic protocol coverage.
  • Agent loops on a tight budgettool_use works via the Anthropic endpoint at Haiku-comparable pricing.
  • Cost-optimized fallback — when your task fits Flash, prefer this id directly over deepseek-v4-pro for ~3× cost savings.
When not to use:
  • Tasks that need reasoning chain — go to deepseek-v4-pro.
  • Vision input — not on HTTP API today.

Next