Vendor: DeepSeek
Model ID: deepseek-v4-flash
Capability: 64K context · tool use · streaming · structured output
Pricing: per-token, flash tier (live rate)

DeepSeek V4 Flash takes the V4 base and tunes it for latency. It keeps the same strong code generation on bounded prompts, with half the wait of V4 Pro on short inputs. It is the right pick for inline code suggestions, lint-style fixes, and any agent step where one or two seconds matter.

Request

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Add type hints to this Python function."}]
  }'

Body parameters

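| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| model | string | yes | | deepseek-v4-flash |
| messages | array | yes | | |
| max_tokens | integer | no | model max | Max: 8192. |
| tools | array | no | | Function calling supported. |
| response_format | object | no | | JSON / structured output. |
| stream | boolean | no | false | SSE streaming. |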
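
As a rough illustration of how these fields combine, the sketch below assembles a request body in Python. The helper name `build_payload` and its client-side checks are assumptions made for this example, not part of the API. The only facts taken from the parameter table are the field names, the model ID, and the 8192 cap on `max_tokens`.

```python
from typing import Any, Optional

MODEL_ID = "deepseek-v4-flash"
MAX_OUTPUT_TOKENS = 8192  # "Max: 8192" in the parameter table


def build_payload(
    messages: list[dict[str, Any]],
    max_tokens: Optional[int] = None,
    tools: Optional[list[dict[str, Any]]] = None,
    response_format: Optional[dict[str, Any]] = None,
    stream: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: build a body from the documented fields only."""
    if not messages:
        # ASSUMPTION: reject an empty list client-side; the docs only say "required".
        raise ValueError("messages is required")
    # ASSUMPTION: the lower bound of 1 is a guess; only the upper bound is documented.
    if max_tokens is not None and not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")

    payload: dict[str, Any] = {"model": MODEL_ID, "messages": messages}
    # Optional fields are left out when unset so the server-side defaults apply.
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if tools is not None:
        payload["tools"] = tools
    if response_format is not None:
        payload["response_format"] = response_format
    if stream:
        payload["stream"] = True
    return payload
```

Serializing the result with `json.dumps` gives a body that can be passed to the `-d` flag of the curl call or sent by any HTTP client.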

Response

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "deepseek-v4-flash",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 64, "completion_tokens": 142, "total_tokens": 206}
}
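
To show how a client might read this object, here is a minimal Python sketch that pulls out the assistant text, the finish reason, and the token counts. The function name `summarize_response` is hypothetical, and treating any `finish_reason` other than `"stop"` as incomplete is an assumption, since the sample only shows `"stop"`.

```python
import json
from typing import Any

# The sample response from this page, with the elided id kept as-is.
SAMPLE = """
{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "deepseek-v4-flash",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 64, "completion_tokens": 142, "total_tokens": 206}
}
"""


def summarize_response(body: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: flatten the fields a caller usually needs."""
    choice = body["choices"][0]
    usage = body["usage"]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # ASSUMPTION: only "stop" is treated as a complete answer.
        "complete": choice["finish_reason"] == "stop",
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }


summary = summarize_response(json.loads(SAMPLE))
```

In the sample, `total_tokens` (206) is the sum of `prompt_tokens` (64) and `completion_tokens` (142), so a caller tracking spend can log either the total or the two parts.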

Code examples

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model": "deepseek-v4-flash", "messages": [{"role": "user", "content": "Add type hints."}]}'
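
For readers who prefer Python, the same call can be sketched with the standard library alone. The URL, headers, model ID, and environment variable name come from the curl example; the function names `build_request` and `send` and the 30-second timeout are assumptions made for illustration.

```python
import json
import os
import urllib.request
from typing import Any, Optional

API_URL = "https://llm.bytespike.ai/v1/chat/completions"


def build_request(prompt: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Hypothetical helper: mirror the curl call as a urllib Request."""
    key = api_key if api_key is not None else os.environ["BYTESPIKE_API_KEY"]
    body = {
        "model": "deepseek-v4-flash",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "content-type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict[str, Any]:
    """Perform the HTTP call and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Usage (a real network call, so it is left commented out):
# reply = send(build_request("Add type hints."))
# print(reply["choices"][0]["message"]["content"])
```

Splitting request construction from sending keeps the first half testable offline. `urllib.request.urlopen` raises `urllib.error.HTTPError` on 4xx and 5xx responses, so callers will want to catch that around `send`.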

Streaming + caching

Set "stream": true in the request body to receive the response as server-sent events (SSE). Prompt caching is automatic.

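The page does not show the shape of a streamed chunk, so the Python sketch below rests on loud assumptions: that each event carries a JSON object on a `data:` line, that text arrives under `choices[].delta.content`, and that the stream ends with `data: [DONE]`, as in OpenAI-compatible APIs. Only the `data:` line framing comes from the SSE format itself. The names `iter_sse_data` and `collect_text` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of every `data:` line in an SSE stream.

    Simplification: each `data:` line is treated as its own event, which
    holds when the server sends one JSON object per event.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, ":" comments, and other fields
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]  # SSE strips one leading space after the colon
        yield payload


def collect_text(lines: Iterable[str]) -> str:
    """Join streamed text pieces into one string."""
    parts: list[str] = []
    for data in iter_sse_data(lines):
        if data == "[DONE]":  # ASSUMPTION: OpenAI-style end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # ASSUMPTION: incremental text lives under delta.content
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)
```

Any iterable of decoded lines works as input, for example the lines of an HTTP response body decoded as UTF-8. An IDE integration would typically render each piece as it arrives instead of joining them at the end.
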
Errors

| Code | Trigger | Billed? |
|---|---|---|
| 400 / 401 / 402 / 422 / 429 | Standard | No |
| 5xx | Upstream | No (auto-retry) |
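
A client can encode this table as a small lookup so that logging and billing checks stay consistent. The sketch below is one possible shape in Python; `classify_status` and the keys of the returned dictionary are hypothetical names. The table does not say whether standard errors are retried, so that value is left as `None`.

```python
from typing import Optional

# Status codes listed in the "Standard" row of the error table.
STANDARD_ERRORS = frozenset({400, 401, 402, 422, 429})


def classify_status(status: int) -> dict[str, Optional[object]]:
    """Hypothetical helper: map an HTTP status to the rows of the error table."""
    if status in STANDARD_ERRORS:
        # Not billed; retry behaviour is not stated, hence None.
        return {"kind": "standard", "billed": False, "auto_retry": None}
    if 500 <= status <= 599:
        # Upstream failure: not billed, and the table marks it "auto-retry".
        return {"kind": "upstream", "billed": False, "auto_retry": True}
    # Anything else is outside the table, so nothing is claimed about it.
    return {"kind": "unlisted", "billed": None, "auto_retry": None}
```

For example, `classify_status(429)` reports a standard, unbilled error, while `classify_status(503)` reports an upstream, unbilled error that the table says is retried automatically.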

When to use

  • Inline code suggestions, lint-style fixes, IDE-integrated agents.
  • Latency-bound code routing.
  • For full V4 Pro quality on hard problems, see DeepSeek V4 Pro.
  • For the prior generation, see DeepSeek V3.2.

Limits

| Limit | Value |
|---|---|
| Context window | 64K tokens |
| Max output | 8192 tokens |
| Supports tool use | Yes |
| Supports vision | No |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |
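
The two numeric limits interact when a prompt is long. The Python sketch below computes the largest `max_tokens` value that still fits, under two assumptions the page does not confirm: that "64K" means 65,536 tokens (it may mean 64,000), and that prompt and output tokens share the same context window. The function name `output_budget` is hypothetical, and both limits are parameters so they can be adjusted.

```python
CONTEXT_WINDOW = 64 * 1024  # ASSUMPTION: "64K" read as 65,536 tokens
MAX_OUTPUT = 8192           # "Max output" from the limits table


def output_budget(
    prompt_tokens: int,
    context_window: int = CONTEXT_WINDOW,
    max_output: int = MAX_OUTPUT,
) -> int:
    """Largest max_tokens that fits, assuming prompt and output share the window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = context_window - prompt_tokens
    # Never exceed the output cap, and never go below zero.
    return max(0, min(max_output, remaining))
```

With a short prompt the output cap is the binding limit; with a prompt near the window size, the remaining space is. A result of 0 signals that the prompt needs trimming before the request is sent.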