deepseek-v4-flash
Capability: 64K context · tool use · streaming · structured output
Pricing: per-token, flash tier (live rate)
DeepSeek V4 Flash takes the V4 base and tunes for latency. Same
strong code generation on bounded prompts, half the wait of V4 Pro
on short inputs. Right pick for inline code suggestions, lint-style
fixes, and any agent step where one or two seconds matters.
Request
Body parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
model | string | yes | — | deepseek-v4-flash |
messages | array | yes | — | — |
max_tokens | integer | no | model max | Max: 8192. |
tools | array | no | — | Function calling supported. |
response_format | object | no | — | JSON / structured output. |
stream | boolean | no | false | SSE streaming. |
Response
Code examples
Streaming + caching
"stream": true for SSE. Automatic prompt caching.
Errors
| Code | Trigger | Billed? |
|---|---|---|
| 400 / 401 / 402 / 422 / 429 | Standard | No |
| 5xx | Upstream | No (auto-retry) |
When to use
- Inline code suggestions, lint-style fixes, IDE-integrated agents.
- Latency-bound code routing.
- For full V4 Pro quality on hard problems, see DeepSeek V4 Pro.
- For prior generation, see DeepSeek V3.2.
Limits
| Limit | Value |
|---|---|
| Context window | 64K tokens |
| Max output | 8192 tokens |
| Supports tool use | Yes |
| Supports vision | No |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |