deepseek-v4-pro
Capability: 64K context · tool use · streaming · structured output · reasoning
Pricing: per-token, pro tier (live rate)
DeepSeek V4 Pro is the model to beat in the open-weight tier on code
generation tasks. Reach for it when the problem is structured —
implementations of well-defined algorithms, refactors with clear
constraints, codebase-style migrations — and the answer either
compiles or it doesn’t. For freer-form tasks (architecture design,
prose, multi-step planning), GPT-5.5 and Claude Opus 4.8 still have an
edge.
Request
Body parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
model | string | yes | — | deepseek-v4-pro |
messages | array | yes | — | — |
max_tokens | integer | no | model max | Max: 16384. |
temperature | number | no | 1.0 | — |
tools | array | no | — | Function calling supported (parallel). |
response_format | object | no | — | JSON / structured output. |
reasoning | object | no | — | Optional reasoning chain — set {"enabled": true} to enable. |
stream | boolean | no | false | SSE streaming. |
Response
Code examples
Streaming + caching
"stream": true for SSE. Automatic prompt caching on stable prefixes.
Errors
| Code | Trigger | Billed? |
|---|---|---|
| 400 / 401 / 402 / 422 / 429 | Standard | No |
| 5xx | Upstream | No (auto-retry) |
When to use
- Structured code generation in well-defined languages.
- Algorithm implementations, refactors with hard constraints.
- For lower latency, see DeepSeek V4 Flash.
- For multi-step planning beyond code, see GPT-5.5 or Claude Opus 4.7 (now superseded by Opus 4.8).
Limits
| Limit | Value |
|---|---|
| Context window | 64K tokens |
| Max output | 16384 tokens |
| Supports tool use | Yes (parallel) |
| Supports vision | No |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |
| Supports reasoning chain | Yes |