gpt-5-4-pro
Capability: 128K context · tool use · vision · streaming · structured output · reasoning_effort
Pricing: per-token, pro tier (live rate)
GPT-5.4-pro takes the standard 5.4 base and exposes a reasoning_effort
dial for OpenAI’s reasoning chain. Reach for it when the problem is
multi-step — code generation in an existing codebase with conventions,
math / proof tasks, planning across many sub-goals — and the first
draft has to compose, not just answer. The latency cost rises with
reasoning effort but the answer quality on hard problems jumps further
than the latency suggests.
Request
Body parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
model | string | yes | — | gpt-5-4-pro |
messages | array | yes | — | — |
reasoning_effort | string | no | "medium" | "low" / "medium" / "high" — higher = longer reasoning chain, higher latency, higher quality on hard problems. |
max_tokens | integer | no | model max | Max: 32768. |
tools | array | no | — | Parallel function calling. |
response_format | object | no | — | JSON / structured output. |
stream | boolean | no | false | SSE streaming. |
Response
reasoning_tokens are billed at the input-token rate, not output.
Code examples
Reasoning effort guide
| Setting | Use for |
|---|---|
"low" | Fast structured output, light reasoning |
"medium" | Default — most multi-step tasks |
"high" | Hard math / proof / multi-goal planning where you can wait |
"high" for most tasks.
Streaming + caching
"stream": true for SSE. With reasoning enabled, the first response
chunk lands after the reasoning chain completes — there’s a longer
HTTP TTFB than non-reasoning models. Automatic prompt caching applies.
Errors
| Code | Trigger | Billed? |
|---|---|---|
| 400 / 401 / 402 / 422 / 429 | Standard | No |
| 5xx | Upstream | No (auto-retry) |
When to use
- Multi-step coding in an existing codebase.
- Hard math / proof / planning tasks.
- For non-reasoning 5.4 at lower latency, see GPT-5.4.
- For the latest reasoning-capable flagship, see GPT-5.5.
Limits
| Limit | Value |
|---|---|
| Context window | 128K tokens |
| Max output | 32768 tokens |
| Supports tool use | Yes (parallel) |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |
| Supports reasoning_effort | Yes |