gpt-5-4
Capability: 128K context · tool use · vision · streaming · structured output
Pricing: per-token, standard tier (live rate)
GPT-5.4 is the workhorse of the 5.4 wave — better tool-call argument
generation than 5.2, tighter structured output, same 128K context.
Production default for any team that needs more than mini quality but
doesn’t want the 5.5 latency premium. For multi-step reasoning, see
GPT-5.4-pro.
Request
Body parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
model | string | yes | — | gpt-5-4 |
messages | array | yes | — | — |
max_tokens | integer | no | model max | Max: 16384. |
temperature | number | no | 1.0 | — |
tools | array | no | — | Parallel function calling. |
response_format | object | no | — | JSON / structured output. |
stream | boolean | no | false | SSE streaming. |
Response
Code examples
Streaming + caching
"stream": true for SSE. Automatic prompt caching for repeated
prefixes — biggest cost win on long system prompts.
Errors
| Code | Trigger | Billed? |
|---|---|---|
| 400 / 401 / 402 / 422 / 429 | Standard | No |
| 5xx | Upstream | No (auto-retry) |
When to use
- Default production model for code generation, content rewriting, and tool-using agents.
- For multi-step reasoning where each step compounds, see GPT-5.4-pro.
- For the latest flagship, see GPT-5.5.
- For lower cost, see GPT-5.4-mini.
- For a faster but slightly older mid-tier, see GPT-5.2.
Limits
| Limit | Value |
|---|---|
| Context window | 128K tokens |
| Max output | 16384 tokens |
| Supports tool use | Yes (parallel) |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |