gpt-5-mini
Capability: 128K context · tool use · vision · streaming · structured output
Pricing: per-token, mini tier (live rate)
GPT-5-mini is the small model that displaced GPT-4o-mini for new
production work. Same price tier, measurably tighter structured output,
better tool-call argument generation. For most extraction and routing
flows, this is the starting point — only step up to a standard 5-series
model when you’ve benchmark-confirmed that mini’s quality plateau is
the bottleneck.
Request
Body parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
model | string | yes | — | gpt-5-mini |
messages | array | yes | — | OpenAI chat shape. |
max_tokens | integer | no | model max | Max: 16384. |
temperature | number | no | 1.0 | Range 0.0–2.0. |
tools | array | no | — | Function calling supported (parallel). |
tool_choice | string | object | no | "auto" | — |
response_format | object | no | — | JSON mode + structured output (recommended for extraction). |
stream | boolean | no | false | SSE streaming. |
Response
Code examples
Streaming + caching
"stream": true for SSE. Automatic prompt caching — keep system prompt
and tool schema stable for max cache hit rate.
Errors
| Code | Trigger | Billed? |
|---|---|---|
| 400 | Body validation | No |
| 401 | Missing / revoked key | No |
| 402 | Wallet exhausted | No |
| 422 | Param not supported | No |
| 429 | Rate-limited | No |
| 5xx | Upstream provider issue | No |
When to use
- Production extraction / structured output / routing.
- Lightweight agent steps (one tool call per step).
- For higher quality, see GPT-5.5 or GPT-5.4.
- For lower latency, see GPT-5-nano.
- For 5.4-era mini, see GPT-5.4-mini.
Limits
| Limit | Value |
|---|---|
| Context window | 128K tokens |
| Max output | 16384 tokens |
| Supports tool use | Yes (parallel) |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |
| Supports structured output | Yes |