gpt-5-nano
Capability: 128K context · tool use · vision · streaming
Pricing: per-token, nano tier (live rate)
GPT-5-nano is the latency floor of the 5-series. Reach for it when the
plan is to make many cheap LLM calls — sub-LLM judges, classifier
chains, agent loops where each step is a routing decision rather than
an answer. The quality gap to mini is small for bounded prompts and
the latency win is significant.
Request
Body parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
model | string | yes | — | gpt-5-nano |
messages | array | yes | — | OpenAI chat shape. |
max_tokens | integer | no | model max | Max: 8192. |
temperature | number | no | 1.0 | Range 0.0–2.0. |
tools | array | no | — | Function calling supported. |
response_format | object | no | — | JSON mode + structured output supported. |
stream | boolean | no | false | SSE streaming. |
Response
Code examples
Streaming + caching
"stream": true for SSE. Automatic prompt caching on stable prefixes.
On nano, cache hits matter less than on larger models because the
input rate is already low — but they still help on long system prompts.
Errors
| Code | Trigger | Billed? |
|---|---|---|
| 400 | Body validation | No |
| 401 | Missing / revoked key | No |
| 402 | Wallet exhausted | No |
| 422 | Param not supported | No |
| 429 | Rate-limited | No |
| 5xx | Upstream provider issue | No |
When to use
- Routing / triage / classification at the head of an agent pipeline.
- High-volume sub-LLM judges.
- For more capability at slightly higher latency, see GPT-5-mini.
- For 5.4-tier routing, see GPT-5.4-nano.
Limits
| Limit | Value |
|---|---|
| Context window | 128K tokens |
| Max output | 8192 tokens |
| Supports tool use | Yes |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |