gpt-5-5-instant
Capability: 128K context · tool use · vision · streaming · structured output
Pricing: per-token, flagship tier (live rate)
GPT-5.5-instant is GPT-5.5 with the reasoning chain elided — the model
goes straight from prompt to answer. For shorter prompts and tasks
where the reasoning was overkill (extraction, structured output,
short-form rewriting), this is the right pick: same flagship quality
on bounded inputs, sub-second TTFB.
Request
Body parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
model | string | yes | — | gpt-5-5-instant |
messages | array | yes | — | — |
max_tokens | integer | no | model max | Max: 16384. |
tools | array | no | — | Parallel function calling. |
response_format | object | no | — | JSON / structured output. |
stream | boolean | no | false | SSE streaming. |
reasoning_effort is not accepted on this variant — passing it
returns 422.
Response
reasoning_tokens field — there’s no reasoning chain.
Code examples
Streaming + caching
"stream": true for SSE. Sub-second TTFB on most prompts because the
reasoning chain is elided. Automatic prompt caching on stable prefixes.
Errors
| Code | Trigger | Billed? |
|---|---|---|
| 400 / 401 / 402 / 429 | Standard | No |
| 422 | Including reasoning_effort (use GPT-5.5 for reasoning) | No |
| 5xx | Upstream | No (auto-retry) |
When to use
- Latency-critical surfaces where reasoning is overkill.
- Short-form rewriting, extraction, structured output on bounded inputs.
- For deeper multi-step reasoning, use GPT-5.5 with
reasoning_effort. - For mini-tier cost on latency-bound work, see GPT-5.4-mini.
Limits
| Limit | Value |
|---|---|
| Context window | 128K tokens |
| Max output | 16384 tokens |
| Supports tool use | Yes (parallel) |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |
| Supports reasoning_effort | No |