gpt-5-5
Capability: 128K context · tool use · vision · streaming · structured output · reasoning_effort
Pricing: per-token, flagship tier (live rate)
GPT-5.5 is the current OpenAI flagship — the default for any new
project on the platform. It’s the model that put native reasoning
into the standard chat completions shape: same request body, same
response shape, with reasoning_effort as an optional dial. For most
production work the default "medium" setting is right; lift to
"high" only on hard problems where Sonnet or 5.4-pro have left
quality on the table.
Request
Body parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
model | string | yes | — | gpt-5-5 |
messages | array | yes | — | — |
reasoning_effort | string | no | "medium" | "low" / "medium" / "high". |
max_tokens | integer | no | model max | Max: 32768. |
tools | array | no | — | Parallel function calling. |
response_format | object | no | — | JSON mode + structured output (recommended for production). |
web_search | object | no | — | Built-in web search tool — billed per use. |
stream | boolean | no | false | SSE streaming. |
Response
reasoning_tokens billed at input-token rate.
Code examples
Reasoning effort
| Setting | Use for |
|---|---|
"low" | Fast routing / classification with reasoning kept light |
"medium" | Default — production code gen, content rewriting, agents |
"high" | Hard problems where you can wait — proofs, deep refactors, plans |
Web search
Pass"web_search": {} to give the model a built-in web search tool.
The tool is billed per use (see pricing
for current rate). Useful for fact-grounded tasks where the model would
otherwise hallucinate or cite stale information.
Streaming + caching
"stream": true for SSE. With reasoning enabled, expect a longer TTFB.
Automatic prompt caching on stable prefixes — the highest-leverage cost
optimisation on this tier.
Errors
| Code | Trigger | Billed? |
|---|---|---|
| 400 / 401 / 402 / 422 / 429 | Standard | No |
| 5xx | Upstream | No (auto-retry) |
When to use
- Default starting point for any new project on OpenAI.
- Code generation in an existing codebase, schema / API design.
- Multi-step plans, structured output where mid-tier models drift.
- For latency-critical responses, see GPT-5.5-instant.
- For mid-tier cost / latency, see GPT-5.4.
Limits
| Limit | Value |
|---|---|
| Context window | 128K tokens |
| Max output | 32768 tokens |
| Supports tool use | Yes (parallel) |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |
| Supports reasoning_effort | Yes |
| Supports web search tool | Yes |