gpt-4o-image
Capability: 1024² – 2048² · multi-turn image generation · in-conversation editing
Pricing: per image, conversational tier (live rate)
GPT-4o Image is the conversational image generator — instead of a
one-shot /images/generations call, you send a chat completions
request and the model returns image content inside the response. This
matters when the workflow is multi-turn: “generate this”, “now make
the background blue”, “now add a dog”. The conversation memory
preserves the underlying image so subsequent turns are edits, not
fresh generations.
Request
Body parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
model | string | yes | — | gpt-4o-image |
messages | array | yes | — | Standard chat shape. The model returns images as image_url content blocks. |
image_output.size | string | no | 1024x1024 | Supported: 1024x1024, 1024x1536, 1536x1024, 2048x2048. |
image_output.quality | string | no | "medium" | "low" / "medium" / "high". |
image_output.n | integer | no | 1 | 1–2 images per turn. |
tools | array | no | — | Function calling supported alongside image output. |
stream | boolean | no | false | Streaming partial-image deltas supported. |
Response
messages
to edit the same image on subsequent turns.
Code examples
Multi-turn edit workflow
Pass the assistant’s response (image url and all) back in the nextmessages array. The model treats the image in conversation context
as the canvas to edit:
Errors
| Code | Trigger | Billed? |
|---|---|---|
| 400 / 401 / 402 / 403 | Standard | No |
| 451 | Prompt blocked by upstream safety | No |
| 5xx | Upstream issue | No (auto-retry) |
When to use
- Multi-turn image editing where conversation context matters.
- Workflows that mix text reasoning with image output (the model can describe what it generated, ask clarifying questions).
- For one-shot / batch image generation, see GPT-Image 2.
- For pure photorealism, see Nano Banana Pro or Nano Banana 2.
Limits
| Limit | Value |
|---|---|
| Max output resolution | 2048×2048 |
Max images per turn (n) | 2 |
| Multi-turn editing | Yes |
Supports quality modifier | Yes |
| Sync? | Yes (≤30s typical) |
| Avg latency for 1024² | 10-16s |