gemini-3-flash
Capability: 200K context · vision · tool use · streaming · structured output
Pricing: per-token, flash tier (live rate)
Gemini 3 Flash is the small / fast member of the Gemini 3 family.
It’s the model to reach for when you have lots of mostly-text input
with occasional images and you’d rather make many cheap calls than a
few expensive ones. The 200K context window also makes it useful for
long-document classification where you don’t need flagship reasoning.
Request
Body parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
model | string | yes | — | gemini-3-flash |
messages | array | yes | — | OpenAI chat shape with image_url blocks for vision. |
max_tokens | integer | no | model max | Max: 8192. |
temperature | number | no | 1.0 | Range 0.0–2.0. |
tools | array | no | — | Function calling supported. |
response_format | object | no | — | JSON mode + structured output. |
stream | boolean | no | false | SSE streaming. |
Response
Code examples
Streaming + caching
"stream": true for SSE. Automatic prompt caching on stable prefixes.
Errors
| Code | Trigger | Billed? |
|---|---|---|
| 400 / 401 / 402 / 422 / 429 | Standard | No |
| 5xx | Upstream | No (auto-retry) |
When to use
- High-volume classification / routing on long-document inputs (200K context).
- Vision-native tasks at flash-tier price.
- For deeper reasoning, see Gemini 3.1 Pro.
- For OpenAI flash equivalent, see GPT-5.4 mini.
Limits
| Limit | Value |
|---|---|
| Context window | 200K tokens |
| Max output | 8192 tokens |
| Supports tool use | Yes |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |