gemini-3-1-pro
Capability: 1M context · vision · audio in · tool use · streaming · structured output · grounding
Pricing: per-token, pro tier (live rate)
Gemini 3.1 Pro is the right call when both input length and
modality mix matter. It accepts up to 1M tokens of input — enough
for a long PDF, a video transcript, or a mixed corpus of text and
images — and reasons across the whole thing in a single call. For
text-only flagship work, GPT-5.5 and Claude Opus 4.8 are competitive;
Gemini 3.1 Pro’s edge is the multimodal long-context combination.
Request
Body parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
model | string | yes | — | gemini-3-1-pro |
messages | array | yes | — | OpenAI chat shape; supports image_url and input_audio blocks. |
max_tokens | integer | no | model max | Max: 32768. |
temperature | number | no | 1.0 | — |
tools | array | no | — | Function calling supported. |
response_format | object | no | — | JSON / structured output. |
grounding | object | no | — | Google Search grounding tool — billed per use. |
stream | boolean | no | false | SSE streaming. |
Response
Code examples
Grounding (Google Search)
Pass"grounding": {} to give the model a built-in Google Search tool
for fact-grounded tasks. Billed per use; see
pricing. Useful when the question
requires current information the training cutoff might miss.
Streaming + caching
"stream": true for SSE. Automatic prompt caching — for 1M-token
prompts, cache hits are the highest-leverage cost optimisation.
Errors
| Code | Trigger | Billed? |
|---|---|---|
| 400 / 401 / 402 / 422 / 429 | Standard | No |
| 413 | Input exceeds 1M tokens | No |
| 5xx | Upstream | No (auto-retry) |
When to use
- Multimodal long-context work (text + images + transcripts together).
- Long-document reasoning where 200K isn’t enough.
- For text-only flagship work, compare against GPT-5.5 and Claude Opus 4.7 (now superseded by Opus 4.8).
- For flash-tier classification, see Gemini 3 Flash.
Limits
| Limit | Value |
|---|---|
| Context window | 1M tokens |
| Max output | 32768 tokens |
| Supports tool use | Yes |
| Supports vision | Yes |
| Supports audio input | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |
| Supports grounding (Google Search) | Yes |