Vendor: Google
Model ID: gemini-3-1-pro
Capability: 1M context · vision · audio in · tool use · streaming · structured output · grounding
Pricing: per-token, pro tier (live rate)

Gemini 3.1 Pro is the right call when both input length and modality mix matter. It accepts up to 1M tokens of input (enough for a long PDF, a video transcript, or a mixed corpus of text and images) and reasons across the whole thing in a single call. For text-only flagship work, GPT-5.5 and Claude Opus 4.8 are competitive; Gemini 3.1 Pro’s edge is the multimodal long-context combination.

Request

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gemini-3-1-pro",
    "messages": [{"role": "user", "content": "Summarize the key claims in this 200-page filing."}]
  }'

Body parameters

Field | Type | Required | Default | Notes
model | string | yes | - | gemini-3-1-pro
messages | array | yes | - | OpenAI chat shape; supports image_url and input_audio blocks.
max_tokens | integer | no | model max | Max: 32768.
temperature | number | no | 1.0 | -
tools | array | no | - | Function calling supported.
response_format | object | no | - | JSON / structured output.
grounding | object | no | - | Google Search grounding tool; billed per use.
stream | boolean | no | false | SSE streaming.
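
The table says messages follow the OpenAI chat shape and accept image_url and input_audio blocks. A minimal Python sketch of building such a body follows. The content-part layout is the usual OpenAI convention, and the helper name `build_body`, the "wav" format tag, the default `max_tokens`, and the example URL are illustrative assumptions rather than anything this page specifies.

```python
import base64
import json


def build_body(prompt, image_url=None, audio_bytes=None, max_tokens=4096):
    """Build a chat-completions body with optional image and audio parts."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    if audio_bytes is not None:
        # OpenAI-style audio part: base64 payload plus a format tag (assumed "wav").
        content.append({
            "type": "input_audio",
            "input_audio": {
                "data": base64.b64encode(audio_bytes).decode("ascii"),
                "format": "wav",
            },
        })
    return {
        "model": "gemini-3-1-pro",
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,  # the table caps this at 32768
    }


# Hypothetical prompt and image URL, for illustration only.
body = build_body("Describe the chart on page 3.", image_url="https://example.com/chart.png")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the -d payload of the curl request shown earlier.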

Response

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "gemini-3-1-pro",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 287430, "completion_tokens": 4218, "total_tokens": 291648}
}
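
To show how the fields above might be consumed, here is a small Python sketch that parses the sample response and pulls out the text and token usage. The helper name `summarize_response` is hypothetical, and treating a finish_reason of "length" as truncation is the usual OpenAI convention, assumed here rather than confirmed by this page.

```python
import json

# The sample response from this page, unchanged.
sample = json.loads("""
{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "gemini-3-1-pro",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 287430, "completion_tokens": 4218, "total_tokens": 291648}
}
""")


def summarize_response(resp):
    """Return the assistant text plus the token counts used for billing."""
    choice = resp["choices"][0]
    usage = resp["usage"]
    return {
        "text": choice["message"]["content"],
        # Assumed convention: "length" means the output hit max_tokens.
        "truncated": choice["finish_reason"] == "length",
        "input_tokens": usage["prompt_tokens"],
        "output_tokens": usage["completion_tokens"],
    }


info = summarize_response(sample)
print(info["input_tokens"], info["output_tokens"], info["truncated"])
```

In the sample, prompt_tokens and completion_tokens sum to total_tokens (287430 + 4218 = 291648).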

Code examples

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model": "gemini-3-1-pro", "messages": [{"role": "user", "content": "Summarize this filing."}]}'

Pass "grounding": {} to give the model a built-in Google Search tool for fact-grounded tasks. Billed per use; see pricing. Useful when the question requires current information the training cutoff might miss.
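
As a rough illustration of the grounding flag, the Python sketch below prepares (but does not send) a grounded request using only the standard library. The endpoint, headers, and empty grounding object come from this page; the helper name `grounded_request`, the sample question, and the "test-key" fallback are placeholders.

```python
import json
import os
import urllib.request

URL = "https://llm.bytespike.ai/v1/chat/completions"


def grounded_request(question, api_key):
    """Prepare a POST request with Google Search grounding enabled."""
    body = {
        "model": "gemini-3-1-pro",
        "messages": [{"role": "user", "content": question}],
        "grounding": {},  # empty object turns the search tool on
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "content-type": "application/json",
        },
        method="POST",
    )


# "test-key" is a placeholder used only when the environment variable is unset.
req = grounded_request(
    "What changed in the latest filing?",
    os.environ.get("BYTESPIKE_API_KEY", "test-key"),
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```

Because grounding is billed per use, it may be worth enabling it only for requests that need current information.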

Streaming + caching

Set "stream": true to receive the response as server-sent events (SSE). Prompt caching is automatic; for prompts approaching 1M tokens, cache hits are the highest-leverage cost optimisation.

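One way to consume the stream is sketched below in Python. It assumes the events follow the OpenAI chunk convention ("data: {...}" lines carrying choices[0].delta.content, ended by "data: [DONE]"). This page only says SSE, so check that shape against real output. The function name `iter_text_deltas` and the demo lines are illustrative.

```python
import json


def iter_text_deltas(lines):
    """Yield text fragments from an iterable of decoded SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written demo events, not captured from the real API.
demo = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Key "}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "claims"}}]}',
    "data: [DONE]",
]
print("".join(iter_text_deltas(demo)))  # prints: Key claims
```

The parser takes any iterable of strings, so it can be fed decoded lines from an HTTP response body as they arrive.
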
Errors

Code | Trigger | Billed?
400 / 401 / 402 / 422 / 429 | Standard | No
413 | Input exceeds 1M tokens | No
5xx | Upstream | No (auto-retry)
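
The table above suggests a simple client-side policy, sketched here in Python. The action names, the choice to back off on 429, and the delay schedule are illustrative assumptions; the page itself only lists the codes and notes that 5xx responses are auto-retried.

```python
def next_action(status):
    """Map an HTTP status code to a suggested client-side action."""
    if 200 <= status <= 299:
        return "ok"
    if status == 429:
        return "backoff"       # assumed rate limiting: wait, then retry
    if status == 413:
        return "shrink_input"  # input exceeds 1M tokens: trim or split it
    if 500 <= status <= 599:
        return "retry_later"   # upstream error, listed as auto-retried
    if status in (400, 401, 402, 422):
        return "fix_request"   # resending the same request will not help
    return "unknown"


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Exponential backoff schedule in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


print(next_action(413), backoff_delays(4))
```

None of the listed errors are billed, so a retry costs latency rather than tokens.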

When to use

  • Multimodal long-context work (text + images + transcripts together).
  • Long-document reasoning where 200K isn’t enough.
  • For text-only flagship work, compare against GPT-5.5 and Claude Opus 4.8.
  • For flash-tier classification, see Gemini 3 Flash.

Limits

Limit | Value
Context window | 1M tokens
Max output | 32768 tokens
Supports tool use | Yes
Supports vision | Yes
Supports audio input | Yes
Supports streaming | Yes
Supports prompt caching | Automatic
Supports grounding (Google Search) | Yes
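
These limits can be checked before a request is sent. The Python sketch below assumes "1M tokens" means 1,000,000 and that the caller already has an estimate of input tokens from a tokenizer of their choice; both are assumptions, and the function name `preflight` is hypothetical.

```python
CONTEXT_WINDOW = 1_000_000  # "1M tokens"; the exact figure is an assumption
MAX_OUTPUT = 32768          # max output tokens from the limits table


def preflight(estimated_input_tokens, max_tokens=MAX_OUTPUT):
    """Return a list of limit violations; an empty list means none found."""
    problems = []
    if estimated_input_tokens > CONTEXT_WINDOW:
        problems.append("input exceeds the context window (expect a 413)")
    if max_tokens > MAX_OUTPUT:
        problems.append("max_tokens exceeds 32768")
    return problems


print(preflight(287430, 4218))
print(preflight(1_200_000, 40000))
```

A local check like this only avoids a wasted round trip; the service's own token count decides whether a request is accepted.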