Vendor: Google
Model ID: gemini-3-flash
Capability: 200K context · vision · tool use · streaming · structured output
Pricing: per-token, flash tier (live rate)

Gemini 3 Flash is the small / fast member of the Gemini 3 family. It’s the model to reach for when you have lots of mostly-text input with occasional images and you’d rather make many cheap calls than a few expensive ones. The 200K context window also makes it useful for long-document classification where you don’t need flagship reasoning.

Request

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gemini-3-flash",
    "messages": [{"role": "user", "content": "Classify the topic of this article."}]
  }'

Body parameters

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| model | string | yes | | gemini-3-flash |
| messages | array | yes | | OpenAI chat shape with image_url blocks for vision. |
| max_tokens | integer | no | model max | Max: 8192. |
| temperature | number | no | 1.0 | Range 0.0–2.0. |
| tools | array | no | | Function calling supported. |
| response_format | object | no | | JSON mode + structured output. |
| stream | boolean | no | false | SSE streaming. |
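
As a rough illustration of how these fields fit together, the following Python sketch builds a request body and checks the documented bounds before anything is sent. The helper name `build_payload` is hypothetical, the lower bound of 1 for `max_tokens` is an assumption, and the nesting of the `image_url` block follows the OpenAI chat format, which this page names but does not spell out.

```python
MAX_OUTPUT_TOKENS = 8192  # "Max: 8192" from the parameter table


def build_payload(prompt, image_url=None, max_tokens=None, temperature=None, stream=False):
    """Build a chat-completions request body for gemini-3-flash (hypothetical helper)."""
    # Lower bound of 1 is an assumption; the table only states the maximum.
    if max_tokens is not None and not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")

    if image_url is None:
        content = prompt  # plain-text message
    else:
        # OpenAI-style content blocks; exact nesting assumed from that format.
        content = [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]

    payload = {
        "model": "gemini-3-flash",
        "messages": [{"role": "user", "content": content}],
    }
    # Optional fields are omitted so the server-side defaults apply.
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        payload["temperature"] = temperature
    if stream:
        payload["stream"] = True
    return payload
```

Leaving optional fields out of the body, rather than sending explicit defaults, keeps the request aligned with whatever defaults the endpoint currently applies.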

Response

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "gemini-3-flash",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "technology / startups"}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 412, "completion_tokens": 4, "total_tokens": 416}
}
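
A minimal Python sketch of reading this response: it pulls out the assistant text, the finish reason, and the token counts. The function name `summarize_response` is hypothetical, and the code assumes every response carries the fields shown in the sample above.

```python
import json


def summarize_response(raw):
    """Extract the reply text and token usage from a chat.completion body."""
    body = json.loads(raw)
    choice = body["choices"][0]  # the sample shows a single choice at index 0
    usage = body["usage"]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }


# The sample response from this page, used as input.
sample = json.dumps({
    "id": "chatcmpl-…",
    "object": "chat.completion",
    "model": "gemini-3-flash",
    "choices": [{"index": 0,
                 "message": {"role": "assistant", "content": "technology / startups"},
                 "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 412, "completion_tokens": 4, "total_tokens": 416},
})
summary = summarize_response(sample)
```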

Code examples

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model": "gemini-3-flash", "messages": [{"role": "user", "content": "Classify the topic."}]}'
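
The same call can be sketched in Python using only the standard library. The URL, headers, and body mirror the curl command above. The function names `make_request` and `classify` and the 30-second timeout are illustrative choices, not part of the API.

```python
import json
import os
import urllib.request

API_URL = "https://llm.bytespike.ai/v1/chat/completions"


def make_request(prompt, api_key):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({
        "model": "gemini-3-flash",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def classify(prompt):
    """Send the request and return the assistant's reply text."""
    req = make_request(prompt, os.environ["BYTESPIKE_API_KEY"])
    # The timeout value is an arbitrary choice for this sketch.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Calling `classify("Classify the topic.")` performs the network request. `make_request` is kept separate so the request can be inspected or tested without sending it.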

Streaming + caching

Set "stream": true to receive the response as server-sent events (SSE). Prompt caching is applied automatically to stable prompt prefixes.

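One way to consume a streamed response is sketched below in Python. This page states only that streaming uses SSE. The chunk layout assumed here (`data: {...}` lines with `choices[0].delta.content`, ended by `data: [DONE]`) is the OpenAI streaming convention and has not been confirmed for this endpoint. The function name `iter_stream_text` is hypothetical.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel (OpenAI convention)
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        piece = choices[0].get("delta", {}).get("content")
        if piece:
            yield piece


# Hand-written example lines in the assumed format, not captured output.
example_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    b'data: {"choices": [{"delta": {"content": "technology"}}]}',
    'data: {"choices": [{"delta": {"content": " / startups"}}]}',
    "data: [DONE]",
]
streamed_text = "".join(iter_stream_text(example_lines))
```

An HTTP response object that iterates line by line, such as the one returned by `urllib.request.urlopen`, can be passed as `lines`.
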
Errors

| Code | Trigger | Billed? |
|---|---|---|
| 400 / 401 / 402 / 422 / 429 | Standard | No |
| 5xx | Upstream | No (auto-retry) |
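
A client-side retry policy for these codes might be sketched as follows in Python. The policy is an assumption, not documented behavior: it retries 429 and, as a fallback after the automatic retry noted above, any 5xx that still reaches the caller, and it treats the other 4xx codes as errors that need a changed request. The names `should_retry` and `backoff_seconds` and the default limits are hypothetical.

```python
NON_RETRYABLE = {400, 401, 402, 422}  # a retry cannot succeed without changing the request


def should_retry(status, attempt, max_attempts=3):
    """Return True if the call is worth repeating (attempt counts from 0)."""
    if attempt + 1 >= max_attempts:
        return False
    if status in NON_RETRYABLE:
        return False
    # 429: rate limited. 5xx: upstream failure that reached the client anyway.
    return status == 429 or 500 <= status <= 599


def backoff_seconds(attempt, base=1.0, cap=30.0):
    """Exponential backoff: base * 2**attempt, limited to cap seconds."""
    return min(cap, base * 2 ** attempt)
```

Since failed calls are not billed according to the table, a bounded retry loop costs only time. The cap keeps that wait predictable.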

When to use

  • High-volume classification / routing on long-document inputs (200K context).
  • Vision-native tasks at flash-tier price.
  • For deeper reasoning, see Gemini 3.1 Pro.
  • For the OpenAI equivalent of this tier, see GPT-5.4 mini.

Limits

| Limit | Value |
|---|---|
| Context window | 200K tokens |
| Max output | 8192 tokens |
| Supports tool use | Yes |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |