Gemini 3 Flash - ByteSpike

Vendor: Google
Model ID: gemini-3-flash
Capability: 200K context · vision · tool use · streaming · structured output
Pricing: per-token, flash tier (live rate)

Gemini 3 Flash is the small, fast member of the Gemini 3 family. It’s the model to reach for when you have lots of mostly-text input with occasional images and you’d rather make many cheap calls than a few expensive ones. The 200K context window also makes it useful for long-document classification where you don’t need flagship reasoning.

Request

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gemini-3-flash",
    "messages": [{"role": "user", "content": "Classify the topic of this article."}]
  }'

Body parameters

Field	Type	Required	Default	Notes
`model`	string	yes	—	`gemini-3-flash`
`messages`	array	yes	—	OpenAI chat shape with `image_url` blocks for vision.
`max_tokens`	integer	no	model max	Max: 8192.
`temperature`	number	no	1.0	Range 0.0–2.0.
`tools`	array	no	—	Function calling supported.
`response_format`	object	no	—	JSON mode + structured output.
`stream`	boolean	no	false	SSE streaming.
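
As a rough illustration of how these fields fit together, the Python sketch below assembles a request body with one text part and optional image parts. The helper names `image_block` and `build_payload` are hypothetical, and passing images as base64 data URLs is an assumption carried over from the OpenAI chat shape, not something this page confirms.

```python
import base64
from typing import List, Optional


def image_block(image_bytes: bytes, mime: str = "image/png") -> dict:
    """Wrap raw image bytes as an OpenAI-style `image_url` content block.

    Assumption: the gateway accepts base64 data URLs in `image_url.url`.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


def build_payload(prompt: str, images: Optional[List[bytes]] = None, **options) -> dict:
    """Assemble a request body from the fields listed in the table above."""
    content = [{"type": "text", "text": prompt}]
    content += [image_block(img) for img in (images or [])]
    payload = {
        "model": "gemini-3-flash",
        "messages": [{"role": "user", "content": content}],
    }
    payload.update(options)  # optional fields such as max_tokens, temperature, stream
    return payload
```

Calling `build_payload("Classify the topic of this page.", [png_bytes], max_tokens=64)` would yield a dictionary ready to be serialized with `json.dumps` and sent as the request body.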

Response

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "gemini-3-flash",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "technology / startups"}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 412, "completion_tokens": 4, "total_tokens": 416}
}
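
A minimal sketch of reading this response in Python, assuming the body arrives as a JSON string shaped like the sample above. The function name `summarize_response` is hypothetical, and treating a `finish_reason` of `"length"` as truncation follows the OpenAI convention, which this page does not confirm.

```python
import json


def summarize_response(body: str) -> dict:
    """Pull the fields most callers need out of a chat completion body."""
    data = json.loads(body)
    choice = data["choices"][0]
    usage = data.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # Assumption: "length" signals that max_tokens cut the output short.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens"),
    }
```

For high-volume classification, logging `total_tokens` per call is a simple way to track spend, since pricing is per token.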

Code examples

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model": "gemini-3-flash", "messages": [{"role": "user", "content": "Classify the topic."}]}'
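
The same call can be sketched in Python using only the standard library. This mirrors the curl command above; the function names `build_request` and `classify` are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
import json
import os
import urllib.request

API_URL = "https://llm.bytespike.ai/v1/chat/completions"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({
        "model": "gemini-3-flash",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "content-type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def classify(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_request(prompt, os.environ["BYTESPIKE_API_KEY"])
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without network access.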

Streaming + caching
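
Set "stream": true to receive the response as server-sent events (SSE). Prompt caching is automatic and applies to stable prefixes, so keep the unchanging part of the prompt at the start of the request.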

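A sketch of consuming the stream in Python, assuming the events follow the OpenAI streaming convention: each `data:` line carries a JSON chunk with text under `choices[0].delta.content`, and the stream ends with `data: [DONE]`. This page does not spell out the chunk format, so treat both details, and the function name `iter_stream_text`, as assumptions.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an iterable of decoded SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        choices = json.loads(data).get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text
```

Joining the yielded fragments with `"".join(...)` reconstructs the full reply; printing them as they arrive gives incremental output.
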
Errors

Code	Trigger	Billed?
400 / 401 / 402 / 422 / 429	Standard	No
5xx	Upstream	No (auto-retry)
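
One possible client-side policy for these codes, sketched in Python. It assumes that 429 is worth retrying with exponential backoff, that a 5xx reaching the client (after whatever automatic retry the table refers to) may be retried the same way, and that the other 4xx codes need a changed request or account and should not be retried. The function `retry_delay` and its defaults are hypothetical.

```python
from typing import Optional


def retry_delay(status: int, attempt: int, base: float = 1.0, cap: float = 30.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the call should not be retried.

    `attempt` is zero-based: 0 for the first retry, 1 for the second, and so on.
    """
    if status == 429 or 500 <= status <= 599:
        return min(cap, base * (2 ** attempt))  # capped exponential backoff
    return None  # 400 / 401 / 402 / 422: retrying the same request will not help
```

In production, adding random jitter to the returned delay helps avoid many clients retrying in lockstep.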

When to use

High-volume classification / routing on long-document inputs (200K context).
Vision-native tasks at flash-tier price.
For deeper reasoning, see Gemini 3.1 Pro.
For the OpenAI equivalent of the flash tier, see GPT-5.4 mini.

Limits

Limit	Value
Context window	200K tokens
Max output	8192 tokens
Supports tool use	Yes
Supports vision	Yes
Supports streaming	Yes
Supports prompt caching	Automatic
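
The numeric limits can be checked client-side before a request is sent. The sketch below uses the 8192-token output cap from this table and the 0.0–2.0 temperature range from the body parameters table; the function name `check_options` and the lower bound of 1 for `max_tokens` are assumptions.

```python
from typing import List, Optional

MAX_OUTPUT_TOKENS = 8192
TEMPERATURE_MIN, TEMPERATURE_MAX = 0.0, 2.0


def check_options(max_tokens: Optional[int] = None,
                  temperature: Optional[float] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means the options look valid."""
    problems = []
    # Assumption: at least one output token must be requested.
    if max_tokens is not None and not (1 <= max_tokens <= MAX_OUTPUT_TOKENS):
        problems.append(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}, got {max_tokens}")
    if temperature is not None and not (TEMPERATURE_MIN <= temperature <= TEMPERATURE_MAX):
        problems.append(
            f"temperature must be between {TEMPERATURE_MIN} and {TEMPERATURE_MAX}, got {temperature}"
        )
    return problems
```

Failing fast locally avoids a round trip that would presumably end in a 400 or 422 response.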
