gemini-3-flash is Google’s fast, low-cost Gemini — multimodal, with a 200K-token context window. It is the value tier of the Gemini 3 family: cheap enough for high-volume chat and OCR-style vision, with grounding for fresh facts.
Pricing: 3.00 / 1M output — see the rate card.
Protocols
| Protocol | Path |
|---|---|
| Gemini Native | POST https://llm.bytespike.ai/v1beta/models/gemini-3-flash:generateContent |
| OpenAI Chat Completions (shim) | POST https://llm.bytespike.ai/v1/chat/completions |
| Anthropic Messages (translated) | POST https://llm.bytespike.ai/v1/messages |
generateContent shape behind the scenes. From the client side, you write standard openai-SDK code.
Quickstart
Capabilities
| Capability | Supported |
|---|---|
| Chat Completions (shim) | ✅ |
| Streaming (SSE) | ✅ |
| Vision (image input) | ✅ |
| Tools / function calling | ✅ parallel |
| JSON mode | ✅ |
| Grounding (web search) | ✅ |
| Context window | 200K tokens |
When to use
- High-volume cheap chat — the lowest-cost Gemini 3 tier for general Q&A and drafting.
- High-volume vision — OCR-style tasks on receipts / invoices / screenshots.
- Fresh-fact tasks at low cost — grounding (Google Search) is supported on Flash.
- Harder reasoning —
gemini-3-5-flashadds headroom at modest extra cost. - 1M-context jobs —
gemini-3-5-flashandgemini-3-1-procarry the long-context window.
Next
- gemini-3-5-flash — mid-tier, 1M context
- gemini-3-1-pro — 1M-context flagship