gemini-3-5-flash is Google’s mid-tier Gemini — fast and multimodal, with the 1M-token context window that distinguishes the Gemini family. It sits between gemini-3-flash and gemini-3-1-pro: more reasoning headroom than Flash, at a fraction of Pro’s cost.
Pricing: 9.00 / 1M output — see the rate card.
Protocols
| Protocol | Path |
|---|---|
| Gemini Native | POST https://llm.bytespike.ai/v1beta/models/gemini-3-5-flash:generateContent |
| OpenAI Chat Completions (shim) | POST https://llm.bytespike.ai/v1/chat/completions |
| Anthropic Messages (translated) | POST https://llm.bytespike.ai/v1/messages |
generateContent shape behind the scenes. From the client side, you write standard openai-SDK code.
Quickstart
Capabilities
| Capability | Supported |
|---|---|
| Chat Completions (shim) | ✅ |
| Streaming (SSE) | ✅ |
| Vision (image input) | ✅ |
| Tools / function calling | ✅ parallel |
| JSON mode | ✅ |
| Grounding (web search) | ✅ |
| Long context | ✅ 1M tokens |
| Context window | 1M tokens |
When to use
- Mid-tier reasoning at low cost — harder tasks than Flash can handle, without stepping up to Pro pricing.
- Long-context work — 1M context for codebase reviews and multi-doc QA where 200K isn’t enough.
- Fresh-fact tasks — grounding (Google Search) is supported.
- Cheapest possible chat —
gemini-3-flashis lower cost for simpler tasks. - Hardest reasoning —
gemini-3-1-prois the flagship sized for it.
Next
- gemini-3-1-pro — 1M-context flagship
- gemini-3-flash — fast, lowest-cost tier