gemini-2-5-flash is Google’s small-mid Gemini — fast, multimodal, with the 1M-token context window that distinguishes the entire Gemini family. Competitive with gpt-5-4-mini and claude-haiku-4-5 on most benchmarks, plus the long-context advantage.
Pricing: 3.00 / 1M output, $0.05 / 1M cache read — see the rate card.
Protocols
| Protocol | Path |
|---|---|
| Gemini Native | POST https://llm.bytespike.ai/v1beta/models/gemini-2-5-flash:generateContent |
| OpenAI Chat Completions (shim) | POST https://llm.bytespike.ai/v1/chat/completions |
| Anthropic Messages (translated) | POST https://llm.bytespike.ai/v1/messages |
generateContent shape behind the scenes. From the client side, you write standard openai-SDK code.
Quickstart
Capabilities
| Capability | Supported |
|---|---|
| Chat Completions (shim) | ✅ |
| Streaming (SSE) | ✅ |
| Vision (image input) | ✅ |
| Tools / function calling | ✅ parallel |
| JSON mode | ✅ |
| Grounding (web search) | ✅ |
| Long context | ✅ 1M tokens |
| Context window | 1M tokens |
When to use
- Long-context cheap — 1M context at Haiku price is unique. Codebase reviews, multi-doc QA where 200K isn’t enough.
- Fresh-fact tasks at low cost — grounding (Google Search) is supported on Flash.
- High-volume vision — OCR-style tasks on receipts / invoices / screenshots.
- Hard reasoning —
gemini-3-1-prois sized for it; Flash is more chat-tier. - Anthropic Messages protocol — Gemini only speaks the OpenAI shim today.
Next
- gemini-3-1-pro — current Gemini flagship
- gemini-3-5-flash — newer fast Gemini tier