gemini-2-5-flash - ByteSpike

gemini-2-5-flash is Google’s small-mid Gemini — fast, multimodal, with the 1M-token context window that distinguishes the entire Gemini family. Competitive with gpt-5-4-mini and claude-haiku-4-5 on most benchmarks, plus the long-context advantage. Pricing:

0.50 / 1M input,

3.00 / 1M output, $0.05 / 1M cache read — see the rate card.

Protocols

Protocol	Path
Gemini Native	`POST https://llm.bytespike.ai/v1beta/models/gemini-2-5-flash:generateContent`
OpenAI Chat Completions (shim)	`POST https://llm.bytespike.ai/v1/chat/completions`
Anthropic Messages (translated)	`POST https://llm.bytespike.ai/v1/messages`

For the OpenAI shim, the gateway translates the request body to Gemini’s generateContent shape behind the scenes. From the client side, you write standard openai-SDK code.

Quickstart

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gemini-2-5-flash",
    "messages": [{ "role": "user", "content": "Hello, ByteSpike." }]
  }'

Capabilities

Capability	Supported
Chat Completions (shim)	✅
Streaming (SSE)	✅
Vision (image input)	✅
Tools / function calling	✅ parallel
JSON mode	✅
Grounding (web search)	✅
Long context	✅ 1M tokens
Context window	1M tokens

When to use

Long-context cheap — 1M context at Haiku price is unique. Codebase reviews, multi-doc QA where 200K isn’t enough.
Fresh-fact tasks at low cost — grounding (Google Search) is supported on Flash.
High-volume vision — OCR-style tasks on receipts / invoices / screenshots.

When not to use:

Hard reasoning — gemini-3-1-pro is sized for it; Flash is more chat-tier.
Anthropic Messages protocol — Gemini only speaks the OpenAI shim today.

gemini-3-1-pro — current Gemini flagship
gemini-3-5-flash — newer fast Gemini tier

gpt-5-5 gemini-3-flash

​Protocols

​Quickstart

​Capabilities

​When to use

​Next

Protocols

Quickstart

Capabilities

When to use

Next