gemini-3-5-flash is Google’s mid-tier Gemini: fast and multimodal, with the 1M-token context window that distinguishes the Gemini family. It sits between gemini-3-flash and gemini-3-1-pro, with more reasoning headroom than Flash at a fraction of Pro’s cost. Pricing: $1.50 / 1M input tokens, $9.00 / 1M output tokens; see the rate card.
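The per-token arithmetic behind those rates can be sketched as follows. This is a minimal estimate, assuming the listed figures are US dollars per 1M tokens and that no other charges apply; the function name `estimate_cost` is hypothetical, and the rate card remains the authoritative source.

```python
from decimal import Decimal

# Rates quoted above, per 1M tokens (assumed to be USD; check the rate card).
INPUT_RATE_PER_M = Decimal("1.50")
OUTPUT_RATE_PER_M = Decimal("9.00")
TOKENS_PER_UNIT = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost of one request at the quoted rates."""
    total = INPUT_RATE_PER_M * input_tokens + OUTPUT_RATE_PER_M * output_tokens
    return total / TOKENS_PER_UNIT
```

For example, `estimate_cost(200_000, 5_000)` evaluates to 0.345: 0.30 for the prompt plus 0.045 for the reply.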

Protocols

| Protocol | Path |
| --- | --- |
| Gemini Native | POST https://llm.bytespike.ai/v1beta/models/gemini-3-5-flash:generateContent |
| OpenAI Chat Completions (shim) | POST https://llm.bytespike.ai/v1/chat/completions |
| Anthropic Messages (translated) | POST https://llm.bytespike.ai/v1/messages |
For the OpenAI shim, the gateway translates the request body to Gemini’s generateContent shape behind the scenes. From the client side, you write standard openai-SDK code.
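To make the translation concrete, here is a rough sketch of how an OpenAI-style body could map onto Gemini’s generateContent shape. It is an illustration only, not the gateway’s actual implementation: the function name `openai_to_gemini` is hypothetical, and the sketch assumes text-only messages with plain string content.

```python
from typing import Any


def openai_to_gemini(body: dict[str, Any]) -> dict[str, Any]:
    """Map a text-only OpenAI chat body to a generateContent-style body."""
    contents = []
    system_parts = []
    for message in body["messages"]:
        part = {"text": message["content"]}
        if message["role"] == "system":
            # Gemini carries system prompts outside the turn list.
            system_parts.append(part)
        else:
            # Gemini names the assistant role "model".
            role = "model" if message["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [part]})
    translated: dict[str, Any] = {"contents": contents}
    if system_parts:
        translated["systemInstruction"] = {"parts": system_parts}
    return translated
```

The model name does not appear in the translated body because the native endpoint takes it from the URL path.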

Quickstart

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gemini-3-5-flash",
    "messages": [{ "role": "user", "content": "Hello, ByteSpike." }]
  }'
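The same request can be issued from Python. The sketch below uses only the standard library rather than the openai SDK, so it runs without extra installs; the helper names `build_chat_request` and `send` are hypothetical, and the response is assumed to follow the usual Chat Completions shape.

```python
import json
import urllib.request

BASE_URL = "https://llm.bytespike.ai"


def build_chat_request(prompt: str, api_key: str,
                       model: str = "gemini-3-5-flash") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Usage (performs a network call, so it is left commented out):
# import os
# reply = send(build_chat_request("Hello, ByteSpike.", os.environ["BYTESPIKE_API_KEY"]))
# print(reply["choices"][0]["message"]["content"])  # assumes an OpenAI-style response
```

Separating request construction from sending keeps the payload easy to inspect or log before anything goes over the wire.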

Capabilities

| Capability | Supported |
| --- | --- |
| Chat Completions (shim) | |
| Streaming (SSE) | |
| Vision (image input) | |
| Tools / function calling | ✅ parallel |
| JSON mode | |
| Grounding (web search) | |
| Long context | ✅ 1M tokens |
| Context window | 1M tokens |
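For the streaming capability, a client typically reads server-sent event lines and concatenates the text deltas. The sketch below assumes the shim emits OpenAI-style chunks (`data: {...}` lines with `choices[0].delta.content`, terminated by `data: [DONE]`); that framing is an assumption about the gateway, and `iter_text_deltas` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Feeding it the decoded lines of a streamed response and joining the results reconstructs the full reply text.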

When to use

  • Mid-tier reasoning at low cost — harder tasks than Flash can handle, without stepping up to Pro pricing.
  • Long-context work — 1M context for codebase reviews and multi-doc QA where 200K isn’t enough.
  • Fresh-fact tasks — grounding (Google Search) is supported.

When not to use

  • Cheapest possible chat — gemini-3-flash is lower cost for simpler tasks.
  • Hardest reasoning — gemini-3-1-pro is the flagship sized for it.
