claude-sonnet-4-6 is the production workhorse of the Claude family — the model most ByteSpike customers default to when they need a model that does everything at a mid-tier price. Vision, tools, prompt caching, extended thinking, and web search are all available at the same 200K context as Opus; Opus reserves the top of the quality envelope for hard reasoning.
Pricing: 15.00 / 1M output, $0.30 / 1M cache read — see the rate card.
Protocols
| Protocol | Path |
|---|---|
| Anthropic Messages | POST https://llm.bytespike.ai/v1/messages |
| OpenAI Chat Completions | POST https://llm.bytespike.ai/v1/chat/completions |
Quickstart
Capabilities
| Capability | Supported |
|---|---|
| Chat completions | ✅ |
| Streaming (SSE) | ✅ |
| Vision (image input) | ✅ |
| Tool use (function calling) | ✅ parallel |
| Prompt caching (cache_control) | ✅ |
| Extended thinking | ✅ |
| Web search (web_search tool) | ✅ |
| JSON / structured output | ✅ |
| Context window | 200K tokens |
When to use
- Default mid-tier. When you need a model that does everything — vision + tools + thinking + web search — at Sonnet pricing.
- Agents. Use the Anthropic Messages endpoint; the
tool_useandthinkingblocks pass through transparently. - RAG.
cache_controlon a long system prompt amortizes the cost across many user turns. The first request pays the cache-write tier; subsequent reads from the same prefix pay $0.30/1M (10% of input). - Fresh-fact tasks. Add
web_searchto the tools array.
- Hardest reasoning — go to
claude-opus-4-8, the Claude flagship. - Cheapest-possible classification or routing — use
claude-haiku-4-5at one-quarter the price.
Next
- claude-opus-4-8 — flagship, 200K context
- claude-haiku-4-5 — small, fast, cheap