Skip to main content
claude-sonnet-4-6 is the production workhorse of the Claude family — the model most ByteSpike customers default to when they need a model that does everything at a mid-tier price. Vision, tools, prompt caching, extended thinking, and web search are all available at the same 200K context as Opus; Opus reserves the top of the quality envelope for hard reasoning. Pricing: 3.00/1Minput,3.00 / 1M input, 15.00 / 1M output, $0.30 / 1M cache read — see the rate card.

Protocols

ProtocolPath
Anthropic MessagesPOST https://llm.bytespike.ai/v1/messages
OpenAI Chat CompletionsPOST https://llm.bytespike.ai/v1/chat/completions
The two protocols produce equivalent responses for the same input; pick whichever your client already speaks.

Quickstart

curl https://llm.bytespike.ai/v1/messages \
  -H "x-api-key: $BYTESPIKE_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Hello, ByteSpike." }
    ]
  }'

Capabilities

CapabilitySupported
Chat completions
Streaming (SSE)
Vision (image input)
Tool use (function calling)✅ parallel
Prompt caching (cache_control)
Extended thinking
Web search (web_search tool)
JSON / structured output
Context window200K tokens

When to use

  • Default mid-tier. When you need a model that does everything — vision + tools + thinking + web search — at Sonnet pricing.
  • Agents. Use the Anthropic Messages endpoint; the tool_use and thinking blocks pass through transparently.
  • RAG. cache_control on a long system prompt amortizes the cost across many user turns. The first request pays the cache-write tier; subsequent reads from the same prefix pay $0.30/1M (10% of input).
  • Fresh-fact tasks. Add web_search to the tools array.
When not to use:
  • Hardest reasoning — go to claude-opus-4-8, the Claude flagship.
  • Cheapest-possible classification or routing — use claude-haiku-4-5 at one-quarter the price.

Next