Skip to main content
Vendor: Anthropic Model ID: claude-sonnet-4-6 Capability: 200K context · tool use · vision · prompt caching · streaming Pricing: per-token, Sonnet tier (live rate) Sonnet 4.6 is what you should reach for first inside the 4-series. It keeps the 200K context window and tool-use shape of 4.5, while delivering measurably better structured output and tighter tool-call arguments. If you’re starting a new project, default here. If you’re on 4.5, the migration is a single string change — most production code sees a quality bump with no measurable latency or cost difference.

Request

curl https://llm.bytespike.ai/v1/messages \
  -H "x-api-key: $BYTESPIKE_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 4096,
    "messages": [
      {"role": "user", "content": "Extract the dates from this paragraph as JSON."}
    ]
  }'

Body parameters

FieldTypeRequiredDefaultNotes
modelstringyesclaude-sonnet-4-6
messagesarrayyesConversation history.
max_tokensintegeryesHard cap. Max for this model: 16384.
systemstring | arraynoArray form supports cache_control.
temperaturenumberno1.0Range 0.0–1.0.
top_pnumberno1.0Nucleus sampling.
toolsarraynoSupported, including parallel tool calls.
tool_choiceobjectno{"type":"auto"}auto / any / tool (named).
streambooleannofalseSSE streaming.

Response

{
  "id": "msg_sonnet_…",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-6",
  "content": [
    {"type": "text", "text": "[\"2024-08-12\", \"2024-09-01\"]"}
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 87,
    "output_tokens": 24
  }
}

Code examples

curl https://llm.bytespike.ai/v1/messages \
  -H "x-api-key: $BYTESPIKE_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 4096,
    "messages": [{"role": "user", "content": "Extract the dates from this paragraph as JSON."}]
  }'

Streaming

Set "stream": true for SSE in the standard Anthropic format. Estimated credits ship in the HTTP headers before the first event.

Cache control

cache_control blocks reduce cost on repeated prompts. Cache reads at the discounted rate visible in the pricing table. With Sonnet 4.6’s tighter tool-arg generation, retrieval-augmented agent loops see the biggest cache wins (system prompt + tool schema kept stable, only the user turn changes).
{
  "model": "claude-sonnet-4-6",
  "system": [
    {
      "type": "text",
      "text": "<long static system prompt>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [...]
}

Errors

CodeTriggerBilled?
400Body validation failedNo
401Missing / revoked keyNo
402Wallet exhaustedNo
403Scope denied / IP not allowlistedNo
429Rate-limitedNo
5xxUpstream provider issueNo (auto-retry envelope)

When to use

  • Default Anthropic mid-tier — start here, benchmark against Opus / Haiku later.
  • Code generation / refactoring / structured extraction where Haiku is too imprecise.
  • Tool-heavy agents where parallel tool calls and tight argument JSON matter.
  • For the prior version with the same shape, see Sonnet 4.5.
  • For higher throughput at lower quality, see Haiku 4.5.
  • For deeper reasoning across long contexts, see Opus 4.7.

Limits

LimitValue
Context window200K tokens
Max output16384 tokens
Supports tool useYes (parallel)
Supports visionYes
Supports streamingYes
Supports prompt cachingYes