Skip to main content
Vendor: Anthropic Model ID: claude-sonnet-4-5 Capability: 200K context · tool use · vision · prompt caching · streaming Pricing: per-token, Sonnet tier (live rate) Sonnet 4.5 is the workhorse model of the 4-series. If your prompt fits in 200K tokens, you want native tool use, and you’d rather spend money on more calls than wait for one Opus round-trip, this is the default. For most production agent flows, swapping Opus 4.7 → Sonnet 4.5 is a ~3× cost reduction with a much smaller quality drop than the price gap suggests — measure both before assuming Opus.

Request

curl https://llm.bytespike.ai/v1/messages \
  -H "x-api-key: $BYTESPIKE_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 4096,
    "messages": [
      {"role": "user", "content": "Refactor this Python function for readability."}
    ]
  }'

Body parameters

FieldTypeRequiredDefaultNotes
modelstringyesclaude-sonnet-4-5
messagesarrayyesConversation history.
max_tokensintegeryesHard cap. Max for this model: 16384.
systemstring | arraynoArray form supports cache_control.
temperaturenumberno1.0Range 0.0–1.0.
top_pnumberno1.0Nucleus sampling.
toolsarraynoSupported, including parallel tool calls.
tool_choiceobjectno{"type":"auto"}auto / any / tool (named).
streambooleannofalseSSE streaming.

Response

{
  "id": "msg_sonnet_…",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-5",
  "content": [
    {"type": "text", "text": "Here's the refactor..."}
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 142,
    "output_tokens": 318
  }
}

Code examples

curl https://llm.bytespike.ai/v1/messages \
  -H "x-api-key: $BYTESPIKE_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 4096,
    "messages": [{"role": "user", "content": "Refactor this Python function for readability."}]
  }'

Streaming

Set "stream": true for SSE in the standard Anthropic format. Estimated credits ship in the HTTP headers before the first event.

Cache control

cache_control blocks pay for themselves on Sonnet within ~3 repeated calls of the same system prompt. Cache reads bill at the discounted rate visible in the pricing table.
{
  "model": "claude-sonnet-4-5",
  "system": [
    {
      "type": "text",
      "text": "<long static system prompt>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [...]
}

Errors

CodeTriggerBilled?
400Body validation failedNo
401Missing / revoked keyNo
402Wallet exhaustedNo
403Scope denied / IP not allowlistedNo
429Rate-limitedNo
5xxUpstream provider issueNo (auto-retry envelope)

When to use

  • Production agent loops where one-shot quality matters and you can wait 1–2s.
  • Code review / refactoring / structured output where Haiku starts skipping steps.
  • For higher throughput at lower quality, see Haiku 4.5.
  • For deeper reasoning across long contexts, see Opus 4.7.
  • The newer Sonnet 4.6 is now the recommended Sonnet — keep 4.5 only if you’ve benchmark-verified the older version on your task.

Limits

LimitValue
Context window200K tokens
Max output16384 tokens
Supports tool useYes (parallel)
Supports visionYes
Supports streamingYes
Supports prompt cachingYes