claude-sonnet-4-6
Capability: 200K context · tool use · vision · prompt caching · streaming
Pricing: per-token, Sonnet tier (live rate)
Sonnet 4.6 is what you should reach for first inside the 4-series. It
keeps the 200K context window and tool-use shape of 4.5, while
delivering measurably better structured output and tighter tool-call
arguments. If you’re starting a new project, default here. If you’re on
4.5, the migration is a single string change — most production code
sees a quality bump with no measurable latency or cost difference.
Request
Body parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
model | string | yes | — | claude-sonnet-4-6 |
messages | array | yes | — | Conversation history. |
max_tokens | integer | yes | — | Hard cap. Max for this model: 16384. |
system | string | array | no | — | Array form supports cache_control. |
temperature | number | no | 1.0 | Range 0.0–1.0. |
top_p | number | no | 1.0 | Nucleus sampling. |
tools | array | no | — | Supported, including parallel tool calls. |
tool_choice | object | no | {"type":"auto"} | auto / any / tool (named). |
stream | boolean | no | false | SSE streaming. |
Response
Code examples
Streaming
Set"stream": true for SSE in the standard Anthropic format. Estimated
credits ship in the HTTP headers before the first event.
Cache control
cache_control blocks reduce cost on repeated prompts. Cache reads at
the discounted rate visible in the
pricing table. With Sonnet 4.6’s
tighter tool-arg generation, retrieval-augmented agent loops see the
biggest cache wins (system prompt + tool schema kept stable, only the
user turn changes).
Errors
| Code | Trigger | Billed? |
|---|---|---|
| 400 | Body validation failed | No |
| 401 | Missing / revoked key | No |
| 402 | Wallet exhausted | No |
| 403 | Scope denied / IP not allowlisted | No |
| 429 | Rate-limited | No |
| 5xx | Upstream provider issue | No (auto-retry envelope) |
When to use
- Default Anthropic mid-tier — start here, benchmark against Opus / Haiku later.
- Code generation / refactoring / structured extraction where Haiku is too imprecise.
- Tool-heavy agents where parallel tool calls and tight argument JSON matter.
- For the prior version with the same shape, see Sonnet 4.5.
- For higher throughput at lower quality, see Haiku 4.5.
- For deeper reasoning across long contexts, see Opus 4.7.
Limits
| Limit | Value |
|---|---|
| Context window | 200K tokens |
| Max output | 16384 tokens |
| Supports tool use | Yes (parallel) |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Yes |