claude-opus-4-8
Capability: 200K context · tool use · vision · prompt caching · streaming · extended thinking
Pricing: per-token, Opus tier (live rate)
Opus 4.8 is the current flagship Anthropic model and the successor to Opus 4.7. It is the model you reach for when the one shot has to be right.
It’s slower than Sonnet, more expensive than Sonnet, and noticeably
better at the things Sonnet starts cutting corners on: long-context
reasoning, multi-step plans where each step depends on the last, and
the kind of code generation where the first draft has to compile and
match the architecture conventions of an existing codebase. With
extended thinking enabled, the response wait grows but the answer
quality on hard problems jumps further than the latency cost suggests.
Request
Body parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
model | string | yes | — | claude-opus-4-8 |
messages | array | yes | — | Conversation history. Up to 200K tokens of input. |
max_tokens | integer | yes | — | Hard cap. Max for this model: 32768. |
system | string | array | no | — | Array form supports cache_control. |
temperature | number | no | 1.0 | Range 0.0–1.0. |
top_p | number | no | 1.0 | Nucleus sampling. |
tools | array | no | — | Supported, parallel calls supported. |
tool_choice | object | no | {"type":"auto"} | auto / any / tool (named). |
thinking | object | no | — | Extended-thinking. Higher budget = better long-reasoning answer at higher latency. |
stream | boolean | no | false | SSE streaming. |
Response
thinking_tokens are billed at the input-token rate (extended thinking
adds latency but not the full output cost). See the
pricing table for current rate.
Code examples
Extended thinking
Opt in by setting thethinking block:
budget_tokens is the maximum number of internal-reasoning tokens. The
model may use fewer; the floor is a few hundred. Recommended budgets:
| Task | Suggested budget |
|---|---|
| Multi-step coding | 4K–8K |
| Long-context summarisation | 8K–16K |
| Hard math / proof | 16K–32K |
Cache control
Errors
| Code | Trigger | Billed? |
|---|---|---|
| 400 | Body validation failed | No |
| 401 | Missing / revoked key | No |
| 402 | Wallet exhausted (Opus calls trip this faster) | No |
| 413 | Input exceeds 200K tokens | No |
| 429 | Rate-limited | No |
| 5xx | Upstream provider issue | No (auto-retry envelope) |
When to use
- One-shot quality matters and you can wait for a thoughtful answer.
- Code generation in an existing codebase where conventions matter.
- Multi-step plans where each step depends on the last (Sonnet starts skipping; Opus 4.8 keeps the chain tight).
- Long-context reasoning across legal / medical / technical corpora within the 200K window.
- For mid-tier cost / latency, see Sonnet 4.6.
- For high-throughput agent loops, see Haiku 4.5.
Limits
| Limit | Value |
|---|---|
| Context window | 200K tokens |
| Max output | 32768 tokens |
| Supports tool use | Yes (parallel) |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Yes |
| Supports extended thinking | Yes |