Vendor: Anthropic
Model ID: claude-opus-4-8
Capability: 200K context · tool use · vision · prompt caching · streaming · extended thinking
Pricing: per-token, Opus tier (live rate)

Opus 4.8 is the current flagship Anthropic model and the successor to Opus 4.7. It is the model you reach for when the one shot has to be right. It’s slower than Sonnet, more expensive than Sonnet, and noticeably better at the things Sonnet starts cutting corners on: long-context reasoning, multi-step plans where each step depends on the last, and the kind of code generation where the first draft has to compile and match the architecture conventions of an existing codebase. With extended thinking enabled, the response wait grows but the answer quality on hard problems jumps further than the latency cost suggests.

Request

curl https://llm.bytespike.ai/v1/messages \
  -H "x-api-key: $BYTESPIKE_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-opus-4-8",
    "max_tokens": 16384,
    "messages": [
      {"role": "user", "content": "Implement an LRU cache with O(1) get and put."}
    ]
  }'

Body parameters

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| model | string | yes | | claude-opus-4-8 |
| messages | array | yes | | Conversation history. Up to 200K tokens of input. |
| max_tokens | integer | yes | | Hard cap. Max for this model: 32768. |
| system | string or array | no | | Array form supports cache_control. |
| temperature | number | no | 1.0 | Range 0.0–1.0. |
| top_p | number | no | 1.0 | Nucleus sampling. |
| tools | array | no | | Supported, parallel calls supported. |
| tool_choice | object | no | {"type":"auto"} | auto / any / tool (named). |
| thinking | object | no | | Extended thinking. Higher budget = better long-reasoning answer at higher latency. |
| stream | boolean | no | false | SSE streaming. |
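The constraints in the table above can be checked client-side before spending a round trip on a 400. The following is a minimal sketch, not part of any SDK: `validate_body` is a hypothetical helper, it covers only the limits stated in the table, and the lower bound of 1 for max_tokens is an assumption. The server remains the authority on what is valid.

```python
MAX_OUTPUT_TOKENS = 32768  # from the max_tokens row of the table


def validate_body(body: dict) -> list[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    problems = []

    # model, messages and max_tokens are the three required fields.
    for field in ("model", "messages", "max_tokens"):
        if field not in body:
            problems.append(f"missing required field: {field}")

    # max_tokens is a hard cap; the lower bound of 1 is an assumption.
    max_tokens = body.get("max_tokens")
    if max_tokens is not None and not (
        isinstance(max_tokens, int) and 1 <= max_tokens <= MAX_OUTPUT_TOKENS
    ):
        problems.append(f"max_tokens must be an integer in 1..{MAX_OUTPUT_TOKENS}")

    # temperature defaults to 1.0 and must stay within 0.0-1.0.
    temperature = body.get("temperature", 1.0)
    if not 0.0 <= temperature <= 1.0:
        problems.append("temperature must be within 0.0-1.0")

    return problems
```

Calling `validate_body` on the request body from the curl example returns an empty list; raising max_tokens to 40000 returns a single max_tokens problem.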

Response

{
  "id": "msg_opus_…",
  "type": "message",
  "role": "assistant",
  "model": "claude-opus-4-8",
  "content": [
    {"type": "thinking", "thinking": "<extended reasoning trace>"},
    {"type": "text", "text": "Here's the LRU cache..."}
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 32,
    "output_tokens": 1248,
    "thinking_tokens": 4032
  }
}
thinking_tokens are billed at the input-token rate, so extended thinking adds latency but not the full output cost. See the pricing table for the current rate.
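To make the billing note concrete, here is a small sketch that pulls the answer text out of a response and estimates its cost from the usage block. It assumes the response shape shown above; `answer_text` and `estimate_cost` are hypothetical helper names, and the rates are caller-supplied placeholders, not real prices.

```python
def answer_text(response: dict) -> str:
    """Concatenate the text blocks, skipping thinking blocks."""
    return "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )


def estimate_cost(usage: dict, input_rate: float, output_rate: float) -> float:
    """Estimate the cost of one call; rates are per token.

    Follows the billing note above: thinking_tokens are charged at the
    input rate, output_tokens at the output rate.
    """
    input_side = usage["input_tokens"] + usage.get("thinking_tokens", 0)
    return input_side * input_rate + usage["output_tokens"] * output_rate
```

With the usage block from the sample response, the input side is 32 + 4032 = 4064 tokens and the output side is 1248 tokens; multiply each by the live rate from the pricing table.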

Code examples

curl https://llm.bytespike.ai/v1/messages \
  -H "x-api-key: $BYTESPIKE_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-opus-4-8",
    "max_tokens": 16384,
    "messages": [{"role": "user", "content": "Implement an LRU cache with O(1) get and put."}]
  }'
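The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the endpoint and headers from the curl example; `build_request` and `send` are illustrative names, not part of an official client.

```python
import json
import os
import urllib.request

API_URL = "https://llm.bytespike.ai/v1/messages"  # endpoint from the curl example


def build_request(prompt: str, max_tokens: int = 16384) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    body = {
        "model": "claude-opus-4-8",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        # Key is read from the same environment variable the curl example uses.
        "x-api-key": os.environ.get("BYTESPIKE_API_KEY", ""),
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A call would look like `send(build_request("Implement an LRU cache with O(1) get and put."))`. Keeping construction separate from sending makes the body easy to inspect or log before it goes out.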

Extended thinking

Opt in by setting the thinking block:
{
  "model": "claude-opus-4-8",
  "max_tokens": 16384,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 8192
  },
  "messages": [...]
}
budget_tokens is the maximum number of internal-reasoning tokens. The model may use fewer; the floor is a few hundred. Recommended budgets:
| Task | Suggested budget |
| --- | --- |
| Multi-step coding | 4K–8K |
| Long-context summarisation | 8K–16K |
| Hard math / proof | 16K–32K |
Higher budgets monotonically improve answer quality on hard problems — but the marginal return falls off above 16K for most tasks.
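The recommended budgets above can be turned into a small lookup that produces the thinking block. This is a sketch under assumptions: the task keys are hypothetical labels, and 1K is read as 1024 tokens to match the 8192 used in the example.

```python
# Ranges from the recommended-budgets table, read as 1K = 1024 tokens (assumption).
SUGGESTED_BUDGETS = {
    "multi_step_coding": (4 * 1024, 8 * 1024),
    "long_context_summarisation": (8 * 1024, 16 * 1024),
    "hard_math": (16 * 1024, 32 * 1024),
}


def thinking_config(task: str, generous: bool = False) -> dict:
    """Return a thinking block using the low or high end of the suggested range."""
    low, high = SUGGESTED_BUDGETS[task]
    return {"type": "enabled", "budget_tokens": high if generous else low}
```

Starting at the low end and moving to the high end only when answers fall short fits the note that returns diminish above 16K for most tasks.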

Cache control

{
  "model": "claude-opus-4-8",
  "system": [
    {
      "type": "text",
      "text": "<the corpus you keep referring to>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [...]
}
Cache reads are billed at the discounted rate shown in the pricing table. On Opus 4.8, cache control is the single highest-leverage cost optimisation: a large system prompt is paid for once, then billed at the cache-read rate on every subsequent turn.
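The size of the saving can be worked out with simple arithmetic. The sketch below compares the cost of resending a large system prompt on every turn against caching it once. All three rates are caller-supplied placeholders (the cache-write rate in particular is an assumption, since only the read discount is mentioned here), and it assumes the cache entry stays alive between turns.

```python
def corpus_cost(
    prompt_tokens: int,
    turns: int,
    input_rate: float,
    cache_write_rate: float,
    cache_read_rate: float,
) -> tuple[float, float]:
    """Return (uncached, cached) cost of the system prompt over a conversation.

    Uncached: the full prompt is billed at the input rate on every turn.
    Cached: the prompt is written once, then read on each later turn.
    """
    uncached = prompt_tokens * turns * input_rate
    cached = (
        prompt_tokens * cache_write_rate
        + prompt_tokens * (turns - 1) * cache_read_rate
    )
    return uncached, cached
```

The gap between the two figures grows with the number of turns, which is why the optimisation matters most for long conversations over a fixed corpus.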

Errors

| Code | Trigger | Billed? |
| --- | --- | --- |
| 400 | Body validation failed | No |
| 401 | Missing / revoked key | No |
| 402 | Wallet exhausted (Opus calls trip this faster) | No |
| 413 | Input exceeds 200K tokens | No |
| 429 | Rate-limited | No |
| 5xx | Upstream provider issue | No (auto-retry envelope) |
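A client can map these status codes to a next step. The sketch below is one possible policy, not documented behaviour: it treats 429 and 5xx as retryable with exponential backoff and jitter, and everything else in the table as needing a fix on the caller's side. Retrying 5xx client-side is an assumption, since the table only says such errors fall under an auto-retry envelope. `classify` and `backoff_delay` are hypothetical names.

```python
import random

# Errors that need a change on the caller's side before retrying.
NEEDS_ACTION = {
    400: "fix the request body",
    401: "check the API key",
    402: "top up the wallet",
    413: "shrink the input below 200K tokens",
}


def classify(status: int) -> str:
    """Map an HTTP status code to a suggested next step."""
    if status == 429 or 500 <= status <= 599:
        return "retry"
    if status in NEEDS_ACTION:
        return NEEDS_ACTION[status]
    return "ok" if 200 <= status <= 299 else "unknown"


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: a random delay up to base * 2**attempt."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Because none of the listed errors are billed, retrying is cheap in money terms; the cap on the delay keeps a long outage from stalling the caller indefinitely.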

When to use

  • One-shot quality matters and you can wait for a thoughtful answer.
  • Code generation in an existing codebase where conventions matter.
  • Multi-step plans where each step depends on the last (Sonnet starts skipping; Opus 4.8 keeps the chain tight).
  • Long-context reasoning across legal / medical / technical corpora within the 200K window.
  • For mid-tier cost / latency, see Sonnet 4.6.
  • For high-throughput agent loops, see Haiku 4.5.

Limits

| Limit | Value |
| --- | --- |
| Context window | 200K tokens |
| Max output | 32768 tokens |
| Supports tool use | Yes (parallel) |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Yes |
| Supports extended thinking | Yes |