Skip to main content
claude-haiku-4-5 is the small / fast / cheap Claude. It’s the right default for anything you’d run at scale where the prompt is short, the response is short, and you’d rather call the API ten thousand times an hour than three hundred. Vision and tool use are still included; what you give up is the bigger models’ reasoning depth on hard problems. Pricing: 1.00/1Minput,1.00 / 1M input, 5.00 / 1M output, $0.10 / 1M cache read — see the rate card.

Protocols

ProtocolPath
Anthropic MessagesPOST https://llm.bytespike.ai/v1/messages
OpenAI Chat CompletionsPOST https://llm.bytespike.ai/v1/chat/completions

Quickstart

curl https://llm.bytespike.ai/v1/messages \
  -H "x-api-key: $BYTESPIKE_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-haiku-4-5",
    "max_tokens": 256,
    "messages": [
      { "role": "user", "content": "Classify: refund / billing / technical / other.\n\nMy invoice charged twice." }
    ]
  }'

Capabilities

CapabilitySupported
Chat completions
Streaming (SSE)
Vision (image input)
Tool use (function calling)✅ parallel
Prompt caching (cache_control)
Extended thinking
Web search
JSON / structured output
Context window200K tokens

When to use

  • Classification, routing, triage. Tickets → categories. Emails → priority. Calls → next-best action.
  • Structured extraction at scale. PII redaction, entity extraction, schema-driven parsing on a firehose.
  • Cache-amortized agents. Big system prompt + many short user turns; cache_control on the system makes each turn ~10× cheaper after the first.
  • Vision OCR on cheap models. Haiku’s vision is good enough for receipts, invoices, screenshots — at a quarter of Sonnet’s price.
When not to use:
  • Hard reasoning — no extended thinking on Haiku; go to Sonnet or Opus.
  • Long-form writing — Haiku’s prose tier is below Sonnet.

Next