claude-haiku-4-5 is the small / fast / cheap Claude. It’s the right default for anything you’d run at scale where the prompt is short, the response is short, and you’d rather call the API ten thousand times an hour than three hundred. Vision and tool use are still included; what you give up is the bigger models’ reasoning depth on hard problems.
Pricing: 5.00 / 1M output, $0.10 / 1M cache read — see the rate card.
Protocols
| Protocol | Path |
|---|---|
| Anthropic Messages | POST https://llm.bytespike.ai/v1/messages |
| OpenAI Chat Completions | POST https://llm.bytespike.ai/v1/chat/completions |
Quickstart
Capabilities
| Capability | Supported |
|---|---|
| Chat completions | ✅ |
| Streaming (SSE) | ✅ |
| Vision (image input) | ✅ |
| Tool use (function calling) | ✅ parallel |
| Prompt caching (cache_control) | ✅ |
| Extended thinking | — |
| Web search | — |
| JSON / structured output | ✅ |
| Context window | 200K tokens |
When to use
- Classification, routing, triage. Tickets → categories. Emails → priority. Calls → next-best action.
- Structured extraction at scale. PII redaction, entity extraction, schema-driven parsing on a firehose.
- Cache-amortized agents. Big system prompt + many short user turns; cache_control on the system makes each turn ~10× cheaper after the first.
- Vision OCR on cheap models. Haiku’s vision is good enough for receipts, invoices, screenshots — at a quarter of Sonnet’s price.
- Hard reasoning — no extended thinking on Haiku; go to Sonnet or Opus.
- Long-form writing — Haiku’s prose tier is below Sonnet.
Next
- claude-sonnet-4-6 — production mid-tier
- claude-opus-4-8 — 200K-context flagship