gpt-5-4-mini is the small-mid GPT-5. It keeps the GPT-5 reasoning capability but at roughly one-third the per-token cost of gpt-5-4. The right default for high-volume traffic where you want better-than-Haiku quality without paying flagship rates.
Pricing: 4.50 / 1M output, $0.075 / 1M cache read — see the rate card.
Protocols
| Protocol | Path |
|---|---|
| OpenAI Chat Completions | POST https://llm.bytespike.ai/v1/chat/completions |
| OpenAI Responses | POST https://llm.bytespike.ai/v1/responses |
Quickstart
Capabilities
| Capability | Supported |
|---|---|
| Chat Completions | ✅ |
| Responses API | ✅ |
| Streaming (SSE) | ✅ |
| Vision | ✅ |
| Tool use | ✅ parallel |
| JSON mode | ✅ |
| Structured outputs | ✅ |
| Reasoning effort | ✅ (low / medium) |
| Web search | — |
| Context window | 128K tokens |
When to use
- High-volume classification, routing, structured extraction — anywhere you’d use Haiku for cost but want GPT-flavored reasoning.
- Short-prompt agent loops — tool-use chains where each step is small and the gate is throughput.
- Cheaper Codex-style via Responses API at
reasoning_effort: "low".
- Hard reasoning that benefits from
reasoning_effort: "high"— go togpt-5-4orgpt-5-4-pro. - Web search needed — only
gpt-5-2and up have it.
Next
- gpt-5-4 — production workhorse
- gpt-5-4-nano — smallest GPT-5.4 tier
- gpt-5-5 — current GPT-5 flagship