Skip to main content
gpt-5-4-mini is the small-mid GPT-5. It keeps the GPT-5 reasoning capability but at roughly one-third the per-token cost of gpt-5-4. The right default for high-volume traffic where you want better-than-Haiku quality without paying flagship rates. Pricing: 0.75/1Minput,0.75 / 1M input, 4.50 / 1M output, $0.075 / 1M cache read — see the rate card.

Protocols

ProtocolPath
OpenAI Chat CompletionsPOST https://llm.bytespike.ai/v1/chat/completions
OpenAI ResponsesPOST https://llm.bytespike.ai/v1/responses

Quickstart

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5-4-mini",
    "reasoning_effort": "low",
    "messages": [
      { "role": "user", "content": "Classify: refund / billing / technical / other.\n\nMy invoice charged twice." }
    ]
  }'

Capabilities

CapabilitySupported
Chat Completions
Responses API
Streaming (SSE)
Vision
Tool use✅ parallel
JSON mode
Structured outputs
Reasoning effort✅ (low / medium)
Web search
Context window128K tokens

When to use

  • High-volume classification, routing, structured extraction — anywhere you’d use Haiku for cost but want GPT-flavored reasoning.
  • Short-prompt agent loops — tool-use chains where each step is small and the gate is throughput.
  • Cheaper Codex-style via Responses API at reasoning_effort: "low".
When not to use:
  • Hard reasoning that benefits from reasoning_effort: "high" — go to gpt-5-4 or gpt-5-4-pro.
  • Web search needed — only gpt-5-2 and up have it.

Next