Vendor: OpenAI
Model ID: gpt-5-mini
Capability: 128K context · tool use · vision · streaming · structured output
Pricing: per-token, mini tier (live rate)

GPT-5-mini is the small model that displaced GPT-4o-mini for new production work. It sits in the same price tier, with measurably tighter structured output and better tool-call argument generation. For most extraction and routing flows, this is the starting point; step up to a standard 5-series model only when your own benchmarks confirm that mini’s quality plateau is the bottleneck.

Request

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5-mini",
    "messages": [
      {"role": "user", "content": "Extract the dates from: The deal closes 2024-08-12 with backup date 2024-09-01."}
    ],
    "response_format": {"type": "json_object"}
  }'

Body parameters

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| model | string | yes | | gpt-5-mini |
| messages | array | yes | | OpenAI chat shape. |
| max_tokens | integer | no | model max | Max: 16384. |
| temperature | number | no | 1.0 | Range 0.0–2.0. |
| tools | array | no | | Function calling supported (parallel). |
| tool_choice | string or object | no | "auto" | |
| response_format | object | no | | JSON mode + structured output (recommended for extraction). |
| stream | boolean | no | false | SSE streaming. |
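
The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch in Python, not an official client: `build_body` and its keyword arguments are hypothetical names, and the lower bound of 1 for `max_tokens` and the non-empty `messages` check are assumptions that the table does not state.

```python
MAX_OUTPUT_TOKENS = 16384  # "Max: 16384" from the parameter table


def build_body(messages, *, max_tokens=None, temperature=None,
               tools=None, tool_choice=None, json_mode=False, stream=False):
    """Assemble a request body, enforcing the documented parameter ranges.

    Hypothetical helper: optional fields are omitted when unset so the
    service applies its own defaults.
    """
    if not messages:
        # Assumption: an empty messages array is treated as invalid.
        raise ValueError("messages is required and must be non-empty")
    body = {"model": "gpt-5-mini", "messages": list(messages)}
    if max_tokens is not None:
        # Assumption: the lower bound of 1 is not documented.
        if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
            raise ValueError(f"max_tokens must be in 1..{MAX_OUTPUT_TOKENS}")
        body["max_tokens"] = max_tokens
    if temperature is not None:
        if not 0.0 <= temperature <= 2.0:
            raise ValueError("temperature must be in 0.0..2.0")
        body["temperature"] = temperature
    if tools:
        body["tools"] = tools
        if tool_choice is not None:
            body["tool_choice"] = tool_choice
    if json_mode:
        body["response_format"] = {"type": "json_object"}
    if stream:
        body["stream"] = True
    return body
```

Rejecting out-of-range values locally gives a clearer error than waiting for the service to refuse the request.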

Response

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "gpt-5-mini",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "{\"dates\": [\"2024-08-12\", \"2024-09-01\"]}"}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 32, "completion_tokens": 18, "total_tokens": 50}
}
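
In JSON mode the message `content` is itself a JSON-encoded string, so it needs a second decode after the response body has been parsed. A minimal sketch in Python, using the sample response above as input; `extract_json` is a hypothetical helper, and treating any `finish_reason` other than `"stop"` as an error is an assumption borrowed from the OpenAI convention rather than something this page states.

```python
import json


def extract_json(response: dict) -> dict:
    """Return the JSON object encoded in the first choice's message content."""
    choice = response["choices"][0]
    reason = choice.get("finish_reason")
    if reason != "stop":
        # Assumption: a non-"stop" finish may mean truncated, invalid JSON.
        raise ValueError(f"unexpected finish_reason: {reason!r}")
    return json.loads(choice["message"]["content"])


# The sample response from this section, as an already-decoded dict.
sample = {
    "id": "chatcmpl-…",
    "object": "chat.completion",
    "model": "gpt-5-mini",
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "{\"dates\": [\"2024-08-12\", \"2024-09-01\"]}",
        },
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 32, "completion_tokens": 18, "total_tokens": 50},
}

print(extract_json(sample)["dates"])  # prints ['2024-08-12', '2024-09-01']
```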

Code examples

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5-mini",
    "messages": [{"role": "user", "content": "Extract the dates as JSON."}],
    "response_format": {"type": "json_object"}
  }'
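
The same call can be made from Python with only the standard library. This is a sketch under stated assumptions rather than an official client: `build_request` and `send` are hypothetical helper names, and the 30-second timeout is an arbitrary choice. The URL and headers are taken from the curl example.

```python
import json
import urllib.request

URL = "https://llm.bytespike.ai/v1/chat/completions"


def build_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request with the same headers as the curl example."""
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "content-type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    The timeout value is an assumption, not a documented limit.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

A call would then look like `send(build_request(body, os.environ["BYTESPIKE_API_KEY"]))`, where `body` is the same JSON object passed to `-d` in the curl example.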

Streaming + caching

Set "stream": true to receive the response as server-sent events (SSE). Prompt caching is automatic; keep the system prompt and tool schema stable across requests to maximize the cache hit rate.

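A streamed response has to be reassembled from individual events. The sketch below assumes the stream follows the OpenAI-compatible chunk format, where each `data:` line carries a JSON chunk with text under `choices[0].delta.content` and the stream ends with a `data: [DONE]` sentinel. This page does not document the chunk shape, so verify it against a live stream. `iter_text` is a hypothetical helper, and the sample lines are illustrative rather than captured output.

```python
import json
from typing import Iterable, Iterator


def iter_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines.

    Assumption: OpenAI-style chunks and a "[DONE]" end marker.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


# Illustrative input only; a real stream may carry more fields per chunk.
sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "2024-"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "08-12"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_text(sample_stream)))  # prints 2024-08-12
```
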
Errors

| Code | Trigger | Billed? |
| --- | --- | --- |
| 400 | Body validation | No |
| 401 | Missing / revoked key | No |
| 402 | Wallet exhausted | No |
| 422 | Param not supported | No |
| 429 | Rate-limited | No |
| 5xx | Upstream provider issue | No |
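
One way to act on these codes is to retry only the failures that can clear on their own. The sketch below treats 429 and 5xx as retryable and everything else as requiring a change to the request, key, or wallet. That split is an assumption based on the triggers listed, not a documented retry policy, and the backoff constants are arbitrary. `should_retry` and `backoff_delay` are hypothetical helpers.

```python
def should_retry(status: int) -> bool:
    """Return True for failures that may succeed if retried unchanged.

    Assumption: 429 (rate-limited) and 5xx (upstream issue) are transient;
    400, 401, 402 and 422 need a fix before resending.
    """
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds for a zero-based attempt number.

    The base and cap values are illustrative, not documented limits.
    """
    return min(cap, base * (2 ** attempt))


for status in (400, 401, 402, 422, 429, 503):
    print(status, "retry" if should_retry(status) else "fix request first")
```

In production, adding random jitter to the delay helps avoid synchronized retries from many clients.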

When to use

  • Production extraction / structured output / routing.
  • Lightweight agent steps (one tool call per step).
  • For higher quality, see GPT-5.5 or GPT-5.4.
  • For lower latency, see GPT-5-nano.
  • For 5.4-era mini, see GPT-5.4-mini.

Limits

| Limit | Value |
| --- | --- |
| Context window | 128K tokens |
| Max output | 16384 tokens |
| Supports tool use | Yes (parallel) |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |
| Supports structured output | Yes |