Vendor: OpenAI
Model ID: gpt-5-nano
Capability: 128K context · tool use · vision · streaming
Pricing: per-token, nano tier (live rate)

GPT-5-nano is the latency floor of the 5-series. Reach for it when the plan is to make many cheap LLM calls: sub-LLM judges, classifier chains, agent loops where each step is a routing decision rather than an answer. The quality gap to mini is small for bounded prompts and the latency win is significant.

Request

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5-nano",
    "messages": [
      {"role": "user", "content": "Route to: support / billing / sales. Input: My card was charged twice."}
    ]
  }'

Body parameters

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| model | string | yes | | gpt-5-nano |
| messages | array | yes | | OpenAI chat shape. |
| max_tokens | integer | no | model max | Max: 8192. |
| temperature | number | no | 1.0 | Range 0.0–2.0. |
| tools | array | no | | Function calling supported. |
| response_format | object | no | | JSON mode + structured output supported. |
| stream | boolean | no | false | SSE streaming. |
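
The documented ranges can be checked client-side before a request is sent, which avoids a round trip that would end in a 400. The sketch below is one possible way to do that; `build_body` is a hypothetical helper, not part of any SDK, and the lower bound of 1 for `max_tokens` is an assumption (the table only states the maximum).

```python
from typing import Any, Optional

MAX_OUTPUT_TOKENS = 8192          # "Max: 8192" in the parameter table
TEMPERATURE_RANGE = (0.0, 2.0)    # "Range 0.0–2.0" in the parameter table


def build_body(
    messages: list,
    *,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    stream: bool = False,
) -> dict:
    """Build a request body, rejecting values outside the documented ranges."""
    if not messages:
        raise ValueError("messages is required and must not be empty")
    body: dict[str, Any] = {"model": "gpt-5-nano", "messages": messages}
    if max_tokens is not None:
        # Lower bound of 1 is an assumption; only the upper bound is documented.
        if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
            raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
        body["max_tokens"] = max_tokens
    if temperature is not None:
        low, high = TEMPERATURE_RANGE
        if not low <= temperature <= high:
            raise ValueError(f"temperature must be between {low} and {high}")
        body["temperature"] = temperature
    if stream:
        body["stream"] = True  # omitted otherwise, since the default is false
    return body
```

Optional fields are left out of the body unless set, so the server-side defaults from the table apply.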

Response

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "gpt-5-nano",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "billing"}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 28, "completion_tokens": 1, "total_tokens": 29}
}
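
For a routing call, the useful part of this response is the single label in `choices[0].message.content`. A minimal sketch of extracting and checking it follows; `parse_route` and the `ROUTES` set are illustrative names taken from the example prompt, and treating any `finish_reason` other than `"stop"` as a failure is an assumption about how strict a router should be.

```python
ROUTES = {"support", "billing", "sales"}  # labels from the example prompt


def parse_route(response: dict) -> str:
    """Return the routing label from a chat.completion response dict."""
    choice = response["choices"][0]
    if choice["finish_reason"] != "stop":
        # Assumption: a truncated or tool-call reply is not a usable label.
        raise ValueError(f"unexpected finish_reason: {choice['finish_reason']!r}")
    label = choice["message"]["content"].strip().lower()
    if label not in ROUTES:
        raise ValueError(f"model returned an unknown route: {label!r}")
    return label
```

Validating against a closed set keeps a stray sentence from the model from leaking into downstream dispatch logic.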

Code examples

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5-nano",
    "messages": [{"role": "user", "content": "Route to: support / billing / sales. Input: My card was charged twice."}]
  }'
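
The same request can be made from Python with only the standard library. This is a sketch rather than an official client: `build_request` and `route` are hypothetical helper names, and the 30-second timeout is an arbitrary choice. The URL, headers, and body mirror the curl example.

```python
import json
import os
import urllib.request

API_URL = "https://llm.bytespike.ai/v1/chat/completions"  # from the curl example


def build_request(user_text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request the curl example makes."""
    body = {
        "model": "gpt-5-nano",
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "content-type": "application/json",
        },
        method="POST",
    )


def route(user_text: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_request(user_text, os.environ["BYTESPIKE_API_KEY"])
    with urllib.request.urlopen(req, timeout=30) as resp:  # timeout is an assumption
        return json.load(resp)["choices"][0]["message"]["content"]
```

Calling `route("Route to: support / billing / sales. Input: My card was charged twice.")` with `BYTESPIKE_API_KEY` set sends the request shown above.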

Streaming + caching

Set "stream": true to receive the response as server-sent events (SSE). Prompt caching is automatic on stable prefixes. On nano, cache hits matter less than on larger models because the input rate is already low, but they still help with long system prompts.
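
A streamed response has to be reassembled from individual events on the client. The sketch below assumes the gateway emits OpenAI-style chunks (`data: {...}` lines carrying `choices[].delta.content`, ending with `data: [DONE]`); this page does not spell out the wire format, so treat that as an assumption. `iter_content` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an iterable of decoded SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the yielded fragments with `"".join(...)` gives the full reply. The function takes already-decoded lines, so it works with any HTTP client that can iterate over the response body line by line.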

Errors

| Code | Trigger | Billed? |
|---|---|---|
| 400 | Body validation | No |
| 401 | Missing / revoked key | No |
| 402 | Wallet exhausted | No |
| 422 | Param not supported | No |
| 429 | Rate-limited | No |
| 5xx | Upstream provider issue | No |
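
Since none of these errors are billed, retrying the transient ones costs only time. The table does not say which codes are safe to retry. The sketch below assumes that 429 and 5xx are transient and that 400, 401, 402, and 422 need a fix on the caller's side first. The backoff constants are placeholders, and both function names are hypothetical.

```python
def should_retry(status: int) -> bool:
    """Assumed policy: retry on rate limits and upstream failures only."""
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff in seconds for a zero-based attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))
```

A caller would loop over attempts, sleep for `backoff_delay(attempt)` whenever `should_retry(status)` is true, and surface any other error immediately.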

When to use

  • Routing / triage / classification at the head of an agent pipeline.
  • High-volume sub-LLM judges.
  • For more capability at slightly higher latency, see GPT-5-mini.
  • For 5.4-tier routing, see GPT-5.4-nano.

Limits

| Limit | Value |
|---|---|
| Context window | 128K tokens |
| Max output | 8192 tokens |
| Supports tool use | Yes |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |