Vendor: OpenAI
Model ID: gpt-5-4
Capability: 128K context · tool use · vision · streaming · structured output
Pricing: per-token, standard tier (live rate)

GPT-5.4 is the workhorse of the 5.4 wave: better tool-call argument generation than 5.2, tighter structured output, and the same 128K context. It is the production default for any team that needs more than mini quality but doesn't want the 5.5 latency premium. For multi-step reasoning, see GPT-5.4-pro.

Request

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5-4",
    "messages": [{"role": "user", "content": "Refactor this React component to use hooks."}]
  }'

Body parameters

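| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| model | string | yes | | gpt-5-4 |
| messages | array | yes | | |
| max_tokens | integer | no | model max | Max: 16384. |
| temperature | number | no | 1.0 | |
| tools | array | no | | Parallel function calling. |
| response_format | object | no | | JSON / structured output. |
| stream | boolean | no | false | SSE streaming. |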
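
To show how the optional fields combine, here is a minimal Python sketch that assembles a request body. The shapes used for `tools` and `response_format` follow the OpenAI-style chat completions convention, which is an assumption: this page does not spell out their schemas. `build_body` and `get_weather` are hypothetical names used only for illustration.

```python
import json

MAX_OUTPUT_TOKENS = 16384  # "Max: 16384" from the parameter table


def build_body(prompt, max_tokens=1024, temperature=1.0, tools=None, json_mode=False):
    """Assemble a chat completions request body (hypothetical helper)."""
    if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
    body = {
        "model": "gpt-5-4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    if tools:
        body["tools"] = tools
    if json_mode:
        # Assumed OpenAI-style shape; the page only says "JSON / structured output".
        body["response_format"] = {"type": "json_object"}
    return body


# Hypothetical tool definition in the assumed OpenAI-style function schema.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = build_body("What's the weather in Oslo?", max_tokens=256, tools=[weather_tool])
print(json.dumps(body, indent=2))
```

Validating `max_tokens` on the client avoids a round trip that would presumably be rejected; how the service actually responds to an oversized value is not stated here.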

Response

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "gpt-5-4",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 412, "completion_tokens": 587, "total_tokens": 999}
}
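
A small Python sketch of reading the fields a caller usually needs from this response. It parses the sample shown above verbatim; `summarize` is a hypothetical helper name, and treating any `finish_reason` other than "stop" as a possibly incomplete answer is an assumption based on common OpenAI-style behavior.

```python
import json

# The sample response from this section, copied as-is (id elided in the original).
raw = """
{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "gpt-5-4",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 412, "completion_tokens": 587, "total_tokens": 999}
}
"""


def summarize(response):
    """Extract the assistant text, finish reason, and token usage."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "text": choice["message"]["content"],
        "finished": choice["finish_reason"] == "stop",
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }


summary = summarize(json.loads(raw))
print(summary)
```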

Code examples

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model": "gpt-5-4", "messages": [{"role": "user", "content": "Refactor this React component."}]}'
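
For callers not using curl, the same request can be sketched in Python with only the standard library. The URL, headers, and body mirror the curl example; `build_request` and `chat` are hypothetical helper names, and the 60-second timeout is an arbitrary choice.

```python
import json
import os
import urllib.request

API_URL = "https://llm.bytespike.ai/v1/chat/completions"


def build_request(prompt, api_key):
    """Build the POST request without sending it."""
    body = {"model": "gpt-5-4", "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(prompt):
    """Send the request and return the decoded JSON response."""
    req = build_request(prompt, os.environ["BYTESPIKE_API_KEY"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

Calling `chat("Refactor this React component.")` would return a dictionary in the shape shown under Response, assuming `BYTESPIKE_API_KEY` is set in the environment.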

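Streaming + caching

Set "stream": true to receive the response as server-sent events (SSE). Prompt caching is automatic for repeated prompt prefixes, so the biggest cost win comes from long system prompts that are reused across requests.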

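A hedged sketch of consuming the SSE stream in Python. This page does not document the chunk format, so the code assumes the OpenAI-style convention: each event is a `data:` line carrying JSON with `choices[0].delta.content`, and the stream ends with `data: [DONE]`. `iter_content` is a hypothetical helper, and the sample lines are illustrative, not captured output.

```python
import json


def iter_content(lines):
    """Yield text fragments from an iterable of decoded SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]


# Illustrative events in the assumed chunk format.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
text = "".join(iter_content(sample))
print(text)
```

In a real client, `lines` would come from iterating over the HTTP response body line by line and decoding each line as UTF-8.
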
Errors

| Code | Trigger | Billed? |
| --- | --- | --- |
| 400 / 401 / 402 / 422 / 429 | Standard | No |
| 5xx | Upstream | No (auto-retry) |
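
One way a client might act on these codes, sketched in Python. The table only says that 5xx responses are auto-retried, presumably by the service. Retrying 429 and any 5xx that still reaches the client, with exponential backoff and jitter, is a common convention assumed here rather than documented behavior. `should_retry` and `backoff_delay` are hypothetical helpers.

```python
import random


def should_retry(status):
    """Retry rate limits and upstream failures; other 4xx codes need a fixed request."""
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt, base=0.5, cap=8.0):
    """Exponential backoff with full jitter, in seconds (attempt starts at 0)."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


for status in (400, 401, 402, 422, 429, 503):
    print(status, "retry" if should_retry(status) else "fix request or account")
```

Codes such as 400, 401, 402, and 422 generally indicate a problem with the request, credentials, or account, so resending the same request is unlikely to succeed.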

When to use

  • Default production model for code generation, content rewriting, and tool-using agents.
  • For multi-step reasoning where each step compounds, see GPT-5.4-pro.
  • For the latest flagship, see GPT-5.5.
  • For lower cost, see GPT-5.4-mini.
  • For a faster but slightly older mid-tier, see GPT-5.2.

Limits

| Limit | Value |
| --- | --- |
| Context window | 128K tokens |
| Max output | 16384 tokens |
| Supports tool use | Yes (parallel) |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |
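
As a rough illustration of budgeting against these limits, the Python sketch below clamps a requested `max_tokens` value. It makes two assumptions not stated on this page: that "128K" means 128,000 tokens (it could also mean 131,072), and that prompt and output tokens share the context window. `clamp_max_tokens` is a hypothetical helper.

```python
CONTEXT_WINDOW = 128_000  # assumption: "128K" read as 128,000 tokens
MAX_OUTPUT = 16_384       # "Max output" from the limits table


def clamp_max_tokens(prompt_tokens, requested):
    """Return the largest max_tokens that fits both limits."""
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills the assumed context window")
    return min(requested, MAX_OUTPUT, remaining)


print(clamp_max_tokens(412, 20_000))      # limited by the max output
print(clamp_max_tokens(120_000, 16_384))  # limited by the remaining window
```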