GPT-5.4 - ByteSpike

Vendor: OpenAI
Model ID: gpt-5-4
Capability: 128K context · tool use · vision · streaming · structured output
Pricing: per-token, standard tier (live rate)

GPT-5.4 is the workhorse of the 5.4 wave. Compared with GPT-5.2, it generates better tool-call arguments and tighter structured output, with the same 128K context window. It is the production default for any team that needs more than GPT-5.4-mini quality but doesn't want the GPT-5.5 latency premium. For multi-step reasoning, see GPT-5.4-pro.

Request

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5-4",
    "messages": [{"role": "user", "content": "Refactor this React component to use hooks."}]
  }'

Body parameters

Field	Type	Required	Default	Notes
`model`	string	yes	—	`gpt-5-4`
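`messages`	array	yes	—	Conversation turns, each with `role` and `content`.
`max_tokens`	integer	no	model max	Max: 16384.
`temperature`	number	no	1.0	Sampling temperature.
`tools`	array	no	—	Function definitions the model may call; supports parallel function calling.
`response_format`	object	no	—	Requests JSON / structured output.
`stream`	boolean	no	false	Streams the response as server-sent events (SSE).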
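A minimal sketch of a request that combines `max_tokens`, `temperature`, and `response_format`. It assumes the gateway accepts OpenAI's `json_schema` form of `response_format` (`name`, `strict`, `schema`), which this page does not spell out; the `hooks_list` schema, the 0.2 temperature, and the helper names are illustrative only. The builder rejects `max_tokens` values above the documented 16384-token cap before anything is sent. Requires bash and curl.

```shell
#!/usr/bin/env bash
# Sketch only: assumes ByteSpike accepts OpenAI-style "json_schema" response_format.
# Helper names and the example schema are hypothetical.

MAX_OUTPUT_TOKENS=16384   # documented output cap for gpt-5-4

# Print a request body; $1 = max_tokens (defaults to the model max, as in the table).
structured_request_body() {
  local max_tokens="${1:-$MAX_OUTPUT_TOKENS}"
  # Positive integer without leading zeros, no larger than the documented cap.
  if ! [[ "$max_tokens" =~ ^[1-9][0-9]*$ ]] || (( max_tokens > MAX_OUTPUT_TOKENS )); then
    echo "max_tokens must be an integer between 1 and $MAX_OUTPUT_TOKENS" >&2
    return 1
  fi
  # Unquoted delimiter so $max_tokens expands; the JSON contains no other "$".
  cat <<JSON
{
  "model": "gpt-5-4",
  "max_tokens": $max_tokens,
  "temperature": 0.2,
  "messages": [{"role": "user", "content": "List the React hooks needed to refactor this class component."}],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "hooks_list",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {"hooks": {"type": "array", "items": {"type": "string"}}},
        "required": ["hooks"],
        "additionalProperties": false
      }
    }
  }
}
JSON
}

# POST a body read from stdin (-d @-) to the chat completions endpoint.
send_chat() {
  curl -sS https://llm.bytespike.ai/v1/chat/completions \
    -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
    -H "content-type: application/json" \
    -d @-
}
```

Usage: `structured_request_body 2048 | send_chat`. Omitting the argument sends the model max, matching the table's default.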

Response

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "gpt-5-4",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 412, "completion_tokens": 587, "total_tokens": 999}
}

Code examples

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model": "gpt-5-4", "messages": [{"role": "user", "content": "Refactor this React component."}]}'
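A sketch of a tool-calling request, assuming the gateway passes through OpenAI's `tools` schema (`{"type": "function", "function": {...}}`). `get_weather` is a hypothetical function that your application, not the model, would execute. Requires bash and curl.

```shell
#!/usr/bin/env bash
# Sketch only: assumes ByteSpike accepts the OpenAI-style "tools" schema.
# get_weather is a hypothetical function used for illustration.

# Print the request body (quoted heredoc: nothing is expanded).
tool_request_body() {
  cat <<'JSON'
{
  "model": "gpt-5-4",
  "messages": [{"role": "user", "content": "What's the weather in Paris and in Tokyo?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city.",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}
JSON
}

# POST a body read from stdin to the chat completions endpoint.
send_chat() {
  curl -sS https://llm.bytespike.ai/v1/chat/completions \
    -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
    -H "content-type: application/json" \
    -d @-
}
```

Usage: `tool_request_body | send_chat`. If the response mirrors OpenAI's shape (an assumption), requested calls arrive in `choices[0].message.tool_calls`; with parallel function calling a single response can carry several calls, for example one per city.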

Streaming + caching

Set `"stream": true` to receive the response as server-sent events (SSE). Prompt caching is automatic for repeated prefixes; the biggest cost savings come from long system prompts reused across requests.

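A sketch of a streamed request plus a small SSE reader. It assumes OpenAI-style framing (each event is a `data: <json>` line and the stream ends with `data: [DONE]`), which this page does not confirm. The system prompt is placed first and kept byte-identical across calls so repeated requests share a cacheable prefix; the prompt text and the helper names `stream_chat` and `sse_data` are illustrative. Requires bash and curl.

```shell
#!/usr/bin/env bash
# Sketch only: assumes OpenAI-style SSE framing, i.e. "data: <json>" lines
# and a final "data: [DONE]" line. Prompts and helper names are illustrative.

# The system prompt comes first and stays byte-identical across calls, so
# repeated requests share a prefix that automatic prompt caching can reuse.
stream_chat() {
  curl -sS -N https://llm.bytespike.ai/v1/chat/completions \
    -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
    -H "content-type: application/json" \
    -d @- <<'JSON'
{
  "model": "gpt-5-4",
  "stream": true,
  "messages": [
    {"role": "system", "content": "You are a senior React reviewer. Prefer function components and hooks."},
    {"role": "user", "content": "Refactor this React component to use hooks."}
  ]
}
JSON
}

# Print the JSON payload of each SSE event; stop at the [DONE] terminator.
sse_data() {
  local line
  while IFS= read -r line; do
    line="${line%$'\r'}"                    # tolerate CRLF line endings
    case "$line" in
      "data: [DONE]") break ;;
      "data: "*) printf '%s\n' "${line#"data: "}" ;;
    esac                                    # blank separator lines are ignored
  done
}
```

Usage: `stream_chat | sse_data` prints one JSON chunk per line; `-N` stops curl from buffering output so chunks appear as they arrive.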
Errors

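Code	Trigger	Billed?
400 / 401 / 402 / 422 / 429	Standard HTTP client errors (bad request, unauthorized, payment required, unprocessable request, rate limited)	No
5xx	Upstream provider error	No (retried automatically)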
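Because these errors are not billed, retrying a 429 costs time but not tokens. A minimal sketch of client-side exponential backoff, assuming bash and curl: it retries only 429, since the table marks 5xx as already retried automatically and the other 4xx codes need a corrected request or account fix. The five-attempt limit, the 1-second base delay, and the `RETRY_BASE_DELAY` variable are arbitrary choices, not ByteSpike recommendations.

```shell
#!/usr/bin/env bash
# Sketch only: client-side backoff for 429s. Attempt count and delays are arbitrary.

# One attempt: POST body $1, save the response body to file $2, print the HTTP status.
send_once() {
  curl -sS -o "$2" -w '%{http_code}' https://llm.bytespike.ai/v1/chat/completions \
    -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
    -H "content-type: application/json" \
    -d "$1"
}

# Retry only 429. Other 4xx need a fixed request or account; 5xx are retried upstream.
# Prints the final status; returns 0 only for a 2xx.
post_with_retry() {
  local body="$1" out="$2" max_attempts="${3:-5}" delay="${RETRY_BASE_DELAY:-1}"
  local attempt status
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    status="$(send_once "$body" "$out")"
    if [[ "$status" != 429 ]]; then
      printf '%s\n' "$status"
      [[ "$status" == 2* ]]     # exit status of this test becomes the return value
      return
    fi
    if (( attempt < max_attempts )); then
      sleep "$delay"            # wait 1 s, 2 s, 4 s, ... between attempts
      delay=$(( delay * 2 ))
    fi
  done
  printf '%s\n' "$status"       # still rate limited after the last attempt
  return 1
}
```

Usage: `post_with_retry "$(cat request.json)" response.json` prints the final status code and leaves the last response body in `response.json` (both file names are placeholders).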

When to use

Default production model for code generation, content rewriting, and tool-using agents.
For multi-step reasoning where each step compounds, see GPT-5.4-pro.
For the latest flagship, see GPT-5.5.
For lower cost, see GPT-5.4-mini.
For a faster but slightly older mid-tier, see GPT-5.2.

Limits

Limit	Value
Context window	128K tokens
Max output	16384 tokens
Supports tool use	Yes (parallel)
Supports vision	Yes
Supports streaming	Yes
Supports prompt caching	Automatic
