Vendor: OpenAI
Model ID: gpt-5-4-pro
Capabilities: 128K context · tool use · vision · streaming · structured output · reasoning_effort
Pricing: per-token, pro tier (live rate)

GPT-5.4-pro is the standard GPT-5.4 base model with a reasoning_effort parameter that controls how much work OpenAI's reasoning chain does before answering. Use it for multi-step problems where the first draft has to be composed across several steps rather than answered directly: code generation in an existing codebase with its own conventions, math and proof tasks, and planning across many sub-goals. Latency rises with reasoning effort, but on hard problems the gain in answer quality is larger than the added latency suggests.

Request

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5-4-pro",
    "reasoning_effort": "high",
    "messages": [{"role": "user", "content": "Implement a thread-safe LRU cache in Rust."}]
  }'

Body parameters

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| model | string | yes | | gpt-5-4-pro |
| messages | array | yes | | |
| reasoning_effort | string | no | "medium" | "low" / "medium" / "high". Higher values run a longer reasoning chain: higher latency, higher quality on hard problems. |
| max_tokens | integer | no | model max | Maximum: 32768. |
| tools | array | no | | Parallel function calling. |
| response_format | object | no | | JSON / structured output. |
| stream | boolean | no | false | SSE streaming. |
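A client-side pre-flight check can catch invalid values before spending a round trip on a 400 or 422. The bash sketch below encodes only what the table states: the three reasoning_effort values, the "medium" default, and the 32768 ceiling on max_tokens. The function name check_params is hypothetical and not part of any BYTESPIKE tooling.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the two tunable body fields.
# Limits come from the parameter table: reasoning_effort is low/medium/high
# (omitted means "medium"), and max_tokens may not exceed 32768.
MAX_OUTPUT_TOKENS=32768

# Usage: check_params [REASONING_EFFORT] [MAX_TOKENS]
check_params() {
  local effort="${1:-medium}" max_tokens="${2:-}"

  case "$effort" in
    low|medium|high) ;;
    *) echo "reasoning_effort must be low, medium, or high (got '$effort')" >&2
       return 1 ;;
  esac

  # max_tokens is optional; when omitted the model maximum applies.
  if [ -n "$max_tokens" ]; then
    case "$max_tokens" in
      *[!0-9]*) echo "max_tokens must be a positive integer (got '$max_tokens')" >&2
                return 1 ;;
    esac
    # 10# forces base 10 so values with leading zeros are not read as octal.
    if (( 10#$max_tokens < 1 || 10#$max_tokens > MAX_OUTPUT_TOKENS )); then
      echo "max_tokens must be between 1 and $MAX_OUTPUT_TOKENS (got $max_tokens)" >&2
      return 1
    fi
  fi
  return 0
}
```

In a script, chain it in front of the request, for example check_params high 4096 && curl ..., so the request is only sent when both values pass.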

Response

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "gpt-5-4-pro",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "use std::sync::..."}, "finish_reason": "stop"}],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 1248,
    "reasoning_tokens": 4032,
    "total_tokens": 5312
  }
}
reasoning_tokens are billed at the input-token rate, not output.
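Under this rule, a response's input-side tokens are prompt_tokens + reasoning_tokens, and its output-side tokens are completion_tokens alone. In the sample, total_tokens is 32 + 1248 + 4032 = 5312, so reasoning tokens are reported separately from completion_tokens. The bash sketch below turns a usage block into a cost estimate. The live pro-tier rates are not listed on this page, so the per-1M-token rates are placeholders supplied by the caller, and estimate_cost is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical cost estimator for one chat.completion usage block.
# Billing rule from this page: prompt_tokens and reasoning_tokens at the input rate,
# completion_tokens at the output rate.
# Assumption: rates are USD per 1M tokens, supplied by the caller (not the live rate).
#
# Usage: estimate_cost PROMPT COMPLETION REASONING INPUT_RATE OUTPUT_RATE
estimate_cost() {
  local prompt="$1" completion="$2" reasoning="$3" in_rate="$4" out_rate="$5"
  # awk handles the floating-point math that bash arithmetic cannot.
  awk -v p="$prompt" -v c="$completion" -v r="$reasoning" \
      -v irate="$in_rate" -v orate="$out_rate" \
      'BEGIN { printf "%.6f\n", ((p + r) * irate + c * orate) / 1000000 }'
}
```

With the sample usage and placeholder rates of 1 and 2 USD per 1M tokens, estimate_cost 32 1248 4032 1 2 prints 0.006560.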

Code examples

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model": "gpt-5-4-pro", "reasoning_effort": "high", "messages": [{"role": "user", "content": "..."}]}'

Reasoning effort guide

| Setting | Use for |
|---|---|
| "low" | Fast structured output, light reasoning |
| "medium" | Default; most multi-step tasks |
| "high" | Hard math, proofs, and multi-goal planning where you can wait |
Higher settings consistently improve quality on hard problems, though for most tasks the marginal return diminishes at "high".
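One way to apply the guide in code is a small lookup from task type to setting. This bash sketch uses hypothetical task labels (structured, light, math, proof, planning) chosen for illustration; only the three returned values come from the API. Falling back to "medium" when the caller cannot wait is this sketch's own reading of the "where you can wait" condition.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose reasoning_effort from a task label, per the guide table.
# The labels are illustrative; only low/medium/high are real API values.
#
# Usage: pick_effort TASK_KIND [CAN_WAIT]   (CAN_WAIT is "yes" or "no", default "yes")
pick_effort() {
  local kind="$1" can_wait="${2:-yes}"
  case "$kind" in
    structured|light)
      echo low ;;                     # fast structured output, light reasoning
    math|proof|planning)
      # "high" is for hard problems where the caller can tolerate extra latency.
      if [ "$can_wait" = yes ]; then echo high; else echo medium; fi ;;
    *)
      echo medium ;;                  # default for most multi-step tasks
  esac
}
```

The result slots straight into the request body, e.g. "reasoning_effort": "$(pick_effort proof)" inside a double-quoted -d payload.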

Streaming + caching

Set "stream": true for SSE streaming. With reasoning enabled, the first response chunk arrives only after the reasoning chain completes, so time to first byte (TTFB) is longer than with non-reasoning models. Automatic prompt caching applies.

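Consuming the stream means reading SSE lines and extracting each data: payload. The bash sketch below assumes OpenAI-compatible framing (one data: {json} line per chunk, ending with a data: [DONE] sentinel); this page does not spell out the chunk format, so confirm it against a real response. sse_payloads is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical SSE reader: print the JSON payload of each "data:" line on stdin.
# Assumption: OpenAI-compatible framing, with "data: [DONE]" marking the end.
sse_payloads() {
  local line payload
  # "|| [ -n "$line" ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%$'\r'}"              # tolerate CRLF line endings
    case "$line" in
      data:*)
        payload="${line#data:}"
        payload="${payload# }"        # the space after "data:" is optional in SSE
        [ "$payload" = "[DONE]" ] && return 0
        printf '%s\n' "$payload"
        ;;
      *) ;;                           # skip blank lines, ":" comments, event:, id:
    esac
  done
}
```

To use it, add "stream": true to the request body and pipe curl -sN output into sse_payloads; -N turns off curl's output buffering so each chunk is printed as it arrives.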
Errors

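| Code | Trigger | Billed? |
|---|---|---|
| 400 / 401 / 402 / 422 / 429 | Standard client errors: bad request, authentication, payment required, validation, rate limit | No |
| 5xx | Upstream error | No (auto-retry) |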
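Because error responses are not billed, retrying costs only time, but only transient codes are worth retrying. The bash sketch below assumes 429 and any 5xx that reaches the client are transient, while 400/401/402/422 indicate a problem with the request or account that a retry will not fix. The exponential backoff (1, 2, 4, 8, 16 seconds, capped at 32) and the five-retry limit are this sketch's choices, not documented gateway behavior; retry_delay and call_with_retry are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical client-side retry policy.
# Assumption: 429 and surfaced 5xx are transient; other 4xx codes are not retried.

# Print seconds to wait before retry number ATTEMPT (0-based),
# or return 1 if STATUS should not be retried.
retry_delay() {
  local status="$1" attempt="$2" max_shift=5    # 2^5 = 32 s ceiling
  case "$status" in
    429|5[0-9][0-9]) ;;
    *) return 1 ;;
  esac
  (( attempt > max_shift )) && attempt=$max_shift
  echo $(( 1 << attempt ))
}

# Send one JSON body, retrying transient failures up to 5 times.
# The response body is written to response.json.
call_with_retry() {
  local body="$1" attempt=0 status delay
  while :; do
    status=$(curl -s -o response.json -w '%{http_code}' \
      https://llm.bytespike.ai/v1/chat/completions \
      -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
      -H "content-type: application/json" \
      -d "$body")
    [ "$status" = 200 ] && return 0
    (( attempt >= 5 )) && return 1
    delay=$(retry_delay "$status" "$attempt") || return 1
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}
```

A network failure makes curl report status 000, which retry_delay treats as non-retryable, so call_with_retry stops immediately in that case.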

When to use

  • Multi-step coding in an existing codebase.
  • Hard math / proof / planning tasks.
  • For non-reasoning 5.4 at lower latency, see GPT-5.4.
  • For the latest reasoning-capable flagship, see GPT-5.5.

Limits

| Limit | Value |
|---|---|
| Context window | 128K tokens |
| Max output | 32768 tokens |
| Supports tool use | Yes (parallel) |
| Supports vision | Yes |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |
| Supports reasoning_effort | Yes |
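Before sending a long prompt, it helps to confirm that the prompt plus the requested output fits these limits. This bash sketch reads "128K" conservatively as 128,000 tokens (the exact figure is an assumption) and expects the caller to supply a prompt token count, which this page does not show how to compute. The page also does not say whether reasoning tokens draw on the same budget, so leave headroom. fits_context is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical budget check against the limits table.
# Assumptions: 128K context read as 128,000 tokens; prompt_tokens supplied by the caller.
CONTEXT_WINDOW=128000
MAX_OUTPUT=32768

# Usage: fits_context PROMPT_TOKENS [MAX_TOKENS]
# MAX_TOKENS defaults to the model maximum, matching the max_tokens default.
fits_context() {
  local prompt_tokens="$1" max_tokens="${2:-$MAX_OUTPUT}"
  (( max_tokens <= MAX_OUTPUT )) || return 1
  (( prompt_tokens + max_tokens <= CONTEXT_WINDOW ))
}
```

For example, fits_context 100000 fails because 100000 + 32768 exceeds 128000; passing a smaller max_tokens such as 16000 makes it fit.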