Skip to main content
Vendor: OpenAI Model ID: gpt-5-5-instant Capability: 128K context · tool use · vision · streaming · structured output Pricing: per-token, flagship tier (live rate) GPT-5.5-instant is GPT-5.5 with the reasoning chain elided — the model goes straight from prompt to answer. For shorter prompts and tasks where the reasoning was overkill (extraction, structured output, short-form rewriting), this is the right pick: same flagship quality on bounded inputs, sub-second TTFB.

Request

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5-5-instant",
    "messages": [{"role": "user", "content": "Rewrite this support response in a friendlier tone."}]
  }'

Body parameters

FieldTypeRequiredDefaultNotes
modelstringyesgpt-5-5-instant
messagesarrayyes
max_tokensintegernomodel maxMax: 16384.
toolsarraynoParallel function calling.
response_formatobjectnoJSON / structured output.
streambooleannofalseSSE streaming.
reasoning_effort is not accepted on this variant — passing it returns 422.

Response

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "gpt-5-5-instant",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 87, "completion_tokens": 142, "total_tokens": 229}
}
No reasoning_tokens field — there’s no reasoning chain.

Code examples

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model": "gpt-5-5-instant", "messages": [{"role": "user", "content": "Rewrite this in a friendlier tone."}]}'

Streaming + caching

"stream": true for SSE. Sub-second TTFB on most prompts because the reasoning chain is elided. Automatic prompt caching on stable prefixes.

Errors

CodeTriggerBilled?
400 / 401 / 402 / 429StandardNo
422Including reasoning_effort (use GPT-5.5 for reasoning)No
5xxUpstreamNo (auto-retry)

When to use

  • Latency-critical surfaces where reasoning is overkill.
  • Short-form rewriting, extraction, structured output on bounded inputs.
  • For deeper multi-step reasoning, use GPT-5.5 with reasoning_effort.
  • For mini-tier cost on latency-bound work, see GPT-5.4-mini.

Limits

LimitValue
Context window128K tokens
Max output16384 tokens
Supports tool useYes (parallel)
Supports visionYes
Supports streamingYes
Supports prompt cachingAutomatic
Supports reasoning_effortNo