Skip to main content
The Responses API shape (introduced by OpenAI for o1 / o3 / GPT-5 + Agents SDK) — billed and served by ByteSpike like any other endpoint. Under the hood the gateway translates Responses-shaped requests into each model’s native protocol, so you can point your OpenAI SDK + Agents SDK + Codex CLI at ByteSpike and have it work against Claude, Gemini, DeepSeek, etc.

When to use

  • OpenAI Agents SDK — the SDK speaks Responses-shape by default
  • Codex CLI — uses Responses internally; ByteSpike is a drop-in via --responses-base-url https://llm.bytespike.ai/v1
  • Multi-turn agents with structured outputsresponse_format
    • reasoning blocks ship in the same envelope
  • Cross-model parity — you want a single client shape that reaches the o-series, GPT-5, and the Claude family
For straight chat without reasoning steps, prefer /chat/completions. For Anthropic-native tool_use + cache_control, use /messages.

Request

curl https://llm.bytespike.ai/v1/responses \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5-5",
    "input": "What is the capital of France?"
  }'

Headers

HeaderRequiredNotes
AuthorizationyesBearer sk-byts-… — the only auth shape Responses accepts.
content-typeyesapplication/json.

Body (selected fields)

The Responses API surface is large; ByteSpike forwards the full schema to whichever model you request. Common fields:
FieldTypeRequiredNotes
modelstringyesCatalog slug (gpt-5-5, claude-opus-4-8, gpt-5-4-mini, …).
inputstring | arrayyesThe conversation. String = single user message; array = multi-turn (role/content shapes match OpenAI’s Responses spec).
instructionsstringnoThe “system prompt” position.
toolsarraynoTool definitions in Responses shape ({type, name, parameters, …}).
tool_choicestring | objectnoauto / none / required / {type: "function", function: {name}}.
temperaturenumberno1.0 default.
top_pnumbernoNucleus sampling.
max_output_tokensintegernoCap on output token count.
reasoningobjectno{"effort": "low" | "medium" | "high"} — for o-series and GPT-5 reasoning models.
response_formatobjectno{"type": "json_object"} or {"type": "json_schema", "json_schema": {...}} for strict structured output.
streambooleannoSSE delivery. Same event protocol as the OpenAI Responses spec.
metadataobjectno{"user_id": "..."} — forwarded to the model and logged.
previous_response_idstringnoMulti-turn chaining; references a prior response.id.

Response

{
  "id": "resp_01HQX9F2P6Y8VEX3CRZ8GXJVD9",
  "object": "response",
  "created_at": 1716700000,
  "model": "gpt-5-5",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_01ABC",
      "role": "assistant",
      "content": [
        {"type": "output_text", "text": "Paris."}
      ]
    }
  ],
  "usage": {
    "input_tokens": 9,
    "output_tokens": 2,
    "total_tokens": 11
  }
}

Response fields

FieldTypeNotes
idstringServer-generated id, prefix resp_. Use in previous_response_id for chaining.
statusstringcompleted, in_progress, failed. Always completed on a non-streaming success.
outputarrayArray of output items. Each item has a typemessage, reasoning, tool_call, etc.
usage.input_tokensintegerTokens billed for input.
usage.output_tokensintegerTokens billed for output (includes reasoning tokens for o-series).
usage.reasoning_tokensintegerSubset of output_tokens spent on hidden reasoning, when applicable.

Accounting headers

Same quota envelope as every other endpoint:
X-RateLimit-Limit: 50.00
X-RateLimit-Remaining: 42.18
X-RateLimit-Reset: 1716705600
X-Quota-Remaining-Credits: 192.40
Full breakdown in the API Reference overview.

Streaming

Pass "stream": true:
curl -N https://llm.bytespike.ai/v1/responses \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5-5",
    "input": "Tell a 3-sentence story.",
    "stream": true
  }'
The SSE stream is byte-for-byte compatible with the OpenAI Responses streaming protocol — response.created, response.output_text.delta, response.completed, etc.

Cross-model routing

The same Responses request shape works against every model in our catalog. Pick by the model field:
Model familyExample slugsNotes
GPT-5 / o-seriesgpt-5-5, gpt-5-4, gpt-5-4-miniNative Responses shape — no translation.
Claudeclaude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5reasoning maps to Claude’s thinking blocks.
Geminigemini-3-1-pro, gemini-2-5-flashFull Responses surface supported.
Translated paths preserve tool_use / tool_calls, structured outputs (response_format), and reasoning tokens. The output shape you receive is always Responses-API regardless of the model.

Errors

OpenAI envelope (matches the OpenAI Responses spec):
{
  "error": {
    "message": "Model \"foo-3\" not found",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}
HTTPcodeWhen
400invalid_request_errorBad JSON, missing model, malformed tools.
401authentication_errorMissing / revoked key.
402insufficient_balanceAccount is out of credits. Top up at console.bytespike.ai.
413invalid_request_error (body too large)Prompt + tool schema exceeds the body cap.
429rate_limit_exceededSee the rate-limit chapter.
502 / 503api_errorTransient timeout or no capacity in the routing group for this model. The dial-test in Console → Models helps debug.

Pricing

Per-1k tokens. Same rate as the model’s native endpoint — translation overhead is not billed. Live card: bytespike.ai/pricing.