When to use
- OpenAI Agents SDK: the SDK speaks Responses-shape by default
- Codex CLI: uses Responses internally; ByteSpike is a drop-in via --responses-base-url https://llm.bytespike.ai/v1
- Multi-turn agents with structured outputs: response_format and reasoning blocks ship in the same envelope
- Cross-model parity: you want a single client shape that reaches the o-series, GPT-5, and the Claude family
If you need the OpenAI Chat Completions shape instead, use /chat/completions. For Anthropic-native tool_use + cache_control, use /messages.
Request
Headers
| Header | Required | Notes |
|---|---|---|
| Authorization | yes | Bearer sk-byts-… (the only auth shape Responses accepts). |
| content-type | yes | application/json. |
Body (selected fields)
The Responses API surface is large; ByteSpike forwards the full schema to whichever model you request. Common fields:
| Field | Type | Required | Notes |
|---|---|---|---|
| model | string | yes | Catalog slug (gpt-5-5, claude-opus-4-8, gpt-5-4-mini, …). |
| input | string or array | yes | The conversation. String = single user message; array = multi-turn (role/content shapes match OpenAI’s Responses spec). |
| instructions | string | no | The “system prompt” position. |
| tools | array | no | Tool definitions in Responses shape ({type, name, parameters, …}). |
| tool_choice | string or object | no | auto / none / required / {type: "function", function: {name}}. |
| temperature | number | no | 1.0 default. |
| top_p | number | no | Nucleus sampling. |
| max_output_tokens | integer | no | Cap on output token count. |
| reasoning | object | no | {"effort": "low" \| "medium" \| "high"}; for o-series and GPT-5 reasoning models. |
| response_format | object | no | {"type": "json_object"} or {"type": "json_schema", "json_schema": {...}} for strict structured output. |
| stream | boolean | no | SSE delivery. Same event protocol as the OpenAI Responses spec. |
| metadata | object | no | {"user_id": "..."}; forwarded to the model and logged. |
| previous_response_id | string | no | Multi-turn chaining; references a prior response.id. |
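As a rough illustration of how the headers and body fields above fit together, here is a minimal sketch using only the Python standard library. It assumes the endpoint lives at `/responses` under the base URL shown for Codex; that path is not stated on this page, so treat it as a guess. The helper names `build_request` and `send` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://llm.bytespike.ai/v1"  # base URL quoted in the Codex bullet
RESPONSES_PATH = "/responses"             # assumption: path is not stated on this page


def build_request(api_key, model, user_input, *, instructions=None,
                  previous_response_id=None, max_output_tokens=None):
    """Build a urllib Request for a non-streaming Responses call."""
    body = {"model": model, "input": user_input}
    # Optional fields are sent only when set, so server defaults apply otherwise.
    if instructions is not None:
        body["instructions"] = instructions
    if previous_response_id is not None:
        body["previous_response_id"] = previous_response_id
    if max_output_tokens is not None:
        body["max_output_tokens"] = max_output_tokens
    return urllib.request.Request(
        BASE_URL + RESPONSES_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "content-type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Perform the HTTP call and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A follow-up turn would pass the `id` of the earlier response as `previous_response_id`, for example `build_request(key, "gpt-5-4-mini", "And in French?", previous_response_id=first["id"])`.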
Response
Response fields
| Field | Type | Notes |
|---|---|---|
| id | string | Server-generated id, prefix resp_. Use in previous_response_id for chaining. |
| status | string | completed, in_progress, failed. Always completed on a non-streaming success. |
| output | array | Array of output items. Each item has a type: message, reasoning, tool_call, etc. |
| usage.input_tokens | integer | Tokens billed for input. |
| usage.output_tokens | integer | Tokens billed for output (includes reasoning tokens for o-series). |
| usage.reasoning_tokens | integer | Subset of output_tokens spent on hidden reasoning, when applicable. |
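To make the response fields concrete, the sketch below pulls the visible text out of `output` and separates hidden reasoning tokens from the rest. The inner layout of a `message` item (a `content` list of `{"type": "output_text", "text": ...}` blocks) is an assumption taken from the OpenAI Responses spec, not something this page spells out; the sample payload and helper names are hypothetical.

```python
def extract_text(response):
    """Concatenate text from every `message` item in `output`."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip reasoning, tool_call, and other item types
        for block in item.get("content", []):
            # Assumption: text blocks look like {"type": "output_text", "text": ...}
            if block.get("type") == "output_text":
                parts.append(block.get("text", ""))
    return "".join(parts)


def visible_output_tokens(response):
    """Output tokens minus hidden reasoning tokens, per the usage fields above."""
    usage = response.get("usage", {})
    return usage.get("output_tokens", 0) - usage.get("reasoning_tokens", 0)


# Hypothetical response body, shaped after the table above.
sample = {
    "id": "resp_123",
    "status": "completed",
    "output": [
        {"type": "reasoning"},
        {"type": "message", "content": [{"type": "output_text", "text": "Hi there"}]},
    ],
    "usage": {"input_tokens": 12, "output_tokens": 40, "reasoning_tokens": 32},
}

print(extract_text(sample))           # Hi there
print(visible_output_tokens(sample))  # 8
```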
Accounting headers
Same quota envelope as every other endpoint.
Streaming
Pass "stream": true to receive the response as server-sent events. Event types follow the OpenAI Responses spec: response.created, response.output_text.delta, response.completed, etc.
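One way to consume such a stream is sketched below: a small server-sent-events parser that yields one event per blank-line-terminated block, plus a helper that joins the text deltas. It assumes standard SSE framing (`event:` and `data:` lines) and that each `response.output_text.delta` payload carries its text in a `delta` field, as in the OpenAI spec; the sample lines are made up for illustration.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data_dict) pairs from an iterable of SSE text lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates one event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:  # flush a final event that lacks a trailing blank line
        yield event, json.loads("\n".join(data))


def collect_text(lines):
    """Join the `delta` strings of all response.output_text.delta events."""
    return "".join(
        payload.get("delta", "")  # assumption: text lives in the "delta" field
        for name, payload in iter_sse_events(lines)
        if name == "response.output_text.delta"
    )


# Hypothetical stream, already split into lines.
stream = [
    "event: response.created", 'data: {"type": "response.created"}', "",
    "event: response.output_text.delta", 'data: {"delta": "Hel"}', "",
    "event: response.output_text.delta", 'data: {"delta": "lo"}', "",
    "event: response.completed", 'data: {"type": "response.completed"}', "",
]
print(collect_text(stream))  # Hello
```

In a real client the `lines` iterable would come from the HTTP response body, decoded line by line.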
Cross-model routing
The same Responses request shape works against every model in our catalog. Pick by the model field:
| Model family | Example slugs | Notes |
|---|---|---|
| GPT-5 / o-series | gpt-5-5, gpt-5-4, gpt-5-4-mini | Native Responses shape — no translation. |
| Claude | claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5 | reasoning maps to Claude’s thinking blocks. |
| Gemini | gemini-3-1-pro, gemini-2-5-flash | Full Responses surface supported. |
ByteSpike translates tool_use / tool_calls, structured outputs (response_format), and reasoning tokens between the Responses shape and each model's native format. The output shape you receive is always Responses-API regardless of the model.
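Because only the model field changes between families, the same body can be fanned out across slugs. The sketch below shows that idea; `fan_out` is a hypothetical helper, and the prompt is a placeholder.

```python
import copy


def fan_out(base_body, slugs):
    """Return one request body per slug; only `model` differs between them."""
    bodies = []
    for slug in slugs:
        body = copy.deepcopy(base_body)  # avoid sharing nested dicts between requests
        body["model"] = slug
        bodies.append(body)
    return bodies


base = {"input": "Say hello in one sentence.", "reasoning": {"effort": "low"}}
for body in fan_out(base, ["gpt-5-5", "claude-opus-4-8", "gemini-3-1-pro"]):
    print(body["model"], body["reasoning"])
```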
Errors
OpenAI envelope (matches the OpenAI Responses spec):
| HTTP | code | When |
|---|---|---|
| 400 | invalid_request_error | Bad JSON, missing model, malformed tools. |
| 401 | authentication_error | Missing / revoked key. |
| 402 | insufficient_balance | Account is out of credits. Top up at console.bytespike.ai. |
| 413 | invalid_request_error (body too large) | Prompt + tool schema exceeds the body cap. |
| 429 | rate_limit_exceeded | See the rate-limit chapter. |
| 502 / 503 | api_error | Transient timeout or no capacity in the routing group for this model. The dial-test in Console → Models helps debug. |
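A client might treat the table above as a retry policy: back off and retry on 429 and 502 / 503, and surface everything else immediately, since 400, 401, 402, and 413 need a change to the request, key, balance, or payload size. The sketch below is one possible shape for that; the delay values, the attempt limit, and the `do_call` callback convention are assumptions, not ByteSpike recommendations.

```python
import random
import time

RETRYABLE = {429, 502, 503}  # rate limit or transient upstream issue


def should_retry(status):
    """True for statuses the table describes as rate-limited or transient."""
    return status in RETRYABLE


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Exponential backoff with full jitter; `attempt` starts at 0."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def call_with_retries(do_call, max_attempts=5, sleep=time.sleep):
    """Call do_call() -> (status, body), retrying only on retryable statuses."""
    for attempt in range(max_attempts):
        status, body = do_call()
        if not should_retry(status) or attempt == max_attempts - 1:
            return status, body
        sleep(backoff_delay(attempt))
```

Passing `sleep` as a parameter keeps the helper easy to exercise in tests without real delays.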
Pricing
Per-1k tokens. Same rate as the model’s native endpoint; translation overhead is not billed. Live card: bytespike.ai/pricing.
Related
- /chat/completions: OpenAI Chat Completions shape
- /messages: Anthropic Messages shape
- Configuring Codex with ByteSpike: concrete CLI setup