POST /responses - ByteSpike

The Responses API shape (introduced by OpenAI for o1 / o3 / GPT-5 + Agents SDK) — billed and served by ByteSpike like any other endpoint. Under the hood the gateway translates Responses-shaped requests into each model’s native protocol, so you can point your OpenAI SDK + Agents SDK + Codex CLI at ByteSpike and have it work against Claude, Gemini, DeepSeek, etc.

When to use

OpenAI Agents SDK — the SDK speaks Responses-shape by default
Codex CLI — uses Responses internally; ByteSpike is a drop-in via --responses-base-url https://llm.bytespike.ai/v1
Multi-turn agents with structured outputs — response_format
- reasoning blocks ship in the same envelope
Cross-model parity — you want a single client shape that reaches the o-series, GPT-5, and the Claude family

For straight chat without reasoning steps, prefer /chat/completions. For Anthropic-native tool_use + cache_control, use /messages.

Request

curl https://llm.bytespike.ai/v1/responses \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5-5",
    "input": "What is the capital of France?"
  }'

Headers

Header	Required	Notes
`Authorization`	yes	`Bearer sk-byts-…` — the only auth shape Responses accepts.
`content-type`	yes	`application/json`.

Body (selected fields)

The Responses API surface is large; ByteSpike forwards the full schema to whichever model you request. Common fields:

Field	Type	Required	Notes
`model`	string	yes	Catalog slug (`gpt-5-5`, `claude-opus-4-8`, `gpt-5-4-mini`, …).
`input`	string \| array	yes	The conversation. String = single user message; array = multi-turn (role/content shapes match OpenAI’s Responses spec).
`instructions`	string	no	The “system prompt” position.
`tools`	array	no	Tool definitions in Responses shape (`{type, name, parameters, …}`).
`tool_choice`	string \| object	no	`auto` / `none` / `required` / `{type: "function", function: {name}}`.
`temperature`	number	no	`1.0` default.
`top_p`	number	no	Nucleus sampling.
`max_output_tokens`	integer	no	Cap on output token count.
`reasoning`	object	no	`{"effort": "low" \| "medium" \| "high"}` — for o-series and GPT-5 reasoning models.
`response_format`	object	no	`{"type": "json_object"}` or `{"type": "json_schema", "json_schema": {...}}` for strict structured output.
`stream`	boolean	no	SSE delivery. Same event protocol as the OpenAI Responses spec.
`metadata`	object	no	`{"user_id": "..."}` — forwarded to the model and logged.
`previous_response_id`	string	no	Multi-turn chaining; references a prior `response.id`.

Response

{
  "id": "resp_01HQX9F2P6Y8VEX3CRZ8GXJVD9",
  "object": "response",
  "created_at": 1716700000,
  "model": "gpt-5-5",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_01ABC",
      "role": "assistant",
      "content": [
        {"type": "output_text", "text": "Paris."}
      ]
    }
  ],
  "usage": {
    "input_tokens": 9,
    "output_tokens": 2,
    "total_tokens": 11
  }
}

Response fields

Field	Type	Notes
`id`	string	Server-generated id, prefix `resp_`. Use in `previous_response_id` for chaining.
`status`	string	`completed`, `in_progress`, `failed`. Always `completed` on a non-streaming success.
`output`	array	Array of output items. Each item has a `type` — `message`, `reasoning`, `tool_call`, etc.
`usage.input_tokens`	integer	Tokens billed for input.
`usage.output_tokens`	integer	Tokens billed for output (includes reasoning tokens for o-series).
`usage.reasoning_tokens`	integer	Subset of `output_tokens` spent on hidden reasoning, when applicable.

Accounting headers

Same quota envelope as every other endpoint:

X-RateLimit-Limit: 50.00
X-RateLimit-Remaining: 42.18
X-RateLimit-Reset: 1716705600
X-Quota-Remaining-Credits: 192.40

Full breakdown in the API Reference overview.

Streaming

Pass "stream": true:

curl -N https://llm.bytespike.ai/v1/responses \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5-5",
    "input": "Tell a 3-sentence story.",
    "stream": true
  }'

The SSE stream is byte-for-byte compatible with the OpenAI Responses streaming protocol — response.created, response.output_text.delta, response.completed, etc.

Cross-model routing

The same Responses request shape works against every model in our catalog. Pick by the model field:

Model family	Example slugs	Notes
GPT-5 / o-series	`gpt-5-5`, `gpt-5-4`, `gpt-5-4-mini`	Native Responses shape — no translation.
Claude	`claude-opus-4-8`, `claude-sonnet-4-6`, `claude-haiku-4-5`	`reasoning` maps to Claude’s `thinking` blocks.
Gemini	`gemini-3-1-pro`, `gemini-2-5-flash`	Full Responses surface supported.

Translated paths preserve tool_use / tool_calls, structured outputs (response_format), and reasoning tokens. The output shape you receive is always Responses-API regardless of the model.

Errors

OpenAI envelope (matches the OpenAI Responses spec):

{
  "error": {
    "message": "Model \"foo-3\" not found",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}

HTTP	`code`	When
400	`invalid_request_error`	Bad JSON, missing `model`, malformed `tools`.
401	`authentication_error`	Missing / revoked key.
402	`insufficient_balance`	Account is out of credits. Top up at console.bytespike.ai.
413	`invalid_request_error` (body too large)	Prompt + tool schema exceeds the body cap.
429	`rate_limit_exceeded`	See the rate-limit chapter.
502 / 503	`api_error`	Transient timeout or no capacity in the routing group for this `model`. The dial-test in Console → Models helps debug.

Pricing

Per-1k tokens. Same rate as the model’s native endpoint — translation overhead is not billed. Live card: bytespike.ai/pricing.

/chat/completions — OpenAI Chat Completions shape
/messages — Anthropic Messages shape
Configuring Codex with ByteSpike — concrete CLI setup

​When to use

​Request

​Headers

​Body (selected fields)

​Response

​Response fields

​Accounting headers

​Streaming

​Cross-model routing

​Errors

​Pricing

​Related