https://llm.bytespike.ai/v1 and swap the key.
Non-GPT models (Claude, Gemini, DeepSeek, Doubao) are translated under
the hood.
When to use
Pick this endpoint when you want:- Drop-in replacement for
openaiSDK calls (OpenAI(base_url=…, api_key=…)) - OpenAI-only features like
response_format: {"type": "json_schema"}orlogprobs - Frameworks that hard-code the OpenAI shape
tool_calls shape works as-is. For prompt caching on Claude, prefer /v1/messages — the OpenAI shape can’t carry cache_control.
Request
Headers
| Header | Required | Notes |
|---|---|---|
Authorization | yes | Bearer sk-byts-…. (x-api-key is also accepted for symmetry with /messages.) |
content-type | yes | application/json. |
Body
| Field | Type | Required | Notes |
|---|---|---|---|
model | string | yes | Model slug. Any catalog model — see Cross-model routing. |
messages | array | yes | Conversation history. Roles: system, user, assistant, tool. |
max_tokens | integer | no | Hard cap on response length. Default uppers vary per model. |
temperature | number | no | Default 1.0. |
top_p | number | no | Nucleus sampling. |
n | integer | no | Generations to return. Default 1. |
stream | boolean | no | Server-sent events. See Streaming. |
stop | string | string[] | no | Custom stop tokens. |
presence_penalty | number | no | -2.0 to 2.0. |
frequency_penalty | number | no | -2.0 to 2.0. |
response_format | object | no | {"type": "json_object"} / {"type": "json_schema", "json_schema": {...}}. |
tools | array | no | Function / tool definitions (see Tool calling). |
tool_choice | string | object | no | "none" / "auto" / {"type": "function", "function": {"name": "…"}}. |
seed | integer | no | Deterministic sampling (best-effort). |
user | string | no | Stable end-user id — forwarded to the model + logged for abuse tracing. |
Response
Response fields
| Field | Type | Notes |
|---|---|---|
id | string | Server-generated id, prefix chatcmpl-. |
choices[].message.role | string | Always "assistant" on a non-error response. |
choices[].message.content | string | null | null when the model emits tool calls instead of text. |
choices[].message.tool_calls | array | Present when the model wants to invoke a tool. |
choices[].finish_reason | string | stop, length, tool_calls, content_filter. |
usage.prompt_tokens | integer | Tokens billed for input. |
usage.completion_tokens | integer | Tokens billed for output. |
usage.total_tokens | integer | Sum. |
Accounting headers
Same envelope as every other endpoint:Streaming
Set"stream": true. The response is SSE with data: {json} lines and a terminating data: [DONE]:
[DONE] frame when stream_options: {"include_usage": true} is set.
Tool calling
Tools follow OpenAI’sfunction shape. Round-trip via two requests:
Round 1 — tool offered
arguments is a JSON string, not an object — that’s the OpenAI convention.
Round 2 — tool result returned
Vision (image_url content)
Send images viaimage_url content blocks. Both data URLs and HTTPS URLs work; the gateway forwards bytes to vision-capable models (GPT-5-x, Claude Sonnet/Opus 4.x, Gemini, Doubao Vision).
detail field ("low" / "high" / "auto") is honored on models that support it.
Cross-model routing
This endpoint accepts any ByteSpike catalog model in themodel field — the gateway translates the OpenAI shape to each model’s native protocol transparently:
response_format: {"type": "json_schema"}requires a model that supports structured outputs natively (GPT 4.x+, some Gemini Pro variants). Other models return a 400unsupported_feature.seedis best-effort; cross-model determinism isn’t guaranteed.tool_calls.argumentsalways returns a JSON string regardless of model — even on Claude models that internally return structured input.
GET /v1/models. Pricing per model: bytespike.ai/pricing.
Rate limiting & quota headers
| Header | Notes |
|---|---|
x-ratelimit-limit-requests | Requests/min cap for your tier. |
x-ratelimit-remaining-requests | Remaining in the current window. |
x-ratelimit-reset-requests | Seconds until the bucket refills. |
x-ratelimit-limit-tokens | Tokens/min cap. |
x-ratelimit-remaining-tokens | Tokens remaining in this window. |
Errors
All non-2xx responses are free — failures don’t bill. Body shape matches OpenAI’s error envelope:| Status | error.type | Trigger |
|---|---|---|
| 400 | invalid_request_error | Body validation failed. Message identifies the field. |
| 400 | invalid_request_error (code unsupported_model) | Model slug not in scope or retired. |
| 400 | invalid_request_error (code unsupported_feature) | E.g. response_format: json_schema on a non-structured-output model. |
| 401 | authentication_error | Missing / revoked key. |
| 402 | insufficient_credits | Wallet exhausted — top up at console.bytespike.ai/billing. |
| 403 | permission_error | Scope denied, IP not allowlisted, or model gated. |
| 404 | not_found_error | Path typo or unknown model id. |
| 429 | rate_limit_error | Tier rate-limit. Backoff per x-ratelimit-reset-*. |
| 5xx | api_error / overloaded_error | Upstream provider issue. Free + automatic retry envelope. |
SDK example
OpenAI Python SDK, base URL flipped:openai package is the one we test against.