tool_use, cache_control, and thinking blocks.
Cross-vendor models (GPT, Gemini, DeepSeek, Doubao, etc.) are
transparently translated under the hood — the request you send is
Anthropic-shape, regardless of which model value you pick.
When to use
Pick this endpoint when you want:- Anthropic SDK / Claude Code / Claude Desktop to talk to any model in our catalog
- Tool use with the cleanest schema (no JSON-string wrapping like OpenAI’s
tool_calls) - Prompt caching (
cache_controlblocks) on long static system prompts - Extended thinking on Opus / Sonnet 4.x
/chat/completions. For Google Native, use /v1beta/models/{model}:generateContent.
Request
Headers
| Header | Required | Notes |
|---|---|---|
x-api-key | yes | Your ByteSpike key (sk-byts-…). |
anthropic-version | yes | Pin 2023-06-01. |
content-type | yes | application/json. |
anthropic-beta | no | Forwarded to the model for Anthropic beta features. |
Body
| Field | Type | Required | Notes |
|---|---|---|---|
model | string | yes | Model slug. Any catalog model works — see Cross-model routing below. |
messages | array | yes | Conversation history (Anthropic shape). |
max_tokens | integer | yes | Hard cap on response length. |
system | string | array | no | System prompt (string for simple cases, array for cache_control). |
tools | array | no | Tool definitions (Anthropic input_schema format). |
tool_choice | object | no | {"type": "auto"} / {"type": "any"} / {"type": "tool", "name": "…"}. |
temperature | number | no | Default 1.0. |
top_p | number | no | Nucleus sampling. |
stop_sequences | string[] | no | Custom stop tokens. |
stream | boolean | no | Server-sent events. See Streaming. |
metadata | object | no | {"user_id": "..."} — forwarded to the model and logged on our side. |
Response
Response fields
| Field | Type | Notes |
|---|---|---|
id | string | Server-generated id, prefix msg_. |
type | string | Always "message" on a non-error response. |
role | string | Always "assistant". |
content | array | Content blocks: text, tool_use, thinking (Opus / Sonnet 4.x). |
stop_reason | string | end_turn, max_tokens, stop_sequence, tool_use. |
usage.input_tokens | integer | Tokens billed for input. |
usage.output_tokens | integer | Tokens billed for output. |
usage.cache_read_input_tokens | integer | Tokens served from cache (discounted rate). |
usage.cache_creation_input_tokens | integer | Tokens written to cache (full rate). |
Accounting headers
Every response — success or failure, streamed or not — carries the gateway’s quota envelope:X-RateLimit-Limit/Remaining— USD budget for the rate-limit bucket closest to constraining you (5h / 1d / 7d, whichever is tightest).X-RateLimit-Reset— Unix timestamp for when that bucket resets.X-Quota-Remaining-Credits— lifetime credits remaining on this key (USD; 1 USD = 1,000,000 credits). Failed requests don’t move this number.X-Org-Quota-Remaining-Credits— org wallet remaining, on org-owned keys.
GET /api/v1/usage — it returns the prompt + completion tokens and the final billed credits per call.
Streaming
Set"stream": true. The response is SSE in the standard Anthropic format:
message_start → one or more (content_block_start → content_block_delta× → content_block_stop) → message_delta → message_stop — is identical to the Anthropic Messages spec. Tool calls arrive as tool_use content blocks streamed in chunks via input_json_delta.
Tool use (multi-turn)
Tool calling round-trips through two requests. The first carries the tool schema; the model responds with atool_use block; you execute the tool and POST the result back in a second request.
Round 1 — tool offered
Round 2 — tool result returned
Executeget_weather({city: "Tokyo"}) locally, then send the result as a tool_result block referencing the original tool_use.id:
stop_reason: "end_turn".
Image / multimodal content
Send images asimage content blocks. Both base64 and URL sources work; the gateway forwards the bytes directly to vision-capable models (Claude Sonnet/Opus 4.x, GPT-5-x, Gemini, Doubao Vision, etc.).
GET /v1/models.
Cross-model routing
This endpoint accepts any ByteSpike catalog model in themodel field — the gateway translates the request to each model’s native protocol transparently. Pick whatever fits your latency / cost / capability mix:
- Model-specific features that don’t translate (e.g. OpenAI
response_format: {"type": "json_schema"}) need the matching protocol endpoint. stop_reasonandusagefields are normalised back to Anthropic shape regardless of the model.
GET /v1/models. Pricing per model: bytespike.ai/pricing.
Cache control
cache_control blocks work identically to the Anthropic Messages spec. Costs are billed at the discounted cache-read rate when a hit occurs; the rate is visible in the pricing table under “cache read”.
usage.cache_read_input_tokens and usage.cache_creation_input_tokens fields in the response report hits and writes respectively.
Rate limiting & quota headers
| Header | Notes |
|---|---|
x-ratelimit-limit-requests | Requests/min cap for your tier. |
x-ratelimit-remaining-requests | Remaining in the current window. |
x-ratelimit-reset-requests | Seconds until the bucket refills. |
x-ratelimit-limit-tokens | Tokens/min cap. |
x-ratelimit-remaining-tokens | Tokens remaining in this window. |
429, inspect the x-ratelimit-reset-* header to know when to retry.
Errors
All non-2xx responses are free — failures don’t bill.| Status | error.type | Trigger |
|---|---|---|
| 400 | invalid_request_error | Body validation failed (per Anthropic schema). The message tells you which field. |
| 400 | unsupported_model | model slug not in your scope or retired. |
| 400 | unsupported_feature | E.g. image block sent to a text-only model, or tools to a model that can’t tool-use. |
| 401 | authentication_error | Missing / revoked key. |
| 402 | insufficient_credits | Wallet exhausted. Top up at console.bytespike.ai/billing. |
| 403 | permission_error | Scope denied, IP not allowlisted, or model gated. |
| 404 | not_found_error | Path typo (/v1/messages vs /messages) or unknown model id. |
| 429 | rate_limit_error | Tier rate-limit. Backoff per x-ratelimit-reset-*. |
| 5xx | api_error / overloaded_error | Upstream provider issue. Free + automatic retry envelope. |