The OpenAI-native protocol. Speaks Chat Completions verbatim, so any client built for the Chat Completions API (the OpenAI Python / Node SDKs, LangChain default, LlamaIndex, etc.) works unchanged — just point the base URL at https://llm.bytespike.ai/v1 and swap the key. Non-GPT models (Claude, Gemini, DeepSeek, Doubao) are translated under the hood.

When to use

Pick this endpoint when you want:
  • Drop-in replacement for openai SDK calls (OpenAI(base_url=…, api_key=…))
  • OpenAI-only features like response_format: {"type": "json_schema"} or logprobs
  • Frameworks that hard-code the OpenAI shape
For tool calling, the OpenAI tool_calls shape works as-is. For prompt caching on Claude, prefer /v1/messages — the OpenAI shape can’t carry cache_control.

Request

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize the first chapter of Moby Dick."}
    ],
    "max_tokens": 1024
  }'

Headers

| Header | Required | Notes |
| --- | --- | --- |
| Authorization | yes | Bearer sk-byts-…. (x-api-key is also accepted for symmetry with /messages.) |
| content-type | yes | application/json. |
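A minimal Python sketch of the same request using only the standard library. The helper name build_chat_request and the placeholder key are hypothetical; the URL, headers, and body fields are taken from the curl example above. The helper only builds the request object, so sending it (for example with urllib.request.urlopen) is left to the caller.

```python
import json
import urllib.request

BASE_URL = "https://llm.bytespike.ai/v1"  # base URL quoted in the docs above


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to /chat/completions."""
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "content-type": "application/json",
        },
        method="POST",
    )


payload = {
    "model": "gpt-5-4",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the first chapter of Moby Dick."},
    ],
    "max_tokens": 1024,
}

# "sk-byts-example" is a placeholder, not a real key.
req = build_chat_request("sk-byts-example", payload)
```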

Body

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| model | string | yes | Model slug. Any catalog model; see Cross-model routing. |
| messages | array | yes | Conversation history. Roles: system, user, assistant, tool. |
| max_tokens | integer | no | Hard cap on response length. Default upper limits vary per model. |
| temperature | number | no | Default 1.0. |
| top_p | number | no | Nucleus sampling. |
| n | integer | no | Generations to return. Default 1. |
| stream | boolean | no | Server-sent events. See Streaming. |
| stop | string or string[] | no | Custom stop tokens. |
| presence_penalty | number | no | -2.0 to 2.0. |
| frequency_penalty | number | no | -2.0 to 2.0. |
| response_format | object | no | {"type": "json_object"} / {"type": "json_schema", "json_schema": {...}}. |
| tools | array | no | Function / tool definitions (see Tool calling). |
| tool_choice | string or object | no | "none" / "auto" / {"type": "function", "function": {"name": "…"}}. |
| seed | integer | no | Deterministic sampling (best-effort). |
| user | string | no | Stable end-user id, forwarded to the model and logged for abuse tracing. |

Response

{
  "id": "chatcmpl-9aBc…",
  "object": "chat.completion",
  "created": 1716393600,
  "model": "gpt-5-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Ishmael, the narrator, signs onto a whaling ship..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 87,
    "total_tokens": 110
  }
}

Response fields

| Field | Type | Notes |
| --- | --- | --- |
| id | string | Server-generated id, prefix chatcmpl-. |
| choices[].message.role | string | Always "assistant" on a non-error response. |
| choices[].message.content | string or null | null when the model emits tool calls instead of text. |
| choices[].message.tool_calls | array | Present when the model wants to invoke a tool. |
| choices[].finish_reason | string | stop, length, tool_calls, content_filter. |
| usage.prompt_tokens | integer | Tokens billed for input. |
| usage.completion_tokens | integer | Tokens billed for output. |
| usage.total_tokens | integer | Sum. |
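A short sketch of reading these fields from a decoded response body. The helper name summarize_choice is hypothetical, and the sample dict is a trimmed copy of the response shown above.

```python
def summarize_choice(response: dict) -> dict:
    """Pull the commonly needed fields out of the first choice."""
    choice = response["choices"][0]
    message = choice["message"]
    tool_calls = message.get("tool_calls") or []  # key is absent on plain text replies
    return {
        "text": message.get("content"),  # None when the model emitted tool calls
        "tool_names": [call["function"]["name"] for call in tool_calls],
        "finish_reason": choice["finish_reason"],
        "truncated": choice["finish_reason"] == "length",  # hit the max_tokens cap
        "total_tokens": response["usage"]["total_tokens"],
    }


sample_response = {
    "id": "chatcmpl-example",  # placeholder id
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Ishmael, the narrator, signs onto a whaling ship...",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 23, "completion_tokens": 87, "total_tokens": 110},
}

summary = summarize_choice(sample_response)
```

Checking finish_reason before using content avoids treating a truncated or tool-calling reply as a finished answer.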

Accounting headers

Same envelope as every other endpoint:
X-RateLimit-Limit: 50.00
X-RateLimit-Remaining: 42.18
X-RateLimit-Reset: 1716705600
X-Quota-Remaining-Credits: 192.40
Full breakdown in the API Reference overview.

Streaming

Set "stream": true. The response is SSE with data: {json} lines and a terminating data: [DONE]:
data: {"id":"chatcmpl-…","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-…","choices":[{"index":0,"delta":{"content":"Ishmael"},"finish_reason":null}]}



data: {"id":"chatcmpl-…","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
Token usage is shipped on the final non-[DONE] frame when stream_options: {"include_usage": true} is set.
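A minimal sketch of consuming the stream, assuming the response body has already been split into text lines (for example by iterating over the HTTP response). The helper name collect_stream_text and the sample ids are hypothetical; the frame shape follows the example above.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Concatenate delta.content for choice 0; return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and anything that is not a data line
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # terminating sentinel, not JSON
        frame = json.loads(data)
        for choice in frame.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch only follows the first generation
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


sample_lines = [
    'data: {"id":"chatcmpl-example","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-example","choices":[{"index":0,"delta":{"content":"Ishmael"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-example","choices":[{"index":0,"delta":{"content":", the narrator"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-example","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

text, finish_reason = collect_stream_text(sample_lines)
```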

Tool calling

Tools follow OpenAI’s function shape. Round-trip via two requests:

Round 1 — tool offered

{
  "model": "gpt-5-4",
  "messages": [
    {"role": "user", "content": "What's the weather in Tokyo?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string"}
          },
          "required": ["city"]
        }
      }
    }
  ]
}
Response:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"Tokyo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
Note arguments is a JSON string, not an object — that’s the OpenAI convention.
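Because arguments arrives as a string, it has to be decoded before the tool can run. A small sketch, where get_weather is a stand-in that returns a canned value and the TOOLS registry and run_tool_call helper are hypothetical names:

```python
import json


def get_weather(city: str) -> str:
    """Stand-in tool; a real implementation would look up the weather for city."""
    return "18°C, partly cloudy"


# Maps the tool name the model returns to a local Python callable.
TOOLS = {"get_weather": get_weather}


def run_tool_call(tool_call: dict) -> str:
    """Decode the JSON-string arguments and invoke the matching local function."""
    name = tool_call["function"]["name"]
    arguments = json.loads(tool_call["function"]["arguments"])  # str -> dict
    return TOOLS[name](**arguments)


tool_call = {
    "id": "call_abc",
    "type": "function",
    "function": {"name": "get_weather", "arguments": "{\"city\":\"Tokyo\"}"},
}

result = run_tool_call(tool_call)
```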

Round 2 — tool result returned

{
  "model": "gpt-5-4",
  "messages": [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_abc",
          "type": "function",
          "function": {"name": "get_weather", "arguments": "{\"city\":\"Tokyo\"}"}
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc",
      "content": "18°C, partly cloudy"
    }
  ],
  "tools": [ /* same schema */ ]
}
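The second request can be assembled mechanically from the first exchange. A sketch, assuming the tool outputs have already been computed and keyed by tool call id; build_followup_messages is a hypothetical helper name:

```python
def build_followup_messages(history: list, assistant_message: dict, tool_results: dict) -> list:
    """Return history + the assistant turn + one tool message per tool call."""
    messages = list(history)            # copy so the caller's list is not mutated
    messages.append(assistant_message)  # echo the assistant turn back unchanged
    for call in assistant_message.get("tool_calls") or []:
        messages.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],  # must match the id the model issued
                "content": tool_results[call["id"]],
            }
        )
    return messages


history = [{"role": "user", "content": "What's the weather in Tokyo?"}]
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\":\"Tokyo\"}"},
        }
    ],
}

messages = build_followup_messages(
    history, assistant_message, {"call_abc": "18°C, partly cloudy"}
)
```

The resulting list goes into the messages field of the Round 2 body, alongside the same tools array used in Round 1.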

Vision (image_url content)

Send images via image_url content blocks. Both data URLs and HTTPS URLs work; the gateway forwards bytes to vision-capable models (GPT-5-x, Claude Sonnet/Opus 4.x, Gemini, Doubao Vision).
{
  "model": "gpt-5-4",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
          }
        }
      ]
    }
  ]
}
The detail field ("low" / "high" / "auto") is honored on models that support it.
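A sketch of building the image_url block from raw bytes. The helper name image_block is hypothetical, and placing detail inside the image_url object follows the OpenAI convention, which this page does not spell out.

```python
import base64
from typing import Optional


def image_block(image_bytes: bytes, mime_type: str = "image/jpeg",
                detail: Optional[str] = None) -> dict:
    """Wrap raw image bytes in an image_url content block using a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    image_url = {"url": f"data:{mime_type};base64,{encoded}"}
    if detail is not None:
        image_url["detail"] = detail  # "low", "high", or "auto"
    return {"type": "image_url", "image_url": image_url}


# The first four bytes of a JPEG header, used here as a tiny stand-in payload.
block = image_block(b"\xff\xd8\xff\xe0", detail="low")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        block,
    ],
}
```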

Cross-model routing

This endpoint accepts any ByteSpike catalog model in the model field — the gateway translates the OpenAI shape to each model’s native protocol transparently:
{"model": "gpt-5-4", "messages": [...]}
{"model": "claude-sonnet-4-6", "messages": [...]}
{"model": "gemini-3-1-pro", "messages": [...]}
{"model": "deepseek-v4-pro", "messages": [...]}
{"model": "kimi-k2-6", "messages": [...]}
Caveats:
  • response_format: {"type": "json_schema"} requires a model that supports structured outputs natively (GPT 4.x+, some Gemini Pro variants). Other models return a 400 unsupported_feature.
  • seed is best-effort; cross-model determinism isn’t guaranteed.
  • tool_calls.arguments always returns a JSON string regardless of model — even on Claude models that internally return structured input.
Full catalog: GET /v1/models. Pricing per model: bytespike.ai/pricing.

Rate limiting & quota headers

| Header | Notes |
| --- | --- |
| x-ratelimit-limit-requests | Requests/min cap for your tier. |
| x-ratelimit-remaining-requests | Remaining in the current window. |
| x-ratelimit-reset-requests | Seconds until the bucket refills. |
| x-ratelimit-limit-tokens | Tokens/min cap. |
| x-ratelimit-remaining-tokens | Tokens remaining in this window. |

Errors

All non-2xx responses are free — failures don’t bill. Body shape matches OpenAI’s error envelope:
{
  "error": {
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "message": "You exceeded your current requests-per-minute budget.",
    "param": null
  }
}
| Status | error.type | Trigger |
| --- | --- | --- |
| 400 | invalid_request_error | Body validation failed. Message identifies the field. |
| 400 | invalid_request_error (code unsupported_model) | Model slug not in scope or retired. |
| 400 | invalid_request_error (code unsupported_feature) | E.g. response_format: json_schema on a non-structured-output model. |
| 401 | authentication_error | Missing / revoked key. |
| 402 | insufficient_credits | Wallet exhausted; top up at console.bytespike.ai/billing. |
| 403 | permission_error | Scope denied, IP not allowlisted, or model gated. |
| 404 | not_found_error | Path typo or unknown model id. |
| 429 | rate_limit_error | Tier rate-limit. Back off per x-ratelimit-reset-*. |
| 5xx | api_error / overloaded_error | Upstream provider issue. Free + automatic retry envelope. |
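One way a client might turn the table above into a retry decision. This is a sketch under stated assumptions: x-ratelimit-reset-requests is read as a plain number of seconds, 5xx responses are treated as safe to retry from the client side, and the exponential fallback with a 30-second cap is an arbitrary illustrative choice. The helper name retry_delay is hypothetical.

```python
from typing import Mapping, Optional

MAX_BACKOFF_SECONDS = 30.0  # illustrative cap, not a documented value


def retry_delay(status: int, headers: Mapping[str, str], attempt: int) -> Optional[float]:
    """Seconds to wait before retrying, or None when the error is not retryable."""
    if status != 429 and status < 500:
        return None  # 400/401/402/403/404: retrying the same request will not help
    if status == 429:
        lowered = {key.lower(): value for key, value in headers.items()}
        reset = lowered.get("x-ratelimit-reset-requests")
        if reset is not None:
            try:
                return max(0.0, float(reset))  # assumed to be seconds, per the header table
            except ValueError:
                pass  # unparseable header: fall through to exponential backoff
    return min(2.0 ** attempt, MAX_BACKOFF_SECONDS)


delay_429 = retry_delay(429, {"X-RateLimit-Reset-Requests": "12"}, attempt=0)
delay_401 = retry_delay(401, {}, attempt=0)
delay_503 = retry_delay(503, {}, attempt=3)
```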

SDK example

OpenAI Python SDK, base URL flipped:
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BYTESPIKE_API_KEY"],
    base_url="https://llm.bytespike.ai/v1",
)

r = client.chat.completions.create(
    model="claude-sonnet-4-6",   # Claude through the OpenAI shape
    messages=[{"role": "user", "content": "Hello!"}],
)
print(r.choices[0].message.content)
No SDK fork needed — the upstream openai package is the one we test against.