Skip to main content
ByteSpike exposes a small, well-defined set of endpoints behind one API key. Each one matches a different client persona — an existing SDK, an agent framework, a multimodal pipeline. The gateway translates between protocols so you don’t write per-vendor glue. Principle: pick the protocol your client already speaks; the gateway handles the rest.

The endpoint surface

#EndpointPathPersona
1Anthropic MessagesPOST /v1/messagesAgent frameworks · Claude SDK · DOSIA · Claude Code
2OpenAI ResponsesPOST /v1/responsesOpenAI Codex CLI · OpenAI Agents SDK · reasoning-heavy clients
3OpenAI Chat CompletionsPOST /v1/chat/completionsThe ubiquitous chat shape · openai SDK · most editor integrations
4Gemini NativePOST /v1beta/models/{model}:generateContentGoogle’s CLI / SDK · Vertex-compat clients
5Image Generation (sync)POST /v1/images/generations · /v1/images/editsOpenAI-shape image SDKs · single-call image generation
6Async multimodal tasksPOST /v1/tasks/submit · /v1/tasks/query · /v1/tasks/cancelVideo generation (Sora, Veo, Seedance) · batched image rendering
That’s it. No /v1/embeddings, no /v1/audio/*, no /v1/rerank, no /v1/assistants — ByteSpike does not currently expose those surfaces.

1. Anthropic Messages — /v1/messages

The canonical agent protocol on ByteSpike. Preserves tool_use, cache_control, thinking, and web_search blocks end-to-end. Use when:
  • You’re building an agent that needs explicit tool-call / tool-result blocks
  • You want extended thinking on Opus / Sonnet 4.x
  • Your client is the official anthropic SDK (Python / Node) or a derivative
Works with Claude models natively, and with DeepSeek, Gemini, and GPT models via transparent translation — pass any catalog model in the model field. See /v1/messages reference.

2. OpenAI Responses — /v1/responses

The newer OpenAI agent-style protocol (Codex CLI + Agents SDK). Use when:
  • Your client is OpenAI Codex CLI
  • You want structured-outputs + reasoning combined
  • You’re on the OpenAI Agents SDK
Works with GPT models natively, and with Claude and Gemini via transparent translation. See /v1/responses reference.

3. OpenAI Chat Completions — /v1/chat/completions

The most ubiquitous chat shape in the ecosystem. Almost every SDK speaks it. Use when:
  • You have existing openai SDK code and want the lowest-friction route
  • You’re not writing an agent — chat in, chat out
  • You’re calling a non-OpenAI model but want to keep the OpenAI SDK
Works with GPT, Gemini, DeepSeek, Kimi (Moonshot), GLM (Zhipu), MiniMax, and Doubao (ByteDance) models — pass any catalog model in the model field. See /v1/chat/completions reference.

4. Gemini Native — /v1beta/models/{model}:generateContent

Google’s native protocol verbatim. Use when:
  • You’re using Google’s official CLI or SDK
  • You want grounding (googleSearch tool) in its native form
  • You need Gemini’s exact response shape for downstream tooling
Streaming via :streamGenerateContent?alt=sse. This protocol serves the Gemini family only. See /v1beta/... reference.

5. Image generation (sync) — /v1/images/generations + /v1/images/edits

OpenAI-shape body across providers; the model field routes. Use for:
  • Single-prompt image generation (GPT-Image-2 shape)
  • Image edits with a mask or instruction
Serves the GPT-Image-2 and GPT-4o-image models, the Seedream (ByteDance) family (V4 / V4.5 / V5lite), and the Nano-Banana (Google) family — all through the same OpenAI-shape body.

6. Async multimodal tasks — /v1/tasks/{submit, query, cancel, stream}

Video and long-running image generation use the async task model: submit returns a task_id immediately, poll /tasks/query or stream state changes over SSE on /tasks/stream/{task_id}. Use for:
  • Video generation (Sora 2, Veo 3.1, Seedance) — all of which take 10-60s+
  • Batched image generation where you want fire-and-forget
  • High-concurrency producers that shouldn’t hold HTTP connections open
See /tasks/submit reference.

Which protocol for which model family

Which protocol you can call each model family through on ByteSpike. ✅ = supported; — = not available on that protocol.
Model familyAnthropic MessagesOpenAI ResponsesOpenAI ChatGemini NativeImage (sync)Async tasks
Claude (Anthropic)translatedtranslated
GPT (OpenAI)translated✅ (Sora)
Gemini (Google)translatedtranslated✅ (shim)✅ (Veo)
DeepSeek✅ (V3-anthropic, V4)
Kimi (Moonshot)
GLM (Zhipu)
MiniMax
Doubao (ByteDance)✅ (Seedream)✅ (Seedance)
For the live, account-specific list, hit GET /api/v1/me/available-models or click Test next to any model in the console.

Picking by client

Your clientUse this endpoint
Claude CodeAnthropic Messages
Claude DesktopAnthropic Messages
Anthropic SDK (Python / Node)Anthropic Messages
Cursor / Continue / Cline / ZedOpenAI Chat Completions
OpenAI Python / Node SDKOpenAI Chat Completions
OpenAI Codex CLIOpenAI Responses
OpenAI Agents SDKOpenAI Responses
Google Gemini CLI / SDKGemini Native (or OpenAI shim)
AiderOpenAI Chat Completions (or via SDK base URL override)

Scenarios

You wantEndpoint
Agent with tool_use + thinkingAnthropic Messages
Existing openai SDK code with no rewriteChat Completions
Codex CLI / OpenAI Agents SDKOpenAI Responses
Switch between Claude / GPT / Gemini without rewriting SDK codeChat Completions for most cases; Anthropic Messages if you need cache_control or thinking
Generate an imageImage (sync)
Generate a videoAsync tasks

One key, multi-endpoint

A ByteSpike API key today binds to one routing group (group_id). The group determines which model families the key can reach. Within a group, every endpoint shape those models support is available — e.g. a key in claude-default can call /v1/messages, /v1/chat/completions (translated), and /v1/responses (translated) against Claude models. To reach model families in different groups, create multiple keys (one per group) and select per request. See Authentication.

Next