DOSIA Agent mode - ByteSpike

DOSIA Agent mode runs on the Anthropic Messages protocol — POST /v1/messages. That’s not negotiable: agent frameworks worth using need tool_use blocks, cache_control blocks, and thinking blocks to pass through end-to-end, and the Anthropic shape is the only protocol that exposes all three as first-class concepts. DOSIA Chat mode is a different story — it runs on OpenAI Chat Completions and works across every chat-shape model on ByteSpike. This page is about Agent mode specifically: what works today, what’s planned, what’s worth knowing per model.

The protocol surface

A DOSIA Agent request lands on https://llm.bytespike.ai/v1/messages with the standard Anthropic shape:

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 2048,
  "tools": [
    {
      "name": "search_files",
      "description": "Search the local workspace for files.",
      "input_schema": { ... }
    }
  ],
  "messages": [
    { "role": "user", "content": "Find every place we set the locale cookie." }
  ]
}

tool_use content blocks come back in the response; DOSIA executes the tool; the next turn sends a tool_result block back. Standard Anthropic Messages agent loop. The cache_control: { type: "ephemeral" } markers DOSIA puts on its system prompt and on stable context (workspace tree, recent edits) flow through to whichever model serves the request — see the cache_control note per model below.
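As a rough illustration of the loop described above, the sketch below turns a response containing tool_use blocks into the next pair of messages. It is a minimal sketch, not DOSIA's implementation: the function name `build_tool_result_turn`, the sample response, and the `search_files` handler are all hypothetical. The block shapes (`tool_use`, `tool_result`, `tool_use_id`, `is_error`) follow the Anthropic Messages format.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
ToolFn = Callable[[Dict[str, Any]], str]


def build_tool_result_turn(response: Message, tools: Dict[str, ToolFn]) -> List[Message]:
    """Return the messages to append after a response that contains tool_use blocks.

    The first message echoes the assistant turn; the second is a user turn with
    one tool_result block per tool_use block, matched by tool_use_id.
    An empty list means the response requested no tools.
    """
    results: List[Message] = []
    for block in response["content"]:
        if block.get("type") != "tool_use":
            continue
        handler = tools.get(block["name"])
        if handler is None:
            # Unknown tool: report the failure to the model instead of raising.
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"unknown tool: {block['name']}",
                "is_error": True,
            })
            continue
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": handler(block["input"]),
        })
    if not results:
        return []
    return [
        {"role": "assistant", "content": response["content"]},
        {"role": "user", "content": results},
    ]


# Hypothetical response and tool handler, for illustration only.
response = {
    "role": "assistant",
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "Searching the workspace."},
        {"type": "tool_use", "id": "toolu_01", "name": "search_files",
         "input": {"query": "locale cookie"}},
    ],
}
tools = {"search_files": lambda args: f"2 matches for {args['query']!r}"}
turn = build_tool_result_turn(response, tools)
```

Appending `turn` to the running `messages` list and posting again continues the loop; an empty return value is the signal to stop.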

Which models Agent mode can target

Agent mode benefits from the same routing as every other request — but the protocol constrains the eligible set to models that support an Anthropic Messages surface. Today’s eligible set:

Model family	Status	Notes
`claude-haiku-4-5` / `sonnet-4-5` / `sonnet-4-6` / `opus-4-7` / `opus-4-8`	✅ live	Native Anthropic shape; everything works
`deepseek-v4-pro` / `deepseek-v4-flash`	✅ live	See DeepSeek caveats below
`kimi-k2-6` (anthropic-compat alias)	⏳ planned	Anthropic-compat surface in flight
GLM (anthropic-compat alias)	⏳ planned	Anthropic-compat surface in flight
MiniMax (anthropic-compat alias)	⏳ planned	Anthropic-compat surface in flight

GPT and Gemini do not appear here — they don’t support an Anthropic Messages surface, and ByteSpike does not synthesize one (the protocol-mapping cost in fidelity is not worth it for agent use). For GPT or Gemini, DOSIA Chat mode is the route — see endpoint types.
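A client that lets users pick a model could check eligibility before sending an Agent request. The following is a minimal sketch under stated assumptions: the full IDs `claude-sonnet-4-5` and `claude-opus-4-7` are inferred from the abbreviated table entries, and the `glm-`, `minimax-`, `gpt-`, and `gemini-` prefixes are hypothetical naming patterns, not confirmed ByteSpike model IDs.

```python
# Live Agent-mode models, taken from the eligibility table.
# Assumption: abbreviated entries such as `sonnet-4-5` carry the `claude-` prefix.
AGENT_LIVE = {
    "claude-haiku-4-5", "claude-sonnet-4-5", "claude-sonnet-4-6",
    "claude-opus-4-7", "claude-opus-4-8",
    "deepseek-v4-pro", "deepseek-v4-flash",
}

# Hypothetical prefixes for families whose anthropic-compat alias is planned.
AGENT_PLANNED_PREFIXES = ("kimi-", "glm-", "minimax-")

# Hypothetical prefixes for families served through Chat mode only.
CHAT_ONLY_PREFIXES = ("gpt-", "gemini-")


def agent_status(model: str) -> str:
    """Classify a model name as 'live', 'planned', 'chat-only', or 'unknown'."""
    name = model.strip().lower()
    if name in AGENT_LIVE:
        return "live"
    if name.startswith(AGENT_PLANNED_PREFIXES):
        return "planned"
    if name.startswith(CHAT_ONLY_PREFIXES):
        return "chat-only"
    return "unknown"
```

Anything other than `live` would be routed to Chat mode or rejected before the request leaves the client.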

Picking a model for Agent

A short opinionated decision aid for DOSIA Agent users:

Scenario	Pick	Why
Default agent work, broad capability	`claude-sonnet-4-6`	Tool use + thinking + cache_control + web_search all together
Codebase-scale agent (full repo into context)	`claude-opus-4-8`	200K context window, current Anthropic flagship
Cost-optimized agent at production scale	`claude-haiku-4-5`	Tool use included; thinking is not, but most agent loops don’t need extended thinking
Chinese-language workloads, cost-sensitive	`deepseek-v4-pro`	~10× cheaper than Sonnet; reasoning chain available
Chinese-language at the cheapest tier	`deepseek-v4-flash`	Haiku-class price; subset of Pro’s capabilities

The recommended-paths table in models/index covers the same ground from the model-first angle.

`cache_control` per model

cache_control: { type: "ephemeral" } markers behave differently per model:

Claude models — full first-class support. Cache write is 1.25× input; cache read is ~10% of input. TTL refreshes on each hit.
DeepSeek models — cache_control is not yet supported. The marker passes through unchanged and is ignored. No caching benefit, but no error either.
Kimi / GLM / MiniMax — same as DeepSeek today. The anthropic-compat aliases accept the shape but caching is not yet wired through.

This is a known limitation. The current recommendation: keep cache_control markers in your DOSIA Agent system prompt regardless of which model you’re targeting — they’re free when ignored and they activate automatically once a model gains support.
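To make the recommendation concrete, the sketch below builds a system prompt with cache_control markers and estimates what caching saves on Claude, using the 1.25× write and ~10% read multipliers quoted above. The function names are hypothetical, and the estimate assumes every request after the first is a cache hit (that is, the TTL never lapses between requests).

```python
from typing import Any, Dict, List


def cached_system(prompt: str, stable_context: str) -> List[Dict[str, Any]]:
    """Build Anthropic-style system blocks with ephemeral cache markers."""
    return [
        {"type": "text", "text": prompt,
         "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": stable_context,
         "cache_control": {"type": "ephemeral"}},
    ]


def relative_prefix_cost(requests: int, cached: bool,
                         write: float = 1.25, read: float = 0.10) -> float:
    """Input cost of sending one prefix `requests` times.

    The unit is the cost of a single uncached send of that prefix.
    Assumes the first request writes the cache and all later ones read it.
    """
    if requests <= 0:
        return 0.0
    if not cached:
        return float(requests)
    return write + (requests - 1) * read


system = cached_system("You are DOSIA Agent.", "workspace tree goes here")
uncached = relative_prefix_cost(10, cached=False)  # 10.0
cached = relative_prefix_cost(10, cached=True)     # 1.25 + 9 * 0.10, about 2.15
```

Under these assumptions, ten requests sharing one prefix cost roughly a fifth of the uncached total on Claude. On models that ignore the marker, the cost stays at the uncached figure.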

DeepSeek caveats

DOSIA Agent against deepseek-v4-pro / deepseek-v4-flash is fully supported today, with three caveats:

No cache_control. As above.
Vision is not available on the DeepSeek API. DeepSeek’s models don’t accept image input over the API, in either the OpenAI or the anthropic-compat shape. If your Agent expects to send image content blocks, route those requests to a Claude model (gpt-5-4 is an option only through Chat mode, since GPT models are not eligible for Agent mode); keep the rest on DeepSeek.
thinking blocks appear as reasoning_content on the OpenAI endpoint but as proper thinking blocks on the anthropic-compat endpoint. DOSIA Agent uses the anthropic-compat path so you get the native shape; the difference matters only if you switch protocols.
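The vision caveat can be handled with a small routing check before each request. The sketch below is one possible approach, not something DOSIA is documented to do: `has_image` and `pick_model` are hypothetical helpers, and the default and fallback model choices are illustrative. It relies only on the Anthropic convention that image input arrives as content blocks with `"type": "image"`, possibly nested inside a tool_result.

```python
from typing import Any, Dict, List


def _blocks_have_image(blocks: List[Any]) -> bool:
    """Check a list of content blocks, descending into nested block lists."""
    for block in blocks:
        if not isinstance(block, dict):
            continue
        if block.get("type") == "image":
            return True
        nested = block.get("content")
        if isinstance(nested, list) and _blocks_have_image(nested):
            return True
    return False


def has_image(messages: List[Dict[str, Any]]) -> bool:
    """Return True if any message carries an image content block."""
    for message in messages:
        content = message.get("content")
        # Plain-string content cannot contain image blocks.
        if isinstance(content, list) and _blocks_have_image(content):
            return True
    return False


def pick_model(messages: List[Dict[str, Any]],
               default: str = "deepseek-v4-pro",
               vision_fallback: str = "claude-sonnet-4-6") -> str:
    """Keep text-only requests on DeepSeek; send image-bearing ones to Claude."""
    return vision_fallback if has_image(messages) else default
```

Note that this inspects the whole conversation, so once an image enters the history every later turn moves to the fallback model as well.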

Failure modes

What can go wrong on a DOSIA Agent → ByteSpike call, and what each failure looks like:

Symptom	Likely cause	Where to look
404 on `/v1/messages`	Model name not eligible for Agent (e.g. you sent a GPT model)	Send a Claude / DeepSeek / future-supported model. See the eligibility table above.
422 with “tool_use not supported”	Model doesn’t expose an anthropic-compat surface yet	Switch the request to Claude or DeepSeek; check the coverage matrix
5xx	The model is temporarily unavailable	ByteSpike auto-retries within your key’s group. If everything in the group is unavailable, the gateway surfaces the error.
Mid-stream `error` event	The response was aborted mid-stream	Zero credits charged (see credits and billing); DOSIA surfaces the failure to the user as a streaming-failure toast

For the broader failure-billing policy, see credits and billing. For retries and idempotency, see error handling.
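Client code that wants to react differently to each row of the table above could start from a classifier like the one below. This is a sketch only: the returned labels are hypothetical, the 422 check matches the message text quoted in the table, and the stream check assumes the Anthropic streaming convention of an event whose `type` is `error`.

```python
from typing import Any, Dict


def classify_failure(status: int, body: str = "") -> str:
    """Map an HTTP status and response body to a failure label from the table."""
    if status == 404:
        # Model name is not eligible for Agent mode (for example a GPT model).
        return "model-not-eligible"
    if status == 422 and "tool_use not supported" in body:
        # The model has no anthropic-compat surface yet.
        return "no-anthropic-compat-surface"
    if 500 <= status <= 599:
        # The gateway has already retried within the key's group.
        return "model-unavailable"
    return "unclassified"


def is_stream_error(event: Dict[str, Any]) -> bool:
    """Return True for a mid-stream error event."""
    return event.get("type") == "error"
```

The first two labels call for a model change rather than a retry, which is why they are kept separate from the 5xx case.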

Configuring DOSIA Cloud Enterprise

For DOSIA Cloud Enterprise admins building permission templates: the Global edition and China edition presets pre-select the right Agent default per region. The DOSIA recommended-paths section of the models index has the table that maps each preset to its Agent and Chat defaults.

Next

Endpoint types — the full protocol map
Models index — per-model docs including the DOSIA recommended-paths table
