Kimi K2.6 - ByteSpike

Vendor: Moonshot
Model ID: kimi-k2-6
Capability: 128K context · tool use · streaming · structured output · CJK-native
Pricing: per-token, mid tier (live rate)

Kimi K2.6 is the recommended starting point for new Chinese-market work on the gateway. It offers native CJK prompt understanding (sharper than non-Chinese flagships on idiom and document layout), a 128K context window, and tighter tool-call argument generation. It is the default for long-document extraction, agent flows where Chinese tone matters, and CJK-heavy summarisation.

Request

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "kimi-k2-6",
    "messages": [{"role": "user", "content": "用这本财报的内容回答：现金流量表的关键变化是什么？"}]
  }'

Body parameters

Field	Type	Required	Default	Notes
`model`	string	yes	—	`kimi-k2-6`
`messages`	array	yes	—	CJK accepted natively.
`max_tokens`	integer	no	model max	Max: 16384.
`tools`	array	no	—	Function calling supported (parallel).
`response_format`	object	no	—	JSON / structured output.
`stream`	boolean	no	false	SSE streaming.
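
The fields in the table can be assembled programmatically. The following is a minimal Python sketch, not the gateway's official client: `build_request` and `get_clause_tool` are hypothetical names, and the tool definition assumes an OpenAI-style function schema, which the table does not confirm.

```python
import json

MAX_OUTPUT_TOKENS = 16384  # "Max: 16384" from the table above


def build_request(messages, max_tokens=None, tools=None,
                  response_format=None, stream=False):
    """Assemble a request body using only the fields listed in the table."""
    if not messages:
        raise ValueError("messages is required and must be non-empty")
    if max_tokens is not None and not (1 <= max_tokens <= MAX_OUTPUT_TOKENS):
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")

    body = {"model": "kimi-k2-6", "messages": messages}
    # Optional fields are left out when unset so the gateway applies its defaults.
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if tools:
        body["tools"] = tools
    if response_format is not None:
        body["response_format"] = response_format
    if stream:
        body["stream"] = True
    return body


# Hypothetical tool, written in the OpenAI-style function schema (an assumption).
get_clause_tool = {
    "type": "function",
    "function": {
        "name": "get_clause",
        "description": "Look up a contract clause by number.",
        "parameters": {
            "type": "object",
            "properties": {"clause_no": {"type": "integer"}},
            "required": ["clause_no"],
        },
    },
}

body = build_request(
    [{"role": "user", "content": "提取关键条款"}],
    max_tokens=1024,
    tools=[get_clause_tool],
)
# ensure_ascii=False keeps the CJK prompt readable in the serialized payload.
payload = json.dumps(body, ensure_ascii=False)
```

The resulting `payload` string is what would go in the `-d` argument of the curl request shown earlier.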

Response

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "kimi-k2-6",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 142850, "completion_tokens": 524, "total_tokens": 143374}
}
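
Reading the response usually means pulling out the first choice and the usage counters. The sketch below parses a body shaped like the one above; `summarize_response` is a hypothetical helper, and treating a `finish_reason` of `"length"` as truncation is an assumption borrowed from OpenAI-compatible APIs, not something this page states.

```python
import json


def summarize_response(raw):
    """Return (text, finish_reason, usage) from a chat.completion JSON string."""
    data = json.loads(raw)
    choice = data["choices"][0]
    usage = data.get("usage", {})
    return choice["message"]["content"], choice["finish_reason"], usage


# Sample body copied from the response shown above.
sample = """
{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "kimi-k2-6",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 142850, "completion_tokens": 524, "total_tokens": 143374}
}
"""

text, finish_reason, usage = summarize_response(sample)
# Assumption: "length" signals that max_tokens cut the answer short.
truncated = finish_reason == "length"
```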

Code examples

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model": "kimi-k2-6", "messages": [{"role": "user", "content": "提取关键条款"}]}'

Streaming + caching

Set "stream": true to receive the response as server-sent events (SSE). Prompt caching is automatic; for long prompts, cache hits are the highest-leverage cost optimisation.

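To illustrate consuming the SSE stream, here is a small Python sketch that joins content fragments from already-received lines. It assumes the OpenAI-compatible chunk format (`data: {...}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`); this page does not spell out the wire format, so treat that shape and the helper name `iter_sse_text` as assumptions.

```python
import json


def iter_sse_text(lines):
    """Yield content fragments from an iterable of SSE text lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece


# Hand-written sample lines in the assumed chunk format, not captured output.
sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "现金"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "流量"}}]}',
    "",
    "data: [DONE]",
]

answer = "".join(iter_sse_text(sample_stream))
```

In a real client the `lines` argument would be the decoded lines of the HTTP response body, read incrementally.
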
Errors

Code	Trigger	Billed?
400 / 401 / 402 / 422 / 429	Standard	No
5xx	Upstream	No (auto-retry)
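
One way a client might act on this table is sketched below in Python. The action labels and the backoff schedule are illustrative choices, not gateway behaviour: the table only says that 5xx responses are auto-retried and that none of these errors are billed. Retrying 429 with exponential backoff is a common convention assumed here.

```python
def classify_status(status):
    """Map an HTTP status code to a hypothetical client-side action."""
    if 200 <= status < 300:
        return "ok"
    if status == 429:
        return "retry"        # rate limited: wait, then try again
    if status in (400, 401, 402, 422):
        return "fix_request"  # retrying the same request will not help
    if 500 <= status < 600:
        return "upstream"     # the table says these are auto-retried
    return "unknown"


def backoff_delays(attempts, base=0.5, cap=8.0):
    """Exponential backoff schedule in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


delays = backoff_delays(5)  # [0.5, 1.0, 2.0, 4.0, 8.0]
```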

When to use

New Chinese-market projects — default starting point for CJK work.
Long-document extraction in Chinese / Japanese / Korean.
For larger context (1M) at flagship cost, see Gemini 3.1 Pro.
For Chinese open-weight alternatives, see GLM-5-1 or DeepSeek V4 Pro.

Limits

Limit	Value
Context window	128K tokens
Max output	16384 tokens
Supports tool use	Yes (parallel)
Supports vision	No
Supports streaming	Yes
Supports prompt caching	Automatic
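
The two token limits interact when prompts are long. The Python sketch below picks a `max_tokens` value that fits, under two assumptions this table does not confirm: that "128K" means 128,000 tokens (it could also mean 131,072), and that prompt and output tokens share the same context window. `output_budget` is a hypothetical helper.

```python
CONTEXT_WINDOW = 128_000  # assumption: "128K" read as 128,000 tokens
MAX_OUTPUT = 16_384       # "Max output" from the table above


def output_budget(prompt_tokens, requested=MAX_OUTPUT,
                  context_window=CONTEXT_WINDOW):
    """Largest max_tokens that fits, or 0 if the prompt alone fills the window."""
    # Assumption: prompt and completion tokens count against one shared window.
    remaining = context_window - prompt_tokens
    return max(0, min(requested, MAX_OUTPUT, remaining))


short_prompt = output_budget(100_000)  # 16384: the output cap is the binding limit
long_prompt = output_budget(120_000)   # 8000: the context window is the binding limit
```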
