Vendor: Moonshot
Model ID: kimi-k2-6
Capability: 128K context · tool use · streaming · structured output · CJK-native
Pricing: per-token, mid tier (live rate)

Kimi K2.6 is the recommended starting point for new Chinese-market work on the gateway. It offers native CJK prompt understanding (sharper than non-Chinese flagships on idiom and document layout), a 128K context window, and tighter tool-call argument generation. It is the default for long-document extraction, agent flows where Chinese tone matters, and CJK-heavy summarisation.

Request

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "kimi-k2-6",
    "messages": [{"role": "user", "content": "用这本财报的内容回答:现金流量表的关键变化是什么?"}]
  }'

Body parameters

Field            Type     Required  Default    Notes
model            string   yes       -          kimi-k2-6
messages         array    yes       -          CJK accepted natively.
max_tokens       integer  no        model max  Max: 16384.
tools            array    no        -          Function calling supported (parallel).
response_format  object   no        -          JSON / structured output.
stream           boolean  no        false      SSE streaming.
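
As a rough illustration of how the fields above combine, the sketch below assembles a request body in Python. It is not an official client: the helper `build_payload`, the `lookup_filing` tool, and the OpenAI-style shapes used for `tools` and `response_format` are assumptions, since this page only states that function calling and JSON / structured output are supported.

```python
MODEL_ID = "kimi-k2-6"
MAX_OUTPUT_TOKENS = 16384  # "Max: 16384" from the parameter table


def build_payload(messages, max_tokens=None, tools=None, json_mode=False, stream=False):
    """Assemble a request body from the documented fields (hypothetical helper)."""
    if max_tokens is not None and not (1 <= max_tokens <= MAX_OUTPUT_TOKENS):
        raise ValueError(f"max_tokens must be in 1..{MAX_OUTPUT_TOKENS}")
    payload = {"model": MODEL_ID, "messages": messages}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if tools:
        payload["tools"] = tools
    if json_mode:
        # Assumed OpenAI-style shape; the page only says "JSON / structured output".
        payload["response_format"] = {"type": "json_object"}
    if stream:
        payload["stream"] = True
    return payload


# Hypothetical tool definition, for illustration only; the schema layout is an assumption.
lookup_tool = {
    "type": "function",
    "function": {
        "name": "lookup_filing",
        "description": "Fetch a section of a financial report by name.",
        "parameters": {
            "type": "object",
            "properties": {"section": {"type": "string"}},
            "required": ["section"],
        },
    },
}

payload = build_payload(
    [{"role": "user", "content": "提取关键条款"}],
    max_tokens=1024,
    tools=[lookup_tool],
)
print(sorted(payload))  # ['max_tokens', 'messages', 'model', 'tools']
```

Optional fields are left out of the body unless set, so the gateway's documented defaults (model max for `max_tokens`, `false` for `stream`) apply.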

Response

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "kimi-k2-6",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 112850, "completion_tokens": 524, "total_tokens": 113374}
}
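
A minimal sketch of reading such a response in Python follows. The helper `summarise_response` and the sample body are made up for illustration; only the field names (`choices`, `message`, `finish_reason`, `usage`) come from the response shown above.

```python
import json


def summarise_response(body):
    """Pull the assistant text and token usage out of a chat.completion body."""
    data = json.loads(body)
    choice = data["choices"][0]
    usage = data["usage"]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        # Sanity check: the two counts should add up to total_tokens.
        "usage_consistent": (
            usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
        ),
    }


# Made-up sample body with the same shape as the documented response.
sample_body = json.dumps({
    "id": "chatcmpl-example",
    "object": "chat.completion",
    "model": "kimi-k2-6",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Operating cash flow rose."},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 1200, "completion_tokens": 6, "total_tokens": 1206},
})

summary = summarise_response(sample_body)
print(summary["text"], summary["finish_reason"])
```

Logging the `usage` counts per request is a cheap way to track spend, since pricing is per token.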

Code examples

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model": "kimi-k2-6", "messages": [{"role": "user", "content": "提取关键条款"}]}'
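
The same call can be sketched in Python with only the standard library. The helper names `build_request` and `send` are illustrative, not part of any SDK; the endpoint, headers, and body mirror the curl command above. The network call is left commented out so the snippet runs without credentials.

```python
import json
import os
import urllib.request

ENDPOINT = "https://llm.bytespike.ai/v1/chat/completions"


def build_request(prompt, api_key):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps(
        {"model": "kimi-k2-6", "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_request("提取关键条款", os.environ.get("BYTESPIKE_API_KEY", ""))
# reply = send(request)  # uncomment to perform the network call
# print(reply["choices"][0]["message"]["content"])
```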

Streaming + caching

Set "stream": true to receive the response as server-sent events (SSE). Prompt caching is automatic; for long prompts, cache hits are the highest-leverage cost optimisation.

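For streamed responses, a client has to reassemble the text from individual SSE events. The sketch below assumes OpenAI-style chunks (`data: {...}` lines carrying `choices[0].delta.content`, ended by a `data: [DONE]` sentinel); this page does not specify the chunk format, so treat both the shape and the helper name `iter_deltas` as assumptions to verify against a real stream.

```python
import json


def iter_deltas(lines):
    """Yield text fragments from an iterable of SSE lines (assumed chunk format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        fragment = choices[0].get("delta", {}).get("content")
        if fragment:
            yield fragment


# Made-up stream, for illustration only.
sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Cash "}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "flow"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_deltas(sample_stream)))  # Cash flow
```
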
Errors

Code                         Trigger   Billed?
400 / 401 / 402 / 422 / 429  Standard  No
5xx                          Upstream  No (auto-retry)
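
One possible client-side reading of this table is sketched below. The action labels and the backoff schedule are assumptions, not gateway behaviour; the table only says which codes occur, that none are billed, and that 5xx errors are auto-retried.

```python
def classify_status(status):
    """Map an HTTP status code to a suggested client action (illustrative labels)."""
    if 200 <= status < 300:
        return "ok"
    if status == 429:
        return "retry_later"      # rate limited: wait, then retry
    if status in (400, 401, 402, 422):
        return "fix_request"      # bad input, auth, or billing: retrying as-is will not help
    if 500 <= status < 600:
        return "upstream_error"   # listed as auto-retried; surface it if it still reaches the client
    return "unexpected"


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff in seconds for a 0-based retry attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


for code in (200, 402, 429, 503):
    print(code, classify_status(code))
print([backoff_delay(n) for n in range(4)])  # [1.0, 2.0, 4.0, 8.0]
```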

When to use

  • New Chinese-market projects: the default starting point for CJK work.
  • Long-document extraction in Chinese / Japanese / Korean.
  • For larger context (1M) at flagship cost, see Gemini 3.1 Pro.
  • For Chinese open-weight alternatives, see GLM-5-1 or DeepSeek V4 Pro.

Limits

Limit                    Value
Context window           128K tokens
Max output               16384 tokens
Supports tool use        Yes (parallel)
Supports vision          No
Supports streaming       Yes
Supports prompt caching  Automatic
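
A pre-flight budget check against these limits might look like the sketch below. It rests on two assumptions this page does not confirm: that "128K" means 128,000 tokens (it could equally mean 131,072), and that the requested output counts against the same context window as the prompt. The helper name `fits_limits` is illustrative.

```python
CONTEXT_WINDOW = 128_000   # assumption: "128K" read as 128,000 tokens
MAX_OUTPUT_TOKENS = 16_384  # "Max output" from the limits table


def fits_limits(prompt_tokens, max_tokens=MAX_OUTPUT_TOKENS):
    """Return True if the prompt plus requested output stays within the limits."""
    if max_tokens > MAX_OUTPUT_TOKENS:
        return False
    # Assumption: prompt and output share one context window.
    return prompt_tokens + max_tokens <= CONTEXT_WINDOW


print(fits_limits(100_000))         # True: 100,000 + 16,384 <= 128,000
print(fits_limits(120_000))         # False: 120,000 + 16,384 > 128,000
print(fits_limits(120_000, 4_000))  # True: lowering max_tokens frees room
```

When a long document does not fit, lowering `max_tokens` or splitting the document are the usual options before moving to a larger-context model.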