kimi-k2-6
Capability: 128K context · tool use · streaming · structured output · CJK-native
Pricing: per-token, mid tier (live rate)
Kimi K2.6 is the recommended starting point for new Chinese-market
work on the gateway. Native CJK prompt understanding (sharper than
non-Chinese flagships on idiom and document layout), 128K context
window, tighter tool-call argument generation. For
long-document extraction, agent flows where Chinese tone matters,
and CJK-heavy summarisation, this is the default.
Request
Body parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
model | string | yes | — | kimi-k2-6 |
messages | array | yes | — | CJK accepted natively. |
max_tokens | integer | no | model max | Max: 16384. |
tools | array | no | — | Function calling supported (parallel). |
response_format | object | no | — | JSON / structured output. |
stream | boolean | no | false | SSE streaming. |
Response
Code examples
Streaming + caching
"stream": true for SSE. Automatic prompt caching — for long
prompts, cache hits are the highest-leverage cost optimisation.
Errors
| Code | Trigger | Billed? |
|---|---|---|
| 400 / 401 / 402 / 422 / 429 | Standard | No |
| 5xx | Upstream | No (auto-retry) |
When to use
- New Chinese-market projects — default starting point for CJK work.
- Long-document extraction in Chinese / Japanese / Korean.
- For larger context (1M) at flagship cost, see Gemini 3.1 Pro.
- For Chinese open-weight alternatives, see GLM-5-1 or DeepSeek V4 Pro.
Limits
| Limit | Value |
|---|---|
| Context window | 128K tokens |
| Max output | 16384 tokens |
| Supports tool use | Yes (parallel) |
| Supports vision | No |
| Supports streaming | Yes |
| Supports prompt caching | Automatic |