GLM-5.1 - ByteSpike

Vendor: Zhipu (智谱)
Model ID: glm-5-1
Capability: 128K context · vision · tool use · streaming · structured output · CJK-native
Pricing: per-token, mid tier (live rate)

GLM-5.1 is the refinement step on GLM-5: same context window with added vision support, tighter tool-call argument generation, and a measurable quality bump on Chinese-market code generation. Default starting point for new Chinese-market projects on the gateway.

Request

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "glm-5-1",
    "messages": [{"role": "user", "content": "用中文写一份产品发布公告。"}]
  }'

Body parameters

Field	Type	Required	Default	Notes
`model`	string	yes	—	`glm-5-1`
`messages`	array	yes	—	CJK accepted natively. Vision via `image_url` blocks.
`max_tokens`	integer	no	model max	Max: 16384.
`tools`	array	no	—	Function calling supported (parallel).
`response_format`	object	no	—	JSON / structured output.
`stream`	boolean	no	false	SSE streaming.
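
The fields in the table can be assembled programmatically. The following is a minimal sketch, not the gateway's official client: `build_payload` is a hypothetical helper, and the shape of the `image_url` block is assumed to follow the common OpenAI-style content-block layout, which the table does not spell out.

```python
import json

MAX_OUTPUT_TOKENS = 16384  # "Max: 16384" from the body-parameters table


def build_payload(prompt, image_url=None, max_tokens=None, stream=False):
    """Assemble a chat-completions request body for glm-5-1 (hypothetical helper)."""
    if image_url is None:
        content = prompt
    else:
        # Assumption: OpenAI-style content blocks; the table only says "image_url blocks".
        content = [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]
    payload = {
        "model": "glm-5-1",
        "messages": [{"role": "user", "content": content}],
    }
    if max_tokens is not None:
        if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
            raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
        payload["max_tokens"] = max_tokens
    if stream:
        payload["stream"] = True
    return payload


body = build_payload(
    "Describe this chart.",
    image_url="https://example.com/chart.png",  # placeholder URL
    max_tokens=512,
)
# ensure_ascii=False keeps CJK text readable if the prompt contains it.
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be passed as the `-d` body of the curl request shown earlier.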

Response

{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "glm-5-1",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 28, "completion_tokens": 412, "total_tokens": 440}
}
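
A short sketch of reading this response shape, using the sample values above. `extract_reply` is a hypothetical helper name. Treating a `finish_reason` other than `"stop"` as a possibly truncated reply is an assumption borrowed from OpenAI-style APIs, not something this page states.

```python
import json

# The sample response from this section, with a shortened message body.
raw = """
{
  "id": "chatcmpl-…",
  "object": "chat.completion",
  "model": "glm-5-1",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 28, "completion_tokens": 412, "total_tokens": 440}
}
"""


def extract_reply(response):
    """Return (text, finish_reason, usage) from a chat.completion object."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"], response["usage"]


text, finish_reason, usage = extract_reply(json.loads(raw))

# total_tokens is the sum of the prompt and completion counts (28 + 412 = 440).
tokens_consistent = (
    usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
)

if finish_reason != "stop":
    # Assumption: a non-"stop" reason (e.g. hitting max_tokens) means the reply may be cut off.
    print("reply may be incomplete:", finish_reason)
print(text, usage["total_tokens"], tokens_consistent)
```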

Code examples

curl https://llm.bytespike.ai/v1/chat/completions \
  -H "Authorization: Bearer $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model": "glm-5-1", "messages": [{"role": "user", "content": "写产品发布公告"}]}'

Streaming + caching

Set "stream": true in the request body to receive the response as server-sent events (SSE). Prompt caching is automatic and needs no request field.

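A sketch of consuming the SSE stream on the client side. This page does not document the chunk format, so the `data: {...}` lines, the `choices[].delta.content` field, and the `[DONE]` sentinel are all assumptions taken from OpenAI-style streaming; verify them against a live response before relying on this. `iter_deltas` is a hypothetical helper, and the sample lines are illustrative, not captured gateway output.

```python
import json


def iter_deltas(lines):
    """Yield text fragments from an iterable of SSE lines (assumed OpenAI-style chunks)."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            fragment = choice.get("delta", {}).get("content")
            if fragment:
                yield fragment


# Illustrative input only.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": ", world"}}]}',
    "",
    "data: [DONE]",
]

streamed_text = "".join(iter_deltas(sample_lines))
print(streamed_text)
```

In a real client the `lines` argument would be the decoded lines of the HTTP response body, read incrementally.
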
Errors

Code	Trigger	Billed?
400 / 401 / 402 / 422 / 429	Standard	No
5xx	Upstream	No (auto-retry)
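
One way a client might act on this table, as a sketch under stated assumptions. The category names and the backoff schedule are illustrative choices, not gateway behaviour. The only facts taken from the table are the status codes and that the gateway already auto-retries 5xx upstream failures.

```python
def classify(status):
    """Map an HTTP status to a client-side action (hypothetical policy)."""
    if 200 <= status < 300:
        return "ok"
    if status == 429:
        return "backoff"      # rate limited: wait, then retry
    if status in (400, 401, 402, 422):
        return "fix_request"  # bad input, auth, or billing: retrying unchanged will not help
    if 500 <= status < 600:
        return "upstream"     # gateway has already auto-retried; surface to the caller
    return "unknown"


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


for code in (200, 401, 429, 503):
    print(code, classify(code))
print(backoff_delays(4))
```

Adding random jitter to each delay is a common refinement when many clients share one rate limit.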

When to use

New Chinese-market projects with mixed text + vision input.
Tool-using agents in CJK languages.
For the prior version (no vision), see GLM-5.
For longer context, see Kimi K2.6.
For Chinese code-heavy work, see DeepSeek V4 Pro.

Limits

Limit	Value
Context window	128K tokens
Max output	16384 tokens
Supports tool use	Yes (parallel)
Supports vision	Yes
Supports streaming	Yes
Supports prompt caching	Automatic
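
A small sketch of budgeting `max_tokens` against these limits. Two assumptions are baked in and should be checked: that "128K" means 128,000 tokens (it could also mean 131,072), and that prompt and output tokens share the same context window. `clamp_max_tokens` is a hypothetical helper.

```python
CONTEXT_WINDOW = 128_000   # assumption: "128K" read as 128,000 tokens
MAX_OUTPUT_TOKENS = 16384  # from the limits table


def clamp_max_tokens(prompt_tokens, requested, context_window=CONTEXT_WINDOW):
    """Return the largest usable max_tokens value for a prompt of the given size."""
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(requested, MAX_OUTPUT_TOKENS, remaining)


print(clamp_max_tokens(1_000, 20_000))   # limited by the max-output cap
print(clamp_max_tokens(127_000, 4_096))  # limited by the remaining context
```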
