Per-protocol shape
- Anthropic Messages
- OpenAI Chat Completions
- OpenAI Responses
- Gemini Native
Pass
"stream": true in the body. Event names match Anthropic’s
native protocol:Headers ship before the first SSE frame
Quota + rate-limit headers are sent in the HTTP response headers, before the first event. You can short-circuit a too-expensive stream by closing the connection after reading headers:output_tokens consumed up to the abort point.
Aborting cleanly
If you have a long-running stream you want to cut off (user clicked stop), just close the HTTP connection — there’s no separate “abort” call. The gateway propagates the disconnect to the model, billing settles on partial output. For video generation (async via/v1/tasks/submit), use
POST /v1/tasks/cancel
instead — closing the submit response doesn’t cancel the in-flight
render.
Mid-stream errors
If the model fails partway through, the gateway emits a finalevent: error (Anthropic) or terminal frame with error field
(OpenAI) and closes the connection. The partial output you’ve
already received is yours — but the request does not bill in
the mid-stream-error case. Failures don’t bill ever, full stop.
Reconnecting
SSE supports theLast-Event-ID header for reconnects, but ByteSpike
does not stage streams server-side — there’s nothing to replay.
If your connection drops mid-stream, you’ll need to resend the
request from scratch.
For long completions where flakiness is a concern, prefer non-streaming
mode with a generous client timeout, or use Anthropic’s prompt
caching to make retries cheap.
SDK behaviour
All three official SDKs handle streaming automatically:Common gotchas
| Issue | Cause | Fix |
|---|---|---|
| Stream stalls after first chunk | Corporate proxy buffering / non-passthrough | Add llm.bytespike.ai to NO_PROXY |
| Lines arrive in chunks of N | Default HTTP buffering — purely cosmetic | Use SDK or iter_lines() — semantic chunks always frame correctly |
event: lines missing | Your parser only reads data: lines | Use a real SSE parser — event names matter for Anthropic-shape and Responses-shape |
| Usage not in final frame (OpenAI) | Missing stream_options.include_usage: true | Set it |
| Final frame missing on Gemini | No [DONE] marker — stream ends on connection close | Treat connection-close as end-of-stream |