sora2 is OpenAI’s Sora 2 model, offering text-to-video and image-to-video on a two-phase, task-based endpoint: the submit call returns a task_id, and the client polls until result_url is populated. Billing is per second of generated footage. Reach for it when you need Sora-grade physical motion, camera dynamics, and shot composition; the trade-off against lighter tiers is the per-second cost and a longer total wall-clock time to the first frame.
Pricing: $0.10 per second of generated footage (see the rate card). Failed generations are not billed, and the per-second rate applies to the length of the generated footage, not to time spent polling.
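As a quick sanity check on the per-second rate, the arithmetic can be sketched in a few lines of Python. The helper name `estimate_cost` and the `clips` parameter are illustrative assumptions, not part of any SDK; only the $0.10 rate comes from the pricing line above, and the rate card remains the authoritative source.

```python
from decimal import Decimal

# USD per second of generated footage, taken from the pricing line above.
RATE_PER_SECOND = Decimal("0.10")


def estimate_cost(duration_seconds: int, clips: int = 1) -> Decimal:
    """Estimate the bill for `clips` successful clips of equal length.

    Hypothetical helper: failed generations are not billed, so only
    successful clips should be counted here.
    """
    if duration_seconds <= 0 or clips <= 0:
        raise ValueError("duration_seconds and clips must be positive")
    return RATE_PER_SECOND * duration_seconds * clips


print(estimate_cost(10))           # 1.00
print(estimate_cost(15, clips=4))  # 6.00
```

Decimal is used instead of float so that totals stay exact when many clips are summed.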
Protocols
| Protocol | Path | Purpose |
|---|---|---|
| OpenAI Video — submit | POST https://llm.bytespike.ai/v1/videos/generations | enqueues; returns task_id |
| OpenAI Video — poll | GET https://llm.bytespike.ai/v1/videos/tasks/{task_id} | returns status and result_url when ready |
The video-tools plugin handles polling automatically: your chat just sees the clip appear when it is ready.
Quickstart
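A minimal sketch of the submit-then-poll flow in Python, using only the standard library. The two URLs and the task_id, status, and result_url fields come from the Protocols table; everything else is an assumption made for illustration: the Bearer-token header, the request body field names (`model`, `prompt`, `duration_seconds`, `size`), the `"1280x720"` size string, and the `"failed"` / `"error"` status values. Check the API reference before relying on any of them.

```python
import json
import time
import urllib.request

BASE_URL = "https://llm.bytespike.ai/v1/videos"


def http_json(method, url, api_key, body=None):
    """Send one JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            # Assumption: the gateway accepts a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


def generate_video(prompt, api_key, duration_seconds=5, size="1280x720",
                   poll_interval=5.0, timeout=900.0,
                   call=http_json, sleep=time.sleep):
    """Submit a generation task, then poll until result_url is populated."""
    # Phase 1: submit. Body field names are assumptions.
    task = call("POST", f"{BASE_URL}/generations", api_key, {
        "model": "sora2",
        "prompt": prompt,
        "duration_seconds": duration_seconds,
        "size": size,
    })
    task_id = task["task_id"]

    # Phase 2: poll the task until a result_url appears or we give up.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = call("GET", f"{BASE_URL}/tasks/{task_id}", api_key)
        if state.get("result_url"):
            return state["result_url"]
        # Assumption: these status strings mark a terminal failure.
        if state.get("status") in ("failed", "error"):
            raise RuntimeError(f"task {task_id} failed: {state}")
        sleep(poll_interval)
    raise TimeoutError(f"task {task_id} not ready after {timeout} seconds")


# Usage (hypothetical key variable):
# url = generate_video("A paper boat drifting down a rainy street", API_KEY)
```

The `call` and `sleep` parameters exist only so the loop can be exercised without network access or real delays; in normal use the defaults are enough.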
Capabilities
| Capability | Supported |
|---|---|
| Text-to-video | ✅ |
| Image-to-video (with source_image) | ✅ |
| duration_seconds 5 / 10 / 15 | ✅ |
| size 1280×720 / 1920×1080 | ✅ |
| Task-based polling | ✅ |
| Modality | video |
| Capability bucket | video_generate |
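The limits in the table above can be checked on the client before a task is submitted, which avoids a round trip for requests that cannot succeed. The sketch below is illustrative only: `build_request` is a hypothetical helper, the `model` and `prompt` field names are assumptions, and the size strings are assumed to use a plain `x` (the table shows 1280×720 and 1920×1080, but the exact wire format is not stated here).

```python
# Values taken from the capabilities table; the "x" separator is an assumption.
ALLOWED_DURATIONS = (5, 10, 15)
ALLOWED_SIZES = ("1280x720", "1920x1080")


def build_request(prompt, duration_seconds=5, size="1280x720", source_image=None):
    """Validate the arguments and return a request body as a dict.

    Passing source_image switches the request from text-to-video to
    image-to-video; its expected format (URL, base64, ...) is not
    specified here, so it is passed through unchanged.
    """
    if not prompt or not prompt.strip():
        raise ValueError("prompt must not be empty")
    if duration_seconds not in ALLOWED_DURATIONS:
        raise ValueError(f"duration_seconds must be one of {ALLOWED_DURATIONS}")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {ALLOWED_SIZES}")

    body = {
        "model": "sora2",  # assumed field name and value
        "prompt": prompt,
        "duration_seconds": duration_seconds,
        "size": size,
    }
    if source_image is not None:
        body["source_image"] = source_image
    return body
```

For example, `build_request("Slow dolly-in on a lighthouse", duration_seconds=10, size="1920x1080")` returns a text-to-video body, while a `duration_seconds` of 7 raises `ValueError` before anything is sent.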
When to use
- Sora-grade footage — physical motion, camera moves, multi-subject scenes.
- Marketing video — hero clip, social-card animation, product story.
- DOSIA `generate_video`: the main brain picks this when permissioned and polls automatically through `video-tools.poll_video`.
When not to use
- Cost-sensitive volume: drop to `veo3.1-fast` or `seedance2-fast`.
- Chinese aesthetic / East-Asian subjects: `seedance-pro` or `seedance2`.
Next
- `sora2-pro`: higher tier, longer clips, richer dynamics
- `veo3.1`: Google alternative at a similar tier
- Multimodal endpoints: overview