Create Video

Generate an AI-powered narrated video from a prompt — script, scenes, TTS, and stock clips. Designed for autonomous agents.

POST /v1/platform/videos

Creates a video request. The AI:

  1. Writes a multi-scene narration script from your prompt.
  2. Generates TTS narration with the gpt-4o-mini-tts model and the alloy voice.
  3. Searches Pexels and Pixabay for matching stock clips per scene.
  4. Builds subtitle chunks aligned to TTS duration.
  5. Saves everything as editable scenes returned in the response.

The render step (assembling the final .mp4 with Remotion Lambda) is not triggered automatically. Call POST /v1/platform/videos/edit/:editToken/render once you (or a human reviewer) are happy with the scenes.

Agent Quick Reference

Paths are relative to /v1/platform.

| You want… | Call this | Result |
|---|---|---|
| AI scene script + TTS + stock clips | POST /videos with mode: "sync" | Response in 30–90 s |
| Background generation | POST /videos with mode: "async" | Returns immediately; poll GET /videos/:id |
| Idempotent retry safety | Add an Idempotency-Key header | Same key returns the original response |
| Render the final mp4 | POST /videos/edit/:editToken/render | Remotion Lambda runs; poll for status |
| Swap a scene's stock clip | POST /videos/edit/:editToken/scenes/:sceneId/clip/select | Replaces the clip in place |
| Regenerate TTS for one scene | POST /videos/edit/:editToken/scenes/:sceneId/tts | Re-narrates that scene only |
| Add background music | POST /videos/edit/:editToken/bgm/select | Adds a preset BGM track |
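The edit-time rows in this table share one URL shape: the videos base URL, then edit/{editToken}, then an action path. The helpers below are a minimal Python sketch (standard library only) that builds those URLs; the function names are hypothetical, and the request bodies for these endpoints are documented in Edit Video rather than here.

```python
from urllib.parse import quote

# Base URL taken from the examples on this page.
BASE_URL = "https://api.tutorflow.io/v1/platform/videos"


def edit_url(edit_token: str, *segments: str) -> str:
    """Build BASE_URL/edit/{editToken}/{segments...}."""
    parts = [BASE_URL, "edit", quote(edit_token, safe="")]
    # Percent-encode every dynamic segment so an ID cannot inject extra path parts.
    parts.extend(quote(s, safe="") for s in segments)
    return "/".join(parts)


def render_url(edit_token: str) -> str:
    return edit_url(edit_token, "render")


def select_clip_url(edit_token: str, scene_id: str) -> str:
    return edit_url(edit_token, "scenes", scene_id, "clip", "select")


def scene_tts_url(edit_token: str, scene_id: str) -> str:
    return edit_url(edit_token, "scenes", scene_id, "tts")


def select_bgm_url(edit_token: str) -> str:
    return edit_url(edit_token, "bgm", "select")
```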

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | Topic/brief for the narrated video. Concrete prompts produce stronger scripts. |
| title | string | No | Override the AI-generated title. |
| description | string | No | Override the AI-generated description. |
| language | string | No | Output language (default: en). Affects script + TTS pronunciation. |
| sceneCount | number | No | Target number of scenes (1–50, default: 5). |
| aspectRatio | string | No | 16:9 (default), 9:16 (vertical/Reels), or 1:1. |
| targetDurationSeconds | number | No | Target total duration (10–600 s, default: 60). The AI scales script length to fit. |
| tier | string | No | Accepted for backwards compatibility but has no effect on price (single-tier pricing post-v2). |
| mode | string | No | sync (default) or async. See Async Mode. |
| idempotencyKey | string | No | Prevents duplicate processing. Can also be sent as the Idempotency-Key HTTP header. |
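Checking the documented ranges client-side lets an agent fail fast instead of spending a request on an invalid body. The function below is a hedged sketch: build_create_body is a hypothetical helper, it enforces only the limits listed in the table above (the server may apply further rules), and it omits the no-op tier field.

```python
from typing import Any, Optional

# Allowed values and ranges as listed in the Request Body table.
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MODES = {"sync", "async"}


def build_create_body(
    prompt: str,
    *,
    title: Optional[str] = None,
    description: Optional[str] = None,
    language: Optional[str] = None,
    scene_count: Optional[int] = None,
    aspect_ratio: Optional[str] = None,
    target_duration_seconds: Optional[float] = None,
    mode: Optional[str] = None,
    idempotency_key: Optional[str] = None,
) -> dict[str, Any]:
    """Return a JSON-ready body for POST /v1/platform/videos, validated locally."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if scene_count is not None and not 1 <= scene_count <= 50:
        raise ValueError("sceneCount must be between 1 and 50")
    if target_duration_seconds is not None and not 10 <= target_duration_seconds <= 600:
        raise ValueError("targetDurationSeconds must be between 10 and 600")
    if aspect_ratio is not None and aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspectRatio must be one of {sorted(ASPECT_RATIOS)}")
    if mode is not None and mode not in MODES:
        raise ValueError("mode must be 'sync' or 'async'")
    fields = {
        "prompt": prompt,
        "title": title,
        "description": description,
        "language": language,
        "sceneCount": scene_count,
        "aspectRatio": aspect_ratio,
        "targetDurationSeconds": target_duration_seconds,
        "mode": mode,
        "idempotencyKey": idempotency_key,
    }
    # Drop unset fields so the server applies its documented defaults.
    return {k: v for k, v in fields.items() if v is not None}
```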

Pricing

The Video API is billed per generated scene for the base creation flow. Render minutes and TTS regeneration are billed as separate sub-events: TTS regeneration when its endpoint is called, and render minutes when the render callback reports completion.

| Event | Endpoint / trigger | Price |
|---|---|---|
| Base scene generation | POST /videos (per scene generated) | $0.04 / scene |
| TTS regeneration | POST /videos/edit/.../scenes/.../tts | $0.02 / call |
| Remotion render | Charged on render-callback completion | $0.15 / video minute |

The priceSnapshot on the create response covers only base scene generation. Render-time and TTS-regeneration billing produce separate PlatformUsageRecord rows with their own priceSnapshot payloads. Render minutes are tracked at fractional precision (for example, 1.4 min × $0.15 = $0.21), and partial seconds are preserved end to end.

Legacy tier field: still accepted but has no effect on price.
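To budget a job before running it, the three prices above can be combined into one estimate. This is a sketch under stated assumptions: estimate_cost_usd is a hypothetical helper, the render charge is computed on the output video's length in minutes, and the result is left unrounded because this page does not say how the final billed amount is rounded.

```python
from decimal import Decimal

# Unit prices from the pricing table above, in USD.
SCENE_PRICE = Decimal("0.04")           # per generated scene
TTS_REGEN_PRICE = Decimal("0.02")       # per TTS regeneration call
RENDER_PRICE_PER_MIN = Decimal("0.15")  # per minute of rendered video


def estimate_cost_usd(scenes: int, tts_regens: int = 0, video_minutes: float = 0) -> Decimal:
    """Estimate total spend; fractional render minutes are kept, not rounded."""
    # Convert via str() so a float like 1.4 becomes exactly Decimal("1.4").
    minutes = Decimal(str(video_minutes))
    return (
        SCENE_PRICE * scenes
        + TTS_REGEN_PRICE * tts_regens
        + RENDER_PRICE_PER_MIN * minutes
    )
```

For the example request below (4 scenes, about 30 seconds of video), estimate_cost_usd(4, video_minutes=0.5) gives 0.16 for scenes plus 0.075 for the render.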

Example Request

curl -X POST https://api.tutorflow.io/v1/platform/videos \
  -H "Authorization: Bearer tf_platform_..." \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is photosynthesis? A 30-second explainer for kids",
    "sceneCount": 4,
    "aspectRatio": "16:9",
    "targetDurationSeconds": 30
  }'
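The same request can be sent from Python with only the standard library. This sketch mirrors the curl call above; build_create_request and create_video are hypothetical names, the API key must replace the tf_platform_... placeholder, and the 120-second timeout is an assumption chosen to cover the 30–90 s sync window.

```python
import json
import urllib.request
from typing import Any, Optional

API_URL = "https://api.tutorflow.io/v1/platform/videos"


def build_create_request(
    api_key: str, body: dict[str, Any], idempotency_key: Optional[str] = None
) -> urllib.request.Request:
    """Mirror the curl example: POST a JSON body with a Bearer token."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if idempotency_key:
        # Header form of idempotencyKey (see Idempotency).
        headers["Idempotency-Key"] = idempotency_key
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def create_video(
    api_key: str,
    body: dict[str, Any],
    idempotency_key: Optional[str] = None,
    timeout_s: float = 120.0,
) -> dict[str, Any]:
    """Send the request and decode the JSON response (raises HTTPError on 4xx/5xx)."""
    req = build_create_request(api_key, body, idempotency_key)
    # Sync mode can take 30 to 90 seconds, so allow a generous socket timeout.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```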

Response Fields

| Field | Type | Description |
|---|---|---|
| id | string | Video request ID. |
| videoId | string \| null | Internal video content ID. null until generation completes. |
| status | string | PENDING, PROCESSING, COMPLETED, or FAILED. |
| isTerminal | boolean | true once status is COMPLETED or FAILED. |
| title | string \| null | AI-generated or overridden title. |
| description | string \| null | Short description. |
| language | string \| null | Output language. |
| sceneCount | number \| null | Number of scenes actually produced. |
| aspectRatio | string \| null | Echoed aspect ratio. |
| targetDurationSeconds | number \| null | Echoed target duration. |
| slug | string \| null | URL-friendly slug. |
| tier | string \| null | Pricing tier used. |
| mode | string \| null | sync or async. |
| priceSnapshot | object \| null | Pricing details captured at request time (base scene generation only). |
| shareToken | string \| null | Permanent token for the public viewer. |
| editToken | string \| null | Sliding-window token for the editor and edit-time mutations. |
| editTokenExpiresAt | string \| null | ISO 8601 expiry. Extended automatically on every read. |
| previewUrl | string \| null | Editor URL: /{locale}/platform/videos/edit/{editToken}. |
| publicUrl | string \| null | Public viewer URL: /{locale}/platform/videos/{shareToken}. |
| renderStatus | string \| null | IDLE, RENDERING, COMPLETED, or FAILED. IDLE immediately after creation. |
| videoKey | string \| null | S3 key where the rendered mp4 will live (set when a render is triggered). |
| renderTriggerUrl | string \| null | Endpoint to call to start a render. |
| renderPollUrl | string \| null | Endpoint to poll for render progress. |
| pollAfterMs | number \| null | Suggested polling interval in milliseconds (async only). |
| idempotencyKey | string \| null | Echoed idempotency key. |
| idempotentReplay | boolean \| null | true if this response is a replay of an earlier request with the same key. |
| createdAt | string | ISO 8601 creation timestamp. |
| completedAt | string \| null | ISO 8601 timestamp of when scene generation finished. |

Example Response

{
  "id": "e41a085a-e43f-4f63-92f4-9e11250f6e63",
  "videoId": "e10b8286-94ad-4139-a946-3548b46f6d07",
  "status": "COMPLETED",
  "isTerminal": true,
  "title": "What Is Photosynthesis for Kids?",
  "description": "A 30-second explainer covering how plants make food and produce oxygen.",
  "language": "en",
  "sceneCount": 4,
  "aspectRatio": "16:9",
  "targetDurationSeconds": 30,
  "slug": "what-is-photosynthesis-for-kids-d6a02f3b",
  "tier": "default",
  "mode": "sync",
  "priceSnapshot": {
    "category": "video",
    "catalogKey": "video.default",
    "tier": "default",
    "unit": "scene",
    "unitPrice": 0.04,
    "units": 4,
    "amountUsd": 0.16,
    "currency": "USD",
    "source": "platform_pricing_catalog_v2"
  },
  "shareToken": "b64adc04eca28e709add5568af2c414c",
  "editToken": "2dce05b2f639b93ec7559d779b3db71c",
  "editTokenExpiresAt": "2026-04-25T14:15:28.628Z",
  "previewUrl": "https://tutorflow.io/en/platform/videos/edit/2dce05b2f639b93ec7559d779b3db71c",
  "publicUrl": "https://tutorflow.io/en/platform/videos/b64adc04eca28e709add5568af2c414c",
  "renderStatus": "IDLE",
  "videoKey": null,
  "renderPollUrl": "GET /v1/platform/videos/edit/2dce05b2f639b93ec7559d779b3db71c",
  "renderTriggerUrl": "POST /v1/platform/videos/edit/2dce05b2f639b93ec7559d779b3db71c/render",
  "idempotencyKey": null,
  "idempotentReplay": null,
  "createdAt": "2026-04-25T04:14:47.546Z",
  "completedAt": "2026-04-25T04:15:28.677Z",
  "pollAfterMs": null
}
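An agent usually needs only a handful of these fields to continue. The sketch below (summarize_create_response is a hypothetical helper) extracts them and parses numbers with parse_float=Decimal so prices stay exact; with the example response it confirms that amountUsd equals units × unitPrice (4 × $0.04 = $0.16).

```python
import json
from decimal import Decimal
from typing import Any


def summarize_create_response(raw: str) -> dict[str, Any]:
    """Pull out the fields an agent typically needs after POST /v1/platform/videos."""
    # parse_float=Decimal keeps prices like 0.04 exact instead of binary floats.
    data = json.loads(raw, parse_float=Decimal)
    snap = data.get("priceSnapshot") or {}
    return {
        "id": data["id"],
        "edit_token": data.get("editToken"),
        "preview_url": data.get("previewUrl"),
        # Scenes can be reviewed or rendered once generation is done and no render has started.
        "ready_to_render": data.get("status") == "COMPLETED" and data.get("renderStatus") == "IDLE",
        "units": snap.get("units"),
        "unit_price": snap.get("unitPrice"),
        "amount_usd": snap.get("amountUsd"),
    }
```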

Workflow

The typical agent workflow is a two-step pattern (create, then render), followed by polling and download:

1. POST /v1/platform/videos → scenes ready, renderStatus: IDLE
2. POST /v1/platform/videos/edit/:editToken/render → render starts
3. GET  /v1/platform/videos/edit/:editToken → poll until renderStatus: COMPLETED
4. Download the mp4 stored at videoKey on S3.

You can give the user the previewUrl between steps 1 and 2 so they can review the scenes before you pay for a render. If no review is needed, skip straight to step 2.
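The workflow can be driven by a small decision function that maps the latest response to the next action. This is a sketch, assuming the dict carries status, isTerminal, and renderStatus as listed under Response Fields; next_step and its action strings are hypothetical, and treating a null renderStatus like IDLE is an assumption.

```python
from typing import Any


def next_step(video: dict[str, Any]) -> str:
    """Map a video response to the next action in the workflow above."""
    if video.get("status") == "FAILED":
        return "generation_failed"
    if not video.get("isTerminal"):
        return "poll_generation"    # async mode: GET /v1/platform/videos/:id
    render_status = video.get("renderStatus")
    if render_status in (None, "IDLE"):
        return "review_or_render"   # share previewUrl, then POST .../render
    if render_status == "RENDERING":
        return "poll_render"        # GET /v1/platform/videos/edit/:editToken
    if render_status == "COMPLETED":
        return "download"           # fetch the mp4 at videoKey
    return "render_failed"          # renderStatus == "FAILED"
```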

Triggering a Render

curl -X POST https://api.tutorflow.io/v1/platform/videos/edit/{editToken}/render

The response is the full VideoEditResDto, with renderStatus: "RENDERING" and videoKey set to the S3 key where the mp4 will appear.

To poll progress:

curl https://api.tutorflow.io/v1/platform/videos/edit/{editToken}

When renderStatus === "COMPLETED", fetch the file from https://{s3-bucket}.s3.{region}.amazonaws.com/{videoKey}.
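Polling the edit endpoint and building the download URL can be wrapped in two small helpers. In this sketch, fetch is a caller-supplied function that performs GET /v1/platform/videos/edit/{editToken} and returns the decoded JSON; the 5-second interval and 15-minute timeout are arbitrary assumptions, the bucket and region are not given on this page, and treating IDLE during polling as a cancelled render follows the cancel behavior described below.

```python
import time
from typing import Any, Callable
from urllib.parse import quote


def wait_for_render(
    fetch: Callable[[], dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll until renderStatus is COMPLETED or FAILED; return the last response."""
    deadline = clock() + timeout_s
    while True:
        video = fetch()
        status = video.get("renderStatus")
        if status in ("COMPLETED", "FAILED"):
            return video
        if status == "IDLE":
            # A cancelled render resets renderStatus to IDLE.
            raise RuntimeError("render is not running (cancelled or never triggered)")
        if clock() + interval_s > deadline:
            raise TimeoutError("render did not finish before the timeout")
        sleep(interval_s)


def s3_object_url(bucket: str, region: str, video_key: str) -> str:
    """Build the S3 URL pattern shown above; keep '/' in the key unescaped."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(video_key, safe='/')}"
```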

To cancel an in-progress render:

curl -X POST https://api.tutorflow.io/v1/platform/videos/edit/{editToken}/render/cancel

This sets renderStatus back to IDLE. The Lambda may finish in the background, but its callback is ignored.

Editing Scenes

The edit token also unlocks per-scene mutations. See Edit Video for the full surface (add scene, update scene, replace clip, regenerate TTS, add BGM, upload custom clip, sync durations).

Async Mode

Set mode: "async" to queue generation as a background job. The response returns immediately with status: "PENDING". Poll GET /v1/platform/videos/:id, waiting pollAfterMs between requests, until isTerminal is true.
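An async poll loop can honor the server's suggested interval and fall back to a fixed delay when pollAfterMs is null. This sketch assumes fetch is a caller-supplied function that performs GET /v1/platform/videos/:id for a given id; FALLBACK_POLL_MS and the 10-minute timeout are hypothetical defaults, not documented values.

```python
import time
from typing import Any, Callable

FALLBACK_POLL_MS = 3000  # hypothetical default when pollAfterMs is null


def wait_for_generation(
    first: dict[str, Any],
    fetch: Callable[[str], dict[str, Any]],
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll until isTerminal is true, waiting pollAfterMs between requests."""
    video = first
    deadline = clock() + timeout_s
    while not video.get("isTerminal"):
        delay_s = (video.get("pollAfterMs") or FALLBACK_POLL_MS) / 1000
        if clock() + delay_s > deadline:
            raise TimeoutError(f"video {video['id']} still {video.get('status')}")
        sleep(delay_s)
        video = fetch(video["id"])
    return video
```

Check status on the returned object: isTerminal is true for both COMPLETED and FAILED.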

Idempotency

Pass an idempotencyKey in the request body or Idempotency-Key header to prevent duplicate generation. Reusing the same key returns the original response with idempotentReplay: true.
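The key point for retries is to generate the key once per logical request and reuse it on every attempt, so a retry after a dropped connection replays the original result instead of generating and billing a second video. In this sketch, send is a caller-supplied function that posts the body with the given Idempotency-Key; create_with_retries is a hypothetical name, and a production version would likely add backoff between attempts.

```python
import uuid
from typing import Any, Callable


def create_with_retries(
    send: Callable[[dict[str, Any], str], dict[str, Any]],
    body: dict[str, Any],
    attempts: int = 3,
) -> dict[str, Any]:
    """Retry transient failures while reusing one Idempotency-Key."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    key = str(uuid.uuid4())  # generated once, shared by every retry of this request
    last_error: Exception = RuntimeError("no attempt was made")
    for _ in range(attempts):
        try:
            # If an earlier attempt reached the server, this returns the original
            # response with idempotentReplay: true instead of generating again.
            return send(body, key)
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
    raise last_error
```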