Video Generation

JSON-driven video generation pipeline using Remotion. Author timelines as structured JSON, render to MP4 with theme support, TTS narration, transitions, and a full component library — all from the command line.

Default active: Yes | Category: AI & Media

Active by default: The video skill is injected into the system prompt automatically. It activates whenever the agent detects keywords like "video", "Remotion", "animation", "mp4", "render video", or "slides to video". No manual loading required.

Quick Reference

Skill name: video
Category: AI & Media
Default active: Yes, auto-injected on trigger keywords
Triggers: video, Remotion, animation, mp4, render video, slides to video
SKILL.md path: skills_ref/video/SKILL.md
Output directory: /tmp/remotion-render/
Runtime: Node.js 22.4+, pnpm, ffmpeg, ffprobe
Bootstrap: node scripts/ensure-remotion.mjs
TTS providers: Gemini (default), Supertone, Supertonic
Related skills: pptx, diagram

CLI Commands

All commands are run from the skills_ref/video/ directory. Output goes to /tmp/remotion-render/ by default — keep render output outside the source tree.

Task | Command
Render (sync) | node scripts/pipeline.mjs --timeline <path> [--preset Landscape-1080p]
Render + TTS | node scripts/pipeline.mjs --timeline timeline.draft.json
Skip TTS | node scripts/pipeline.mjs --timeline timeline.draft.json --skip-tts
TTS batch | node scripts/tts.mjs --batch timeline.draft.json [--provider supertone]
TTS single | node scripts/tts.mjs --text "Hello" --output /tmp/tts-out.m4a [--provider gemini]
List voices | node scripts/tts.mjs --list-voices [--provider supertone]
Render (async) | node scripts/pipeline.mjs --timeline <path> --async
Check status | node scripts/pipeline.mjs --status /tmp/remotion-render/render-result.json
Preview | cd remotion-project && pnpm exec remotion studio
Validate | node scripts/validate-artifact.mjs /tmp/remotion-render/TimelineVideo.mp4 --preset Landscape-1080p

Pipeline Usage

# Sync (default) — blocks until render complete
node skills_ref/video/scripts/pipeline.mjs \
  --timeline timeline.json \
  --output /tmp/remotion-render

# With preset override (timeline.meta.preset is the source of truth)
node skills_ref/video/scripts/pipeline.mjs \
  --timeline timeline.json \
  --preset Portrait-1080p

# Async — returns immediately, renders in background
node skills_ref/video/scripts/pipeline.mjs \
  --timeline timeline.json \
  --async

# Check async status
node skills_ref/video/scripts/pipeline.mjs \
  --status /tmp/remotion-render/render-result.json
Preset source of truth: timeline.meta.preset is the canonical resolution setting. The CLI --preset flag overrides it but emits a warning. Always prefer setting the preset in the timeline JSON.
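
The precedence rule can be sketched as a small function. This is an illustration only: `resolvePreset` is a hypothetical name, the real logic lives in scripts/pipeline.mjs and may differ, and warning only when the two values disagree is an assumption.

```javascript
// Hypothetical helper illustrating the documented precedence:
// CLI flag > timeline.meta.preset > default (Landscape-1080p).
function resolvePreset(timeline, cliPreset) {
  const fromTimeline = timeline?.meta?.preset;

  // No CLI flag: use the timeline value, else the documented default.
  if (!cliPreset) {
    return { preset: fromTimeline ?? "Landscape-1080p", warning: null };
  }

  // The CLI flag wins; a disagreement with the timeline is surfaced as a warning.
  const warning =
    fromTimeline && fromTimeline !== cliPreset
      ? `--preset ${cliPreset} overrides timeline.meta.preset (${fromTimeline})`
      : null;
  return { preset: cliPreset, warning };
}

console.log(resolvePreset({ meta: { preset: "Landscape-1080p" } }, "Portrait-1080p"));
```

Returning the warning as data instead of printing it keeps the function easy to test and leaves the decision about how to report it to the caller.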

Timeline Authoring

Videos are defined as JSON timelines with a meta block (title, preset, fps, duration) and an elements array of typed slides. Each element specifies its type, timing, props, and optional transition.

Minimal Example

{
  "meta": {
    "title": "My Video",
    "preset": "Landscape-1080p",
    "fps": 30,
    "totalDurationSec": 15
  },
  "elements": [
    {
      "type": "title",
      "startSec": 0,
      "durationSec": 5,
      "props": { "title": "Hello World", "subtitle": "A demo" },
      "transition": { "type": "fade" }
    },
    {
      "type": "content",
      "startSec": 5,
      "durationSec": 5,
      "props": {
        "header": "Key Points",
        "bulletPoints": ["Fast", "Safe", "Beautiful"]
      },
      "transition": { "type": "slide", "direction": "from-right" }
    },
    {
      "type": "code",
      "startSec": 10,
      "durationSec": 5,
      "props": {
        "code": "const x = 42;",
        "language": "typescript",
        "title": "Example"
      },
      "transition": { "type": "fade" }
    }
  ],
  "audio": []
}
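
Before rendering, it can help to confirm that the elements are laid out back to back and that they add up to meta.totalDurationSec. The check below is a sketch: `checkTimeline` is a hypothetical helper, the field names come from the example above, and the rules it enforces (contiguous elements, exact total) are assumptions and not documented pipeline behavior.

```javascript
// Hypothetical sanity check for a timeline object shaped like the example above.
function checkTimeline(timeline) {
  const problems = [];
  const elements = timeline.elements ?? [];
  let cursor = 0; // expected startSec of the next element

  elements.forEach((el, i) => {
    if (typeof el.type !== "string") problems.push(`element ${i}: missing type`);
    if (!(el.durationSec > 0)) problems.push(`element ${i}: durationSec must be > 0`);

    // A missing startSec is treated as "follows the previous element".
    const start = typeof el.startSec === "number" ? el.startSec : cursor;
    if (start !== cursor) {
      problems.push(`element ${i}: startSec is ${start}, expected ${cursor}`);
    }
    cursor = start + (el.durationSec > 0 ? el.durationSec : 0);
  });

  const total = timeline.meta?.totalDurationSec;
  if (total !== undefined && total !== cursor) {
    problems.push(`meta.totalDurationSec is ${total}, but elements end at ${cursor}`);
  }
  return problems;
}
```

An empty array means the timeline passed every check; each string otherwise names the element index and the mismatch.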

Theme System

Each video should define a unique aesthetic via meta.theme. Fonts are loaded through @remotion/google-fonts for cross-platform rendering. The default theme uses Chakra Petch (display), Outfit (body), and JetBrains Mono (code) with a dark blue background and cyan accent.

{
  "meta": {
    "theme": {
      "aesthetic": "brutalist tech",
      "font": { "display": "Chakra Petch", "body": "Outfit" },
      "color": { "accent": "#FF6B35", "bg": "#0A0A0A" },
      "gradient": {
        "hero": "radial-gradient(circle at 20% 30%, rgba(255,107,53,0.2) 0%, transparent 60%)"
      }
    }
  }
}
Font ban: Do not use Inter, Roboto, or Arial in meta.theme. These signal generic/AI-generated content and break the distinctive aesthetic requirement.
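
The font ban is easy to check mechanically. The helper below is a hypothetical sketch (`findBannedFonts` is not part of the skill); it assumes fonts are declared under meta.theme.font as in the example above.

```javascript
// Fonts named in the ban above, lower-cased for comparison.
const BANNED_FONTS = ["inter", "roboto", "arial"];

// Returns "role: family" strings for every banned font in a theme object.
function findBannedFonts(theme) {
  const fonts = theme?.font ?? {};
  return Object.entries(fonts)
    .filter(([, family]) => BANNED_FONTS.includes(String(family).trim().toLowerCase()))
    .map(([role, family]) => `${role}: ${family}`);
}

console.log(findBannedFonts({ font: { display: "Chakra Petch", body: "Inter" } }));
```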

Content Design Rules

Do

Avoid

Content Density

The canvas is large (1920x1080 or 1080x1920). Sparse content creates dead space. Add more content blocks — not bigger fonts or more padding. Each slide should use at least 70% of the canvas area.

Slide Type | Landscape | Portrait
Content | 3–4 bullet points | 4–5 bullet points
Code | Minimum 4 lines + comments | Minimum 6 lines + comments
Title | Always include a subtitle | Always include a subtitle
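
The density limits in the table can be turned into a simple lint. This is a sketch under assumptions: `checkDensity` is a hypothetical helper, only the landscape and portrait limits from the table are covered, and code lines are counted by splitting on newlines.

```javascript
// Limits copied from the table above.
const DENSITY = {
  landscape: { bullets: [3, 4], minCodeLines: 4 },
  portrait: { bullets: [4, 5], minCodeLines: 6 },
};

// Returns null when the element meets the density rule, else a message.
function checkDensity(element, orientation) {
  const rule = DENSITY[orientation];
  const props = element.props ?? {};

  if (element.type === "content") {
    const n = (props.bulletPoints ?? []).length;
    const [min, max] = rule.bullets;
    return n >= min && n <= max ? null : `expected ${min}-${max} bullet points, got ${n}`;
  }
  if (element.type === "code") {
    const lines = String(props.code ?? "").split("\n").length;
    return lines >= rule.minCodeLines
      ? null
      : `expected at least ${rule.minCodeLines} code lines, got ${lines}`;
  }
  if (element.type === "title") {
    return props.subtitle ? null : "title slide is missing a subtitle";
  }
  return null; // other slide types have no documented density rule
}
```

A content slide with three bullets passes in landscape but is flagged in portrait, which matches the table's intent that portrait slides carry more content.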

Shorts (Portrait-1080p)

Resolution Presets

Preset | Resolution | Aspect Ratio | Use
Landscape-720p | 1280 x 720 | 16:9 | Draft / preview
Landscape-1080p | 1920 x 1080 | 16:9 | Standard delivery (default)
Portrait-1080p | 1080 x 1920 | 9:16 | TikTok / Reels / Shorts
Square-1080p | 1080 x 1080 | 1:1 | Instagram / LinkedIn

Default: Landscape-1080p. The agent auto-selects based on keywords — "reels", "shorts", or "TikTok" triggers Portrait-1080p; "Instagram" triggers Square-1080p.
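
As a rough illustration of the keyword rule, the sketch below maps a request string to a preset. `selectPreset` and `PRESETS` are hypothetical names (the real table lives in scripts/presets.mjs), and checking the portrait keywords before "Instagram" is an assumption about precedence.

```javascript
// Dimensions copied from the preset list above.
const PRESETS = {
  "Landscape-720p": { width: 1280, height: 720 },
  "Landscape-1080p": { width: 1920, height: 1080 },
  "Portrait-1080p": { width: 1080, height: 1920 },
  "Square-1080p": { width: 1080, height: 1080 },
};

// Picks a preset name from keywords in a free-text request.
function selectPreset(request) {
  const text = request.toLowerCase();
  if (["reels", "shorts", "tiktok"].some((k) => text.includes(k))) return "Portrait-1080p";
  if (text.includes("instagram")) return "Square-1080p";
  return "Landscape-1080p";
}

const name = selectPreset("Make a TikTok video about AI trends");
console.log(name, PRESETS[name]);
```

With this ordering, a request that mentions "Instagram Reels" resolves to Portrait-1080p; swap the two checks if the square format should win instead.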

Component Library

The Remotion project ships 11 slide components plus supporting features. Each maps to a type value in the timeline JSON.

Slide Components

Component | Props | Use For
TitleSlide | title, subtitle, animation? | Opening / closing slides
ContentSlide | header, content, bulletPoints, animation? | Body content
CodeSlide | code, language, title, animation? | Code demos
DiagramSlide | src, title, caption, fit, animation? | Images / diagrams
StatSlide | title, stats[] (value/suffix/label/trend/decimals) | KPI / count-up
QuoteSlide | quote, author?, source? | Quotes
ComparisonSlide | title, left{label,items,accent}, right{...} | Side-by-side compare
VideoSlide | src, title?, startFrom?, playbackRate?, loop? | Inline video
GifSlide | src, title?, fit? | Animated GIF
LottieSlide | src, title? | Lottie animation
ChartSlide | chartType, title, data{labels,datasets} | Bar / pie / line chart
Caption | text, position, designTheme? | Timed subtitles

Features

Feature | Details
Surface Card | Glassmorphism wrapper on content slides. Customize via meta.theme.card.
Animation | Optional per element: { enter: "scale-in", exit: "fade-out" }. Enter: scale-in, fade-in, slide-up, none. Exit: scale-out, slide-down, fade-out, none.
Transitions | fade, slide, wipe, flip, clock-wipe. Optional timing: "spring". Slide/wipe accept direction.
Korean fonts | NotoSansKR loaded as body primary. Stack: NotoSansKR, Outfit, sans-serif.
Charts | bar (staggered grow), pie (sweep), line (draw-on). Pure SVG, no external library.
Audio | fadeInSec / fadeOutSec, loop, trimStartSec.
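
The animation and transition values listed above lend themselves to a quick typo check. The sketch below is hypothetical (`checkMotion` is not part of the skill) and takes the animation and transition objects directly, since where exactly they sit in an element is not assumed here. Flagging a direction on transitions other than slide and wipe is also an assumption: the table only says those two accept it.

```javascript
const ENTER = ["scale-in", "fade-in", "slide-up", "none"];
const EXIT = ["scale-out", "slide-down", "fade-out", "none"];
const TRANSITIONS = ["fade", "slide", "wipe", "flip", "clock-wipe"];

// Returns a list of problems for one animation/transition pair (both optional).
function checkMotion(animation, transition) {
  const problems = [];
  if (animation?.enter && !ENTER.includes(animation.enter)) {
    problems.push(`unknown enter animation: ${animation.enter}`);
  }
  if (animation?.exit && !EXIT.includes(animation.exit)) {
    problems.push(`unknown exit animation: ${animation.exit}`);
  }
  if (transition) {
    if (!TRANSITIONS.includes(transition.type)) {
      problems.push(`unknown transition: ${transition.type}`);
    } else if (transition.direction && !["slide", "wipe"].includes(transition.type)) {
      problems.push(`direction is only documented for slide and wipe, not ${transition.type}`);
    }
  }
  return problems;
}

console.log(checkMotion({ enter: "scale-in", exit: "fade-out" }, { type: "slide", direction: "from-right" }));
```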

TTS Integration

Three TTS providers are available. The default is Gemini. The pipeline auto-detects narration fields in timeline elements and generates per-cut audio files.

Provider | ID | Default Voice | Strengths | Env Key
Gemini | gemini | Kore | 30 voices, tone via prompt | GEMINI_API_KEY
Supertone | supertone | Andrew | 6 emotion styles, best Korean quality | SUPERTONE_API_KEY
Supertonic | supertonic | M1 | 0.22s generation, free, offline | none

Draft-to-Final Workflow

  1. Write timeline.draft.json with narration and optional voiceControl per element.
  2. Pipeline auto-detects narration fields and generates per-cut audio.
  3. Produces timeline.final.json with audio[] entries and corrected durationSec.
  4. Final timeline is rendered with synchronized audio.

Draft Timeline with Narration

{
  "elements": [
    {
      "id": "intro",
      "type": "title",
      "durationSec": 11,
      "narration": "Welcome to the analysis report.",
      "voiceControl": {
        "tonePrompt": "Calm, professional news anchor tone"
      },
      "props": {
        "title": "Tech Trends",
        "subtitle": "2026 Edition"
      }
    }
  ]
}
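
The draft-to-final step can be pictured as follows. This is a sketch under stated assumptions and not the pipeline's code: `finalizeTimeline` is a hypothetical name, `measuredSec` stands in for per-element audio lengths as ffprobe might report them, and both the shape of the audio[] entries and the 0.5 s padding are invented for illustration.

```javascript
// Hypothetical: rebuild startSec/durationSec from measured narration lengths.
// measuredSec maps element id -> audio length in seconds.
function finalizeTimeline(draft, measuredSec, paddingSec = 0.5) {
  let cursor = 0;
  const audio = [];

  const elements = draft.elements.map((el) => {
    const measured = measuredSec[el.id];
    // Elements without narration audio keep their drafted duration.
    const durationSec = measured === undefined ? el.durationSec : measured + paddingSec;
    if (measured !== undefined) {
      // Assumed entry shape; the real audio[] schema is not shown in this document.
      audio.push({ elementId: el.id, startSec: cursor, durationSec: measured });
    }
    const out = { ...el, startSec: cursor, durationSec };
    cursor += durationSec;
    return out;
  });

  return { ...draft, meta: { ...draft.meta, totalDurationSec: cursor }, elements, audio };
}
```

For the draft above, a measured narration of 4.25 s would shrink the "intro" element from its drafted 11 s to 4.75 s, and every later element would shift earlier accordingly.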

Duration Estimation

Render Validation — 3-Tier Gate

All three gates must pass for a render to be considered successful.

Gate | Checks | Role
1 | Policy: no forbidden engine in logs | Supplementary
2 | Execution: remotion render exit code 0 | Primary
3 | Artifact: ffprobe validates duration, codec, and resolution | Final truth

node scripts/validate-artifact.mjs /tmp/remotion-render/TimelineVideo.mp4 --preset Landscape-1080p

Gate 3 is the final truth: Even if remotion render exits 0, the artifact must pass ffprobe validation (correct resolution, codec, and duration) before the render is accepted.
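
To make Gate 3 concrete, the sketch below checks a parsed ffprobe report (as produced by `ffprobe -v error -print_format json -show_streams -show_format <file>`) against expected values. It is not the code in validate-artifact.mjs: `checkArtifact`, the `expected` object, and the 0.5 s duration tolerance are assumptions made for illustration.

```javascript
// Hypothetical Gate 3 check on ffprobe's JSON output.
function checkArtifact(probe, expected, toleranceSec = 0.5) {
  const video = (probe.streams ?? []).find((s) => s.codec_type === "video");
  if (!video) return ["no video stream found"];

  const failures = [];
  if (video.width !== expected.width || video.height !== expected.height) {
    failures.push(`resolution ${video.width}x${video.height}, expected ${expected.width}x${expected.height}`);
  }
  if (video.codec_name !== expected.codec) {
    failures.push(`codec ${video.codec_name}, expected ${expected.codec}`);
  }
  // ffprobe reports format.duration as a string, so convert before comparing.
  const duration = Number(probe.format?.duration);
  if (!(Math.abs(duration - expected.durationSec) <= toleranceSec)) {
    failures.push(`duration ${duration}s, expected about ${expected.durationSec}s`);
  }
  return failures;
}
```

An empty result means the artifact passes; a missing or unparsable duration becomes NaN and is reported as a failure.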

FFmpeg Patterns

FFmpeg handles deterministic cuts, batch processing, and preprocessing outside of Remotion. These patterns are available for use within the video pipeline.

Extract segment by timestamp

ffmpeg -i raw.mp4 -ss 00:12:30 -to 00:15:45 -c copy segment_01.mp4

Batch cut from edit decision list

#!/bin/bash
# cuts.txt format: start,end,label
mkdir -p segments
while IFS=, read -r start end label; do
  # -nostdin stops ffmpeg from consuming the remaining lines of cuts.txt
  ffmpeg -nostdin -i raw.mp4 -ss "$start" -to "$end" -c copy "segments/${label}.mp4"
done < cuts.txt

Concatenate segments

for f in segments/*.mp4; do echo "file '$f'"; done > concat.txt
ffmpeg -f concat -safe 0 -i concat.txt -c copy assembled.mp4

Create proxy for faster editing

ffmpeg -i raw.mp4 -vf "scale=960:-2" -c:v libx264 -preset ultrafast -crf 28 proxy.mp4

Extract audio for transcription

ffmpeg -i raw.mp4 -vn -acodec pcm_s16le -ar 16000 audio.wav

Normalize audio levels

ffmpeg -i segment.mp4 -af loudnorm=I=-16:TP=-1.5:LRA=11 -c:v copy normalized.mp4

Scene detection

# Detect scene changes (threshold 0.3 = moderate sensitivity)
ffmpeg -i input.mp4 -vf "select='gt(scene,0.3)',showinfo" -vsync vfr -f null - 2>&1 | grep showinfo

Social media reframing

# 16:9 to 9:16 (center crop for TikTok/Reels)
ffmpeg -i input.mp4 -vf "crop=ih*9/16:ih,scale=1080:1920" vertical.mp4

# 16:9 to 1:1 (center crop for Instagram)
ffmpeg -i input.mp4 -vf "crop=ih:ih,scale=1080:1080" square.mp4
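
The crop arithmetic behind these two commands can be worked through in code. The helper below is a hypothetical sketch (`centerCropFilter` is not part of the skill) that builds the -vf string for any source and target size; rounding the crop down to even dimensions is an added precaution, not something the commands above do. When x and y are omitted, ffmpeg's crop filter centers the crop.

```javascript
// Builds a center-crop + scale filter string for ffmpeg's -vf option.
function centerCropFilter(srcW, srcH, outW, outH) {
  const target = outW / outH;
  let cropW = srcW;
  let cropH = srcH;

  if (srcW / srcH > target) {
    // Source is wider than the target: keep full height, trim the sides.
    cropW = Math.floor((srcH * target) / 2) * 2;
  } else {
    // Source is taller than (or equal to) the target: keep full width.
    cropH = Math.floor(srcW / target / 2) * 2;
  }
  return `crop=${cropW}:${cropH},scale=${outW}:${outH}`;
}

console.log(centerCropFilter(1920, 1080, 1080, 1920)); // 1080 * 9/16 = 607.5, rounded down to 606
console.log(centerCropFilter(1920, 1080, 1080, 1080));
```

For a 1920 x 1080 source, the 9:16 crop keeps only about a third of the frame width, so check that the subject is actually centered before batch-converting.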

ElevenLabs Voice (Supplementary)

For professional voiceover outside the Remotion TTS pipeline, ElevenLabs is available via direct API. Use when you need high-fidelity English narration, voice cloning, or emotion control beyond what Gemini/Supertone provide. Requires ELEVENLABS_API_KEY.

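import os
import requests

VOICE_ID = "<voice_id>"  # replace with an ElevenLabs voice ID

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "text": "Your narration text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
    timeout=60,
)
resp.raise_for_status()  # fail loudly instead of saving an error body as audio

# The response body is the encoded audio itself
with open("voiceover.mp3", "wb") as f:
    f.write(resp.content)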

Final Polish (Descript / CapCut)

For tasks that code-driven rendering handles poorly, use a traditional editor as the last mile:

Division of labor: AI clears the repetitive work — timeline authoring, rendering, TTS. Final creative taste lives in the manual polish layer.

Dependencies

The video pipeline requires Node.js, pnpm, FFmpeg, and a Chromium binary (auto-installed by Remotion).

Dependency | Install / Check | Purpose
Node.js 22.4+ | node --version | Runtime for pipeline scripts and Remotion
pnpm | npm install -g pnpm | Package manager for the Remotion project
FFmpeg + ffprobe | brew install ffmpeg | Video/audio processing, artifact validation, segment extraction
Chromium (auto) | remotion browser ensure | Auto-installed by ensure-remotion.mjs bootstrap script

Environment Variables

Variable | Purpose | Required
GEMINI_API_KEY | Gemini TTS provider | Yes (if using Gemini TTS, which is default)
SUPERTONE_API_KEY | Supertone TTS provider | Only if using Supertone
ELEVENLABS_API_KEY | ElevenLabs supplementary TTS | Only if using ElevenLabs
GOOGLE_APPLICATION_CREDENTIALS | Vertex AI authentication | Alternative to gcloud auth

Bootstrap

# First-time setup: installs packages and ensures Chromium is available
node scripts/ensure-remotion.mjs

# Verify setup
cd remotion-project && pnpm exec remotion studio

Project Structure

skills_ref/video/
+-- SKILL.md
+-- reference/                     # visual-quality.md, tts-integration.md, components.md
+-- scripts/
|   +-- pipeline.mjs               # CLI entrypoint (sync/async/TTS)
|   +-- tts.mjs                    # TTS orchestrator (multi-provider)
|   +-- tts-providers/             # gemini.mjs, supertone.mjs, supertonic.mjs
|   +-- ensure-remotion.mjs        # Runtime bootstrap
|   +-- validate-artifact.mjs      # Gate 3: ffprobe validation
|   +-- presets.mjs                # Resolution presets (ESM)
+-- remotion-project/
    +-- public/example-timeline.json
    +-- src/
        +-- components/            # 11 slide components + barrel
        +-- timeline/              # JSON-to-React engine

"~해줌" Usage Examples

Real-world examples of how to invoke video tasks in natural language ("~해줌" is a casual Korean way of saying "do this for me"). Korean and English both work.

"Turn this presentation into a video: 1080p landscape, 30 seconds"
Creates a Landscape-1080p timeline from your presentation content. The agent authors a JSON timeline with title, content, and code slides, selects an appropriate theme, and renders via the sync pipeline. Output at /tmp/remotion-render/TimelineVideo.mp4.

"Make a TikTok Shorts video: 60 seconds on AI trends"
Auto-selects Portrait-1080p (9:16) preset. Generates 8–10 elements with a hook-first opening and CTA closing. Applies the Shorts content rules: dense bullets, 6+ line code slides, and varied transitions.

"Add narration to this timeline, in a calm tone"
Adds narration and voiceControl fields to each element in the draft timeline. Runs the TTS pipeline with the default Gemini provider (voice: Kore), generates per-cut audio, and produces timeline.final.json with synced durations.

"Convert this video from 16:9 to 9:16, center-cropped for TikTok"
Uses the FFmpeg social media reframing pattern to center-crop a landscape video to portrait. Runs: ffmpeg -i input.mp4 -vf "crop=ih*9/16:ih,scale=1080:1920" vertical.mp4.

"Validate the render output: check that the resolution and codec are correct"
Runs the 3-tier validation gate on the rendered artifact. Executes validate-artifact.mjs to verify duration, codec, and resolution via ffprobe against the expected preset.

Troubleshooting

Remotion Chromium not found

If remotion render fails with a browser error, ensure Chromium is installed:

node scripts/ensure-remotion.mjs
# Or manually:
cd remotion-project && pnpm exec remotion browser ensure

FFmpeg not found

Install FFmpeg and ffprobe:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install -y ffmpeg

# Verify
ffmpeg -version && ffprobe -version

TTS fails with auth error

Check that the appropriate environment variable is set for your chosen provider:

# Gemini (default)
echo $GEMINI_API_KEY

# Supertone
echo $SUPERTONE_API_KEY

# Supertonic requires no key (runs offline)

Render output is empty or corrupt

Run the artifact validator to diagnose:

node scripts/validate-artifact.mjs /tmp/remotion-render/TimelineVideo.mp4 --preset Landscape-1080p

If Gate 3 fails, check the Remotion logs for rendering errors. Common causes: missing fonts, invalid timeline JSON, or out-of-memory on large compositions.

Duration mismatch after TTS

The pipeline auto-corrects durationSec via ffprobe measurement after TTS generation. If durations still seem off, verify the narration text length matches the estimation formula: Math.ceil(text.length / 6.5) + 0.5 for Korean text.
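
The estimation formula can be evaluated directly to compare against a drafted durationSec. The function name below is hypothetical; the formula itself is the one quoted above. Note that `text.length` counts every character, including spaces and punctuation.

```javascript
// Estimated narration length in seconds for Korean text,
// using the formula quoted above: ceil(length / 6.5) + 0.5.
function estimateKoreanNarrationSec(text) {
  return Math.ceil(text.length / 6.5) + 0.5;
}

// Worked example: 13 characters -> ceil(13 / 6.5) + 0.5 = 2 + 0.5 = 2.5 seconds.
console.log(estimateKoreanNarrationSec("가".repeat(13)));
```

If an element's drafted durationSec is well below this estimate, the narration is likely to be cut short unless the pipeline's ffprobe correction has run.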