Video Generation
JSON-driven video generation pipeline using Remotion. Author timelines as structured JSON, render to MP4 with theme support, TTS narration, transitions, and a full component library — all from the command line.
Default active: Yes. Category: AI & Media.
The video skill is injected into the system prompt automatically. It activates whenever the agent detects keywords like "video", "Remotion", "animation", "mp4", "render video", or "slides to video". No manual loading is required.
Quick Reference
| Field | Value |
|---|---|
| Skill name | video |
| Category | AI & Media |
| Default active | Yes — auto-injected on trigger keywords |
| Triggers | video, Remotion, animation, mp4, render video, slides to video |
| SKILL.md path | skills_ref/video/SKILL.md |
| Output directory | /tmp/remotion-render/ |
| Runtime | Node.js 22.4+, pnpm, ffmpeg, ffprobe |
| Bootstrap | node scripts/ensure-remotion.mjs |
| TTS providers | Gemini (default), Supertone, Supertonic |
| Related skills | pptx, diagram |
CLI Commands
All commands are run from the skills_ref/video/ directory. Output goes to /tmp/remotion-render/ by default — keep render output outside the source tree.
| Task | Command |
|---|---|
| Render (sync) | node scripts/pipeline.mjs --timeline <path> [--preset Landscape-1080p] |
| Render + TTS | node scripts/pipeline.mjs --timeline timeline.draft.json |
| Skip TTS | node scripts/pipeline.mjs --timeline timeline.draft.json --skip-tts |
| TTS batch | node scripts/tts.mjs --batch timeline.draft.json [--provider supertone] |
| TTS single | node scripts/tts.mjs --text "Hello" --output /tmp/tts-out.m4a [--provider gemini] |
| List voices | node scripts/tts.mjs --list-voices [--provider supertone] |
| Render (async) | node scripts/pipeline.mjs --timeline <path> --async |
| Check status | node scripts/pipeline.mjs --status /tmp/remotion-render/render-result.json |
| Preview | cd remotion-project && pnpm exec remotion studio |
| Validate | node scripts/validate-artifact.mjs /tmp/remotion-render/TimelineVideo.mp4 --preset Landscape-1080p |
Pipeline Usage
# Sync (default) — blocks until render complete
node skills_ref/video/scripts/pipeline.mjs \
--timeline timeline.json \
--output /tmp/remotion-render
# With preset override (timeline.meta.preset is the source of truth)
node skills_ref/video/scripts/pipeline.mjs \
--timeline timeline.json \
--preset Portrait-1080p
# Async — returns immediately, renders in background
node skills_ref/video/scripts/pipeline.mjs \
--timeline timeline.json \
--async
# Check async status
node skills_ref/video/scripts/pipeline.mjs \
--status /tmp/remotion-render/render-result.json
timeline.meta.preset is the canonical resolution setting. The CLI --preset flag overrides it but emits a warning. Always prefer setting the preset in the timeline JSON.
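The precedence rule above can be sketched as a small pure function. This is only an illustration: `resolvePreset` and its return shape are hypothetical names, not the internals of pipeline.mjs, and the fallback to Landscape-1080p assumes the documented default applies when neither source sets a preset.

```javascript
// Hypothetical helper illustrating preset precedence; not taken from pipeline.mjs.
const DEFAULT_PRESET = "Landscape-1080p";

function resolvePreset(timeline, cliPreset) {
  const fromTimeline = timeline?.meta?.preset;
  // A CLI value that disagrees with the timeline wins, but produces a warning.
  if (cliPreset && fromTimeline && cliPreset !== fromTimeline) {
    return {
      preset: cliPreset,
      warning: `--preset ${cliPreset} overrides timeline.meta.preset (${fromTimeline})`,
    };
  }
  // Otherwise use whichever value is present, falling back to the default.
  return { preset: cliPreset || fromTimeline || DEFAULT_PRESET, warning: null };
}

console.log(resolvePreset({ meta: { preset: "Landscape-1080p" } }, "Portrait-1080p"));
```

Keeping the warning in the return value, rather than printing it inside the function, makes the rule easy to test in isolation.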
Timeline Authoring
Videos are defined as JSON timelines with a meta block (title, preset, fps, totalDurationSec) and an elements array of typed slides. Each element specifies its type, timing, props, and optional transition.
Minimal Example
{
"meta": {
"title": "My Video",
"preset": "Landscape-1080p",
"fps": 30,
"totalDurationSec": 15
},
"elements": [
{
"type": "title",
"startSec": 0,
"durationSec": 5,
"props": { "title": "Hello World", "subtitle": "A demo" },
"transition": { "type": "fade" }
},
{
"type": "content",
"startSec": 5,
"durationSec": 5,
"props": {
"header": "Key Points",
"bulletPoints": ["Fast", "Safe", "Beautiful"]
},
"transition": { "type": "slide", "direction": "from-right" }
},
{
"type": "code",
"startSec": 10,
"durationSec": 5,
"props": {
"code": "const x = 42;",
"language": "typescript",
"title": "Example"
},
"transition": { "type": "fade" }
}
],
"audio": []
}
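Before rendering, it can help to confirm that the timing fields in a timeline like the one above agree with each other. The sketch below assumes elements are meant to run back to back and end exactly at meta.totalDurationSec, as they do in the minimal example. The pipeline may allow gaps or overlaps, and `checkTimeline` is a hypothetical helper.

```javascript
// Hypothetical consistency check; the contiguity rule is an assumption
// drawn from the minimal example, not a documented pipeline requirement.
const near = (a, b) => Math.abs(a - b) < 1e-6;

function checkTimeline(timeline) {
  const problems = [];
  let cursor = 0; // where the next element is expected to start
  timeline.elements.forEach((el, i) => {
    if (!near(el.startSec, cursor)) {
      problems.push(`element ${i} (${el.type}) starts at ${el.startSec}, expected ${cursor}`);
    }
    if (!(el.durationSec > 0)) {
      problems.push(`element ${i} (${el.type}) needs a positive durationSec`);
    }
    cursor = el.startSec + el.durationSec;
  });
  if (!near(cursor, timeline.meta.totalDurationSec)) {
    problems.push(`elements end at ${cursor}s but meta.totalDurationSec is ${timeline.meta.totalDurationSec}`);
  }
  return problems; // an empty array means the timing is consistent
}
```

An empty result for the minimal example confirms that its three 5-second elements add up to the declared 15 seconds.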
Theme System
Each video should define a unique aesthetic via meta.theme. Fonts are loaded through @remotion/google-fonts for cross-platform rendering. The default theme uses Chakra Petch (display), Outfit (body), and JetBrains Mono (code) with a dark blue background and cyan accent.
{
"meta": {
"theme": {
"aesthetic": "brutalist tech",
"font": { "display": "Chakra Petch", "body": "Outfit" },
"color": { "accent": "#FF6B35", "bg": "#0A0A0A" },
"gradient": {
"hero": "radial-gradient(circle at 20% 30%, rgba(255,107,53,0.2) 0%, transparent 60%)"
}
}
}
}
Avoid Inter, Roboto, or Arial as fonts in meta.theme. These signal generic/AI-generated content and break the distinctive aesthetic requirement.
Content Design Rules
Do
- Use concise headers without emoji
- Write bullet points as short phrases (max 8 words), 3–4 per slide
- Vary slide types: title, content, code, content — avoid monotony
- Mix transitions: fade, slide, wipe (avoid repeating the same type 3+ times)
- Show real code on code slides, not pseudocode
- Pick a theme aesthetic and commit to it throughout
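The transition-variety rule in the list above is easy to check mechanically. The sketch below reads "repeating the same type 3+ times" as three or more consecutive elements with the same transition type. That reading is an assumption, and `findTransitionRun` is a hypothetical helper.

```javascript
// Returns the index where a run of `limit` identical consecutive transition
// types begins, or -1 when no such run exists. Hypothetical helper.
function findTransitionRun(elements, limit = 3) {
  let runStart = 0;
  for (let i = 0; i < elements.length; i++) {
    const type = elements[i].transition?.type;
    const prev = i > 0 ? elements[i - 1].transition?.type : undefined;
    // A missing or different transition type starts a new run.
    if (i === 0 || type === undefined || type !== prev) runStart = i;
    if (type !== undefined && i - runStart + 1 >= limit) return runStart;
  }
  return -1;
}

const fade = { transition: { type: "fade" } };
const slide = { transition: { type: "slide" } };
console.log(findTransitionRun([slide, fade, fade, fade])); // 1
console.log(findTransitionRun([fade, slide, fade]));       // -1
```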
Avoid
- Emoji in slide titles or headers
- More than 5 bullets on a single slide
- Generic headers like "Key Features", "Getting Started", or "Summary"
- Inter, Roboto, or Arial as fonts in meta.theme
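The "Avoid" rules above lend themselves to a simple lint pass over a timeline. The following is a minimal sketch, not part of the skill: `lintTimeline` is a hypothetical name, and the generic-header check only matches the three example strings listed above exactly.

```javascript
// Hypothetical lint pass over the "Avoid" rules; names and messages are illustrative.
const BANNED_FONTS = ["Inter", "Roboto", "Arial"];
const GENERIC_HEADERS = ["Key Features", "Getting Started", "Summary"];
const EMOJI = /\p{Extended_Pictographic}/u; // Unicode property escape for emoji-like symbols

function lintTimeline(timeline) {
  const warnings = [];
  for (const font of Object.values(timeline.meta?.theme?.font ?? {})) {
    if (BANNED_FONTS.includes(font)) warnings.push(`meta.theme.font uses ${font}`);
  }
  timeline.elements.forEach((el, i) => {
    const heading = el.props?.header ?? el.props?.title ?? "";
    if (EMOJI.test(heading)) warnings.push(`element ${i}: emoji in heading`);
    if (GENERIC_HEADERS.includes(heading)) warnings.push(`element ${i}: generic header "${heading}"`);
    const bullets = el.props?.bulletPoints ?? [];
    if (bullets.length > 5) warnings.push(`element ${i}: ${bullets.length} bullets (max 5)`);
  });
  return warnings;
}
```

Running a check like this on the draft timeline before rendering catches style violations without waiting for a full render.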
Content Density
The canvas is large (1920x1080 or 1080x1920). Sparse content creates dead space. Add more content blocks — not bigger fonts or more padding. Each slide should use at least 70% of the canvas area.
| Slide Type | Landscape | Portrait |
|---|---|---|
| Content | 3–4 bullet points | 4–5 bullet points |
| Code | Minimum 4 lines + comments | Minimum 6 lines + comments |
| Title | Always include a subtitle | Always include a subtitle |
Shorts (Portrait-1080p)
- Max 8–10 elements for 60 seconds
- First slide = hook (max 5 words) + descriptive subtitle
- Last slide = CTA or memorable closing + tagline
- 5–6 bullets per content slide, 6–10 lines on code slides
- Use content + bulletPoints together on every content slide
Resolution Presets
Default: Landscape-1080p. The agent auto-selects based on keywords — "reels", "shorts", or "TikTok" triggers Portrait-1080p; "Instagram" triggers Square-1080p.
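The keyword-based selection described above could look roughly like this. It is a sketch under assumptions: `selectPreset` is a hypothetical name, matching is a case-insensitive substring test, and the portrait keywords are checked first, so "Instagram Reels" resolves to Portrait-1080p. The agent's real precedence is not documented here.

```javascript
// Hypothetical keyword matcher; the agent's actual selection logic may differ.
function selectPreset(request) {
  const text = request.toLowerCase();
  if (["reels", "shorts", "tiktok"].some((keyword) => text.includes(keyword))) {
    return "Portrait-1080p";
  }
  if (text.includes("instagram")) return "Square-1080p";
  return "Landscape-1080p"; // documented default
}

console.log(selectPreset("Make a TikTok about our release")); // Portrait-1080p
```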
Component Library
The Remotion project ships 11 slide components plus supporting features. Each maps to a type value in the timeline JSON.
Slide Components
| Component | Props | Use For |
|---|---|---|
| TitleSlide | title, subtitle, animation? | Opening / closing slides |
| ContentSlide | header, content, bulletPoints, animation? | Body content |
| CodeSlide | code, language, title, animation? | Code demos |
| DiagramSlide | src, title, caption, fit, animation? | Images / diagrams |
| StatSlide | title, stats[] (value/suffix/label/trend/decimals) | KPI / count-up |
| QuoteSlide | quote, author?, source? | Quotes |
| ComparisonSlide | title, left{label,items,accent}, right{...} | Side-by-side compare |
| VideoSlide | src, title?, startFrom?, playbackRate?, loop? | Inline video |
| GifSlide | src, title?, fit? | Animated GIF |
| LottieSlide | src, title? | Lottie animation |
| ChartSlide | chartType, title, data{labels,datasets} | Bar / pie / line chart |
| Caption | text, position, designTheme? | Timed subtitles |
Features
| Feature | Details |
|---|---|
| Surface Card | Glassmorphism wrapper on content slides. Customize via meta.theme.card. |
| Animation | Optional per element: { enter: "scale-in", exit: "fade-out" }. Enter: scale-in, fade-in, slide-up, none. Exit: scale-out, slide-down, fade-out, none. |
| Transitions | fade, slide, wipe, flip, clock-wipe. Optional timing: "spring". Slide/wipe accept direction. |
| Korean fonts | NotoSansKR loaded as body primary. Stack: NotoSansKR, Outfit, sans-serif. |
| Charts | bar (staggered grow), pie (sweep), line (draw-on). Pure SVG, no external library. |
| Audio | fadeInSec / fadeOutSec, loop, trimStartSec. |
TTS Integration
Three TTS providers are available. The default is Gemini. The pipeline auto-detects narration fields in timeline elements and generates per-cut audio files.
| Provider | ID | Default Voice | Strengths | Env Key |
|---|---|---|---|---|
| Gemini | gemini | Kore | 30 voices, tone via prompt | GEMINI_API_KEY |
| Supertone | supertone | Andrew | 6 emotion styles, best Korean quality | SUPERTONE_API_KEY |
| Supertonic | supertonic | M1 | 0.22s generation, free, offline | none |
Draft-to-Final Workflow
- Write timeline.draft.json with narration and optional voiceControl per element.
- Pipeline auto-detects narration fields and generates per-cut audio.
- Produces timeline.final.json with audio[] entries and corrected durationSec.
- Final timeline is rendered with synchronized audio.
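The draft-to-final step above can be pictured as a pure transformation from a draft timeline plus measured audio lengths to a final timeline. Everything specific in this sketch is an assumption made for illustration: the function name `finalizeTimeline`, the audio entry shape (`src`, `startSec`, `durationSec`), the `audio/<id>.m4a` path, and the choice to set each narrated element's durationSec to exactly the measured audio length.

```javascript
// Hypothetical sketch of the draft -> final step; field shapes are assumptions.
// measuredSec maps element id -> audio length in seconds (e.g. from ffprobe).
function finalizeTimeline(draft, measuredSec) {
  let cursor = 0; // running start time of the next element
  const audio = [];
  const elements = draft.elements.map((el) => {
    const measured = el.narration ? measuredSec[el.id] : undefined;
    const durationSec = measured ?? el.durationSec; // narrated cuts follow the audio
    if (measured !== undefined) {
      audio.push({ src: `audio/${el.id}.m4a`, startSec: cursor, durationSec: measured });
    }
    const finalized = { ...el, startSec: cursor, durationSec };
    cursor += durationSec;
    return finalized;
  });
  return { ...draft, meta: { ...draft.meta, totalDurationSec: cursor }, elements, audio };
}
```

Recomputing startSec from a running cursor keeps later elements aligned when an earlier narration turns out shorter or longer than the draft estimate.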
Draft Timeline with Narration
{
"elements": [
{
"id": "intro",
"type": "title",
"durationSec": 11,
"narration": "Welcome to the analysis report.",
"voiceControl": {
"tonePrompt": "Calm, professional news anchor tone"
},
"props": {
"title": "Tech Trends",
"subtitle": "2026 Edition"
}
}
]
}
Duration Estimation
- Korean: ~6.5 chars/sec, estimated as Math.ceil(narration.length / 6.5) + 0.5
- Final durationSec is auto-corrected by ffprobe measurement after TTS generation
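As a worked example of the estimation formula above, the snippet below wraps it in a function. The name `estimateDurationSec` is hypothetical. Note that `narration.length` counts UTF-16 code units, which equals the character count for Hangul syllables, and that spaces and punctuation are counted too.

```javascript
// Draft-time estimate for Korean narration, using the formula from the text.
const KOREAN_CHARS_PER_SEC = 6.5;

function estimateDurationSec(narration) {
  return Math.ceil(narration.length / KOREAN_CHARS_PER_SEC) + 0.5;
}

console.log(estimateDurationSec("가".repeat(13))); // 13 / 6.5 = 2, plus 0.5 -> 2.5
console.log(estimateDurationSec("가".repeat(20))); // ceil(3.08) = 4, plus 0.5 -> 4.5
```

The estimate is only a draft-time placeholder, since the final durationSec comes from the ffprobe measurement.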
Render Validation — 3-Tier Gate
All three gates must pass for a render to be considered successful.
| Gate | Checks | Role |
|---|---|---|
| 1 | Policy — No forbidden engine in logs | Supplementary |
| 2 | Execution — remotion render exit code 0 | Primary |
| 3 | Artifact — ffprobe validates duration, codec, and resolution | Final truth |
node scripts/validate-artifact.mjs /tmp/remotion-render/TimelineVideo.mp4 --preset Landscape-1080p
Even when remotion render exits 0, the artifact must pass ffprobe validation (correct resolution, codec, and duration) before the render is accepted.
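To make Gate 3 concrete, here is a minimal sketch of the comparison it performs, written against the JSON that `ffprobe -v error -print_format json -show_streams -show_format <file>` prints. This is not the code of validate-artifact.mjs: `checkArtifact`, the `expected` object, the h264 codec expectation, and the 0.5 s duration tolerance are all assumptions.

```javascript
// Hypothetical Gate 3 check over parsed ffprobe JSON; thresholds are assumptions.
function checkArtifact(probe, expected, toleranceSec = 0.5) {
  const video = (probe.streams ?? []).find((s) => s.codec_type === "video");
  if (!video) return ["no video stream found"];

  const failures = [];
  if (video.width !== expected.width || video.height !== expected.height) {
    failures.push(`resolution ${video.width}x${video.height}, expected ${expected.width}x${expected.height}`);
  }
  if (video.codec_name !== expected.codec) {
    failures.push(`codec ${video.codec_name}, expected ${expected.codec}`);
  }
  const duration = Number(probe.format?.duration); // ffprobe reports duration as a string
  if (!Number.isFinite(duration) || Math.abs(duration - expected.durationSec) > toleranceSec) {
    failures.push(`duration ${probe.format?.duration}, expected about ${expected.durationSec}s`);
  }
  return failures; // an empty array means the artifact passes
}
```

For a Landscape-1080p render of a 15-second timeline, `expected` would be `{ width: 1920, height: 1080, codec: "h264", durationSec: 15 }` under these assumptions.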
FFmpeg Patterns
FFmpeg handles deterministic cuts, batch processing, and preprocessing outside of Remotion. These patterns are available for use within the video pipeline.
Extract segment by timestamp
ffmpeg -i raw.mp4 -ss 00:12:30 -to 00:15:45 -c copy segment_01.mp4
Batch cut from edit decision list
#!/bin/bash
# cuts.txt format: start,end,label
mkdir -p segments
while IFS=, read -r start end label; do
  # -nostdin keeps ffmpeg from consuming the remaining lines of cuts.txt
  ffmpeg -nostdin -i raw.mp4 -ss "$start" -to "$end" -c copy "segments/${label}.mp4"
done < cuts.txt
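If the edit decision list needs to be driven from a Node script rather than a shell loop, the same cuts can be turned into ffmpeg argument arrays first. This is a sketch only: `cutsToFfmpegArgs` is a hypothetical helper, and skipping blank lines and `#` comment lines is an added convenience, not part of the cuts.txt format described above.

```javascript
// Hypothetical helper: turn "start,end,label" lines into ffmpeg argument arrays.
function cutsToFfmpegArgs(cutsText, input = "raw.mp4") {
  return cutsText
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line && !line.startsWith("#")) // skip blanks and comments (assumption)
    .map((line) => {
      const [start, end, label] = line.split(",").map((part) => part.trim());
      return ["-nostdin", "-i", input, "-ss", start, "-to", end, "-c", "copy", `segments/${label}.mp4`];
    });
}

console.log(cutsToFfmpegArgs("00:12:30,00:15:45,intro"));
```

Each returned array can be passed as the argument list to `child_process.spawn("ffmpeg", args)`, which avoids shell quoting issues with labels.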
Concatenate segments
for f in segments/*.mp4; do echo "file '$f'"; done > concat.txt
ffmpeg -f concat -safe 0 -i concat.txt -c copy assembled.mp4
Create proxy for faster editing
ffmpeg -i raw.mp4 -vf "scale=960:-2" -c:v libx264 -preset ultrafast -crf 28 proxy.mp4
Extract audio for transcription
ffmpeg -i raw.mp4 -vn -acodec pcm_s16le -ar 16000 audio.wav
Normalize audio levels
ffmpeg -i segment.mp4 -af loudnorm=I=-16:TP=-1.5:LRA=11 -c:v copy normalized.mp4
Scene detection
# Detect scene changes (threshold 0.3 = moderate sensitivity)
ffmpeg -i input.mp4 -vf "select='gt(scene,0.3)',showinfo" -vsync vfr -f null - 2>&1 | grep showinfo
Social media reframing
# 16:9 to 9:16 (center crop for TikTok/Reels)
ffmpeg -i input.mp4 -vf "crop=ih*9/16:ih,scale=1080:1920" vertical.mp4
# 16:9 to 1:1 (center crop for Instagram)
ffmpeg -i input.mp4 -vf "crop=ih:ih,scale=1080:1080" square.mp4
ElevenLabs Voice (Supplementary)
For professional voiceover outside the Remotion TTS pipeline, ElevenLabs is available via direct API. Use when you need high-fidelity English narration, voice cloning, or emotion control beyond what Gemini/Supertone provide. Requires ELEVENLABS_API_KEY.
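import os

import requests

VOICE_ID = "<voice_id>"  # replace with a voice ID from your ElevenLabs account

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "text": "Your narration text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
    timeout=60,
)
resp.raise_for_status()  # stop here rather than saving an error body as audio

# The response body is the encoded audio
with open("voiceover.mp3", "wb") as f:
    f.write(resp.content)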
Final Polish (Descript / CapCut)
For tasks that code-driven rendering handles poorly, use a traditional editor as the last mile:
- Pacing — adjust cuts that feel too fast or slow
- Captions — auto-generated, then manually cleaned
- Color grading — basic correction and mood
- Final audio mix — balance voice, music, and SFX levels
- Export — platform-specific formats and quality settings
Dependencies
The video pipeline requires Node.js, pnpm, FFmpeg, and a Chromium binary (auto-installed by Remotion).
node --version
npm install -g pnpm
brew install ffmpeg
cd remotion-project && pnpm exec remotion browser ensure
The ensure-remotion.mjs bootstrap script installs packages and ensures Chromium is available.
Environment Variables
| Variable | Purpose | Required |
|---|---|---|
| GEMINI_API_KEY | Gemini TTS provider | Yes (if using Gemini TTS, which is default) |
| SUPERTONE_API_KEY | Supertone TTS provider | Only if using Supertone |
| ELEVENLABS_API_KEY | ElevenLabs supplementary TTS | Only if using ElevenLabs |
| GOOGLE_APPLICATION_CREDENTIALS | Vertex AI authentication | Alternative to gcloud auth |
Bootstrap
# First-time setup: installs packages and ensures Chromium is available
node scripts/ensure-remotion.mjs
# Verify setup
cd remotion-project && pnpm exec remotion studio
Project Structure
skills_ref/video/
+-- SKILL.md
+-- reference/                      # visual-quality.md, tts-integration.md, components.md
+-- scripts/
|   +-- pipeline.mjs                # CLI entrypoint (sync/async/TTS)
|   +-- tts.mjs                     # TTS orchestrator (multi-provider)
|   +-- tts-providers/              # gemini.mjs, supertone.mjs, supertonic.mjs
|   +-- ensure-remotion.mjs         # Runtime bootstrap
|   +-- validate-artifact.mjs       # Gate 3: ffprobe validation
|   +-- presets.mjs                 # Resolution presets (ESM)
+-- remotion-project/
    +-- public/example-timeline.json
    +-- src/
        +-- components/             # 11 slide components + barrel
        +-- timeline/               # JSON-to-React engine
"Do It for Me" Usage Examples
Real-world examples of how to invoke video tasks in natural language. Korean and English both work.
- Renders the timeline and writes the result to /tmp/remotion-render/TimelineVideo.mp4.
- Adds narration and voiceControl fields to each element in the draft timeline. Runs the TTS pipeline with the default Gemini provider (voice: Kore), generates per-cut audio, and produces timeline.final.json with synced durations.
- Reframes the video for vertical platforms with ffmpeg -i input.mp4 -vf "crop=ih*9/16:ih,scale=1080:1920" vertical.mp4.
- Runs validate-artifact.mjs to verify duration, codec, and resolution via ffprobe against the expected preset.
Troubleshooting
Remotion Chromium not found
If remotion render fails with a browser error, ensure Chromium is installed:
node scripts/ensure-remotion.mjs
# Or manually:
cd remotion-project && pnpm exec remotion browser ensure
FFmpeg not found
Install FFmpeg and ffprobe:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt-get install -y ffmpeg
# Verify
ffmpeg -version && ffprobe -version
TTS fails with auth error
Check that the appropriate environment variable is set for your chosen provider:
# Gemini (default)
echo $GEMINI_API_KEY
# Supertone
echo $SUPERTONE_API_KEY
# Supertonic requires no key (runs offline)
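The same check can be done programmatically before starting a TTS batch. The mapping below follows the provider table in the TTS Integration section. The helper name `missingTtsKey` is hypothetical and is not part of tts.mjs.

```javascript
// Provider -> required environment variable (null means no key is needed).
const PROVIDER_ENV_KEY = {
  gemini: "GEMINI_API_KEY",
  supertone: "SUPERTONE_API_KEY",
  supertonic: null, // runs offline
};

// Returns the name of the missing variable, or null when nothing is missing.
function missingTtsKey(provider = "gemini", env = {}) {
  if (!Object.prototype.hasOwnProperty.call(PROVIDER_ENV_KEY, provider)) {
    throw new Error(`unknown TTS provider: ${provider}`);
  }
  const key = PROVIDER_ENV_KEY[provider];
  return key && !env[key] ? key : null;
}

console.log(missingTtsKey("supertonic", {})); // null
```

In a real script, pass `process.env` as the second argument. Unlike echoing the variable, this reports only the variable name, so the key itself never appears in logs.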
Render output is empty or corrupt
Run the artifact validator to diagnose:
node scripts/validate-artifact.mjs /tmp/remotion-render/TimelineVideo.mp4 --preset Landscape-1080p
If Gate 3 fails, check the Remotion logs for rendering errors. Common causes: missing fonts, invalid timeline JSON, or out-of-memory on large compositions.
Duration mismatch after TTS
The pipeline auto-corrects durationSec via ffprobe measurement after TTS generation. If durations still seem off, verify the narration text length matches the estimation formula: Math.ceil(text.length / 6.5) + 0.5 for Korean text.