Video Producer

Create complete videos with voiceover, music, and visuals.

This is an orchestrator skill that combines:

Script/storyboard generation (Claude)
Voiceover synthesis (Gemini TTS)
Background music (Lyria)
Video clip generation (Veo 3.1) or image animation
Final assembly (FFmpeg via media-utils)

Workflow

Step 1: Gather Requirements (REQUIRED)

⚠️ DO NOT skip this step. DO NOT run init_project.py until you have ALL answers.

Use interactive questioning — ask ONE question at a time, wait for the response, then ask the next. This creates a collaborative spec-driven process.

Question Flow

⚠️ Use the AskUserQuestion tool for each question below. Do not just print questions in your response — use the tool to create interactive prompts with the options shown.

Q1: Subject

"I'll create that video! First — what's it about?

(e.g., product launch, brand story, tutorial, explainer — or describe your own)"

Wait for response.

Q2: Duration

"How long should the video be?

15 seconds (quick hook)

30 seconds (standard ad)

60 seconds (explainer)

2+ minutes (detailed)

Or specify your own duration"

Wait for response.

Q3: Style

"What visual style?

Premium/luxury

Fun/playful

Corporate/professional

Dramatic/cinematic

Minimal/clean

Or describe your own style"

Wait for response.

Q4: Assets

"Do you have existing images or video clips to use?

No, generate everything

Yes, I have images (provide paths)

Yes, I have video clips (provide paths)"

Wait for response.

Q5: Audio Strategy

"How should we handle audio?

Custom — I generate voiceover + background music

Veo native — Use Veo's built-in dialogue/SFX/ambient

Silent — No audio, add later"

Wait for response.

Q6: Voice (if custom audio)

"What voice tone for the voiceover?

Professional

Friendly/warm

Energetic

Calm/soothing

Dramatic

Or describe your own tone"

Wait for response.

Q7: Music (if custom audio)

"What music vibe?

Modern electronic

Cinematic/epic

Upbeat pop

Ambient/chill

Corporate

Or describe your own style"

Wait for response.

Q8: Format

"What aspect ratio?

16:9 (YouTube, web)

9:16 (TikTok, Reels, Shorts)

1:1 (Instagram feed)"

Wait for response.

Q9: Resolution

"What resolution?

720p (faster generation)

1080p (standard HD)"

Wait for response.

Q10: Model

"Which Veo model?

veo-3.1 — Latest, highest quality (default)

veo-3.1-fast — Faster generation, slightly lower quality

veo-3 — Previous generation

veo-3-fast — Previous gen, faster"

Wait for response.

Quick Reference

Question	Determines
Subject	Scene content and prompts
Duration	Scene count (Veo clips must be 4, 6, or 8 seconds)
Style	Visual prompts and music selection
Assets	Generate vs use existing
Audio	custom, veo_audio, or silent
Voice	TTS voice selection
Music	Lyria prompt
Format	Aspect ratio for Veo
Resolution	720p or 1080p output quality
Model	veo-3.1, veo-3.1-fast, veo-3, veo-3-fast

Step 2: Initialize Project

Once you have the user's answers, initialize the project with their preferences:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-producer-agent/scripts/init_project.py \
  --name "Product Launch Video" \
  --duration 30 \
  --aspect-ratio 16:9 \
  --audio-strategy custom \
  --scenes 5

Step 3: Configure project.json

Edit project.json with scene prompts, voiceover text, and music style based on user's answers.

Step 4: Assemble the Video

python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-producer-agent/scripts/assemble.py \
  --project ~/my_video_project/

Project Structure

When you initialize a project, this folder structure is created:

my_project/
├── project.json          # Configuration: scenes, voiceover, music, settings
├── storyboard.md         # Planning document for the video
├── scenes/               # Generated video clips from Veo
│   ├── scene1_intro.mp4
│   ├── scene2_features.mp4
│   └── scene3_cta.mp4
├── audio/                # Audio assets
│   ├── voiceover.wav     # Generated voiceover
│   ├── background_music.wav  # Generated music
│   └── final_mix.mp3     # Mixed audio track
├── work/                 # Intermediate files (auto-generated)
│   ├── silent_scene1.mp4
│   ├── video_concatenated.mp4
│   └── ...
└── output/               # Final deliverables
    └── product_launch_video_final.mp4

Scripts

init_project.py

Initialize a new video project with folder structure and templates.

# Basic project
python3 init_project.py --name "My Video" --duration 30

# Create in specific directory
python3 init_project.py --name "Demo Video" --output ~/Videos/

# Vertical video for social
python3 init_project.py --name "Instagram Reel" --aspect-ratio 9:16 --duration 15

# Use Veo's native audio (no custom voiceover/music)
python3 init_project.py --name "Cinematic Scene" --audio-strategy veo_audio

# More scenes
python3 init_project.py --name "Long Video" --duration 60 --scenes 5

Options:

Option	Default	Description
`--name`	required	Project name
`--output`	current dir	Parent directory
`--duration`	30	Target duration in seconds
`--aspect-ratio`	16:9	16:9, 9:16, 1:1, 4:3
`--audio-strategy`	custom	custom, veo_audio, silent
`--scenes`	3	Number of scene placeholders

assemble.py

Orchestrate the full video assembly pipeline.

# Full pipeline (generate everything + assemble)
python3 assemble.py --project ~/my_project/

# Skip generation (use existing scene/audio files)
python3 assemble.py --project ~/my_project/ --skip-generation

# Dry run (show what would be done)
python3 assemble.py --project ~/my_project/ --dry-run

Pipeline steps:

Generate video scenes (Veo 3.1)
Strip audio from scenes (if custom audio)
Generate voiceover (Gemini TTS)
Generate background music (Lyria)
Mix voiceover + music
Concatenate video clips
Merge audio with video
Output final video

project.json Configuration

{
  "name": "Product Launch Video",
  "duration_target": 30,
  "aspect_ratio": "16:9",
  "resolution": "720p",
  "audio_strategy": "custom",
  
  "scenes": [
    {
      "id": 1,
      "name": "scene1_hero",
      "prompt": "Cinematic slow zoom on premium product, dramatic lighting, high-end commercial style",
      "duration": 6,
      "notes": "Music only, no voiceover"
    },
    {
      "id": 2,
      "name": "scene2_features",
      "prompt": "Product features demonstration, sleek animations, modern tech aesthetic",
      "duration": 8,
      "notes": "Voiceover starts here"
    },
    {
      "id": 3,
      "name": "scene3_cta",
      "prompt": "Product with logo on clean background, call to action moment",
      "duration": 6,
      "notes": "Music swells, voiceover ends"
    }
  ],
  
  "voiceover": {
    "enabled": true,
    "text": "Introducing the future of audio. Crystal clear sound. All-day comfort. Experience the difference.",
    "voice": "Charon",
    "style": "Professional, confident, premium brand voice"
  },
  
  "music": {
    "enabled": true,
    "prompt": "modern electronic, premium, sleek, product showcase, subtle bass",
    "duration": 35,
    "bpm": 100,
    "brightness": 0.6
  },
  
  "assembly": {
    "transition": "fade",
    "transition_duration": 0.5,
    "music_volume": 0.3,
    "fade_in": 1.0,
    "fade_out": 2.0
  }
}

Audio Strategies

Strategy	Description	Use When
`custom`	Strip Veo audio, add custom voiceover + music	Most videos
`veo_audio`	Keep Veo's generated audio (dialogue, SFX)	Cinematic scenes, dialogues
`silent`	Strip audio, output silent video	Adding audio later

Workflow: Creating a Video

Step 1: Initialize Project

python3 init_project.py --name "Wireless Earbuds Promo" --duration 30

Step 2: Plan the Storyboard

Edit storyboard.md to plan your video structure:

## Scene 1: Hero Reveal (0-5s)
- Visual: Earbuds emerging from shadow, premium lighting
- Audio: Music only (dramatic intro)

## Scene 2: Sound Quality (5-12s)
- Visual: Sound waves, person enjoying music
- Audio: Voiceover: "Crystal clear sound. Immersive bass."

## Scene 3: Comfort (12-20s)
- Visual: Close-up of fit, person running
- Audio: Voiceover: "All-day comfort. Secure fit."

## Scene 4: CTA (20-30s)
- Visual: Product + logo
- Audio: Voiceover: "Experience the difference." + music swell

Step 3: Configure project.json

Fill in the scene prompts, voiceover text, and music style based on your storyboard.

Step 4: Assemble

python3 assemble.py --project ~/wireless_earbuds_promo/

Step 5: Review and Iterate

Check the output in output/. If adjustments needed:

# Re-run with existing scenes (just re-mix audio)
python3 assemble.py --project ~/wireless_earbuds_promo/ --skip-generation

What You Can Create

Type	Example
Product video	30s hero video showcasing a product
Explainer video	How-to or feature explanation
Promo/ad video	Marketing advertisement
Demo video	Product demonstration
Training video	Internal training content
Testimonial	Customer quote style video
Brand video	Company/brand story

Prerequisites

GOOGLE_API_KEY - For Veo (video), Gemini TTS (voice), Lyria (music)
FFmpeg installed: brew install ffmpeg

Video Styles & Music Pairings

Style	Music Prompt	Voice
Premium/Luxury	"elegant, minimal, ambient, sophisticated"	Charon (informative)
Tech/Modern	"electronic, futuristic, clean, innovative"	Kore (firm)
Fun/Playful	"upbeat, cheerful, acoustic, positive"	Puck (upbeat)
Corporate	"professional, inspiring, orchestral lite"	Orus (firm)
Lifestyle	"chill, aspirational, indie, warm"	Aoede (breezy)
Dramatic/Cinematic	"epic, orchestral, emotional, building"	Gacrux (mature)

Common Video Structures

Product Video (30s)

0-5s:   Hero shot (music only)
5-15s:  Features (voiceover + music)
15-25s: Lifestyle/use case (voiceover + music)
25-30s: Logo + CTA (music fade)

Explainer Video (60s)

0-5s:   Hook/problem statement
5-20s:  Solution introduction
20-45s: How it works (3 steps)
45-55s: Benefits summary
55-60s: CTA

Testimonial Video (45s)

0-5s:   Intro/name card
5-35s:  Testimonial quote (multiple scenes)
35-45s: Product shot + logo

Manual Workflow (Without Scripts)

If you prefer to run each step manually:

Generate scenes:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --batch scenes.json

Generate voiceover:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Your voiceover text..." \
  --voice Charon \
  --style "Professional, warm"

Generate music:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "modern electronic, premium" \
  --duration 35

Strip audio from clips:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/video_strip_audio.py \
  -i scene*.mp4

Concatenate videos:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/video_concat.py \
  -i silent_*.mp4 --transition fade -o video.mp4

Mix audio:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice voiceover.wav --music music.wav -o audio.mp3

Merge audio + video:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/video_audio_merge.py \
  --video video.mp4 --audio audio.mp3 -o final.mp4

Input Files You Can Provide

File Type	How It's Used
Product images	Animate with Veo as first frame
Logo (PNG)	Overlay on final scene
Existing voiceover	Place in `audio/voiceover.wav`
Brand music	Place in `audio/background_music.wav`
Video clips	Place in `scenes/` folder
Script/copy	Use for voiceover text

Limitations

Veo video duration: Max 8 seconds per clip (concatenate for longer)
Veo 3.1 includes audio: All clips have generated audio (strip if using custom)
Processing time: Video generation takes 1-3 minutes per clip
Resolution: Currently 720p or 1080p (1080p for 8s only)

Error Handling

Error	Solution
"GOOGLE_API_KEY not set"	Set up API key per README
"FFmpeg not found"	Install: `brew install ffmpeg`
"project.json not found"	Run init_project.py first
Video generation timeout	Retry, or use shorter duration
Audio/video sync issues	Adjust scene durations

Example Prompts

Simple:

"Create a 30-second product video for my new coffee maker"

Detailed:

"Create a 45-second product video for our new wireless earbuds. Premium, luxury feel. I have product photos attached. Professional male voiceover. Modern electronic music. 16:9 for YouTube."

With project:

"Initialize a video project called 'App Demo' with 5 scenes, 60 seconds total, vertical format for TikTok"

video-producer-agent

Safety Notice

Copy this and send it to your AI assistant to learn

Video Producer

Workflow

Step 1: Gather Requirements (REQUIRED)

Question Flow

Quick Reference

Step 2: Initialize Project

Step 3: Configure project.json

Step 4: Assemble the Video

Project Structure

Scripts

init_project.py

assemble.py

project.json Configuration

Audio Strategies

Workflow: Creating a Video

Step 1: Initialize Project

Step 2: Plan the Storyboard

Step 3: Configure project.json

Step 4: Assemble

Step 5: Review and Iterate

What You Can Create

Prerequisites

Video Styles & Music Pairings

Common Video Structures

Product Video (30s)

Explainer Video (60s)

Testimonial Video (45s)

Manual Workflow (Without Scripts)

Input Files You Can Provide

Limitations

Error Handling

Example Prompts

Source Transparency

Related Skills

add-to-xcode

patent-lawyer-agent

audio-producer-agent

review-analyst-agent