Video Editing Toolkit

Goal

Complete video production pipeline: silence removal, auto-captions, vertical cropping, YouTube clipping, compression, and 3D transitions.

Scripts (7 total)

Script	Purpose
`jump_cut_vad_singlepass.py`	Remove silences with neural VAD (Silero)
`auto_captions.py`	Generate + burn styled subtitles (Whisper + FFmpeg)
`vertical_crop.py`	Auto-crop 16:9 → 9:16 with face tracking
`youtube_clip.py`	Download YouTube + AI chapter clipping
`compress_video.py`	Social media compression with platform presets
`simple_video_edit.py`	Full pipeline: silence removal + transcription + metadata + upload
`insert_3d_transition.py`	3D swivel teaser insertion

Quick Start Recipes

Recipe 1: Full Social Media Pipeline

# 1. Remove silences
python3 ./scripts/jump_cut_vad_singlepass.py input.mp4 .tmp/edited.mp4

# 2. Add captions
python3 ./scripts/auto_captions.py .tmp/edited.mp4 .tmp/captioned.mp4 --word-level --max-words 2

# 3. Crop to vertical for Shorts/Reels
python3 ./scripts/vertical_crop.py .tmp/captioned.mp4 .tmp/vertical.mp4

# 4. Compress for platform
python3 ./scripts/compress_video.py .tmp/vertical.mp4 output.mp4 --preset instagram-reel

Recipe 2: YouTube → Shorts Pipeline

# 1. Download + auto-clip YouTube video
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --auto-clip --output-dir clips/

# 2. Add captions to best clip
python3 ./scripts/auto_captions.py clips/chapters/01_intro.mp4 .tmp/captioned.mp4 --word-level

# 3. Crop to vertical
python3 ./scripts/vertical_crop.py .tmp/captioned.mp4 .tmp/vertical.mp4

# 4. Compress for YouTube Shorts
python3 ./scripts/compress_video.py .tmp/vertical.mp4 short.mp4 --preset youtube-shorts

Recipe 3: Quick Edit + Upload

# All-in-one: silence removal + transcription + metadata + Auphonic upload
python3 ./scripts/simple_video_edit.py --video input.mp4 --title "My Video"

Script 1: VAD Silence Removal

File: ./scripts/jump_cut_vad_singlepass.py

How It Works

Extracts audio as WAV (16kHz mono)
Runs Silero VAD to detect speech segments
Merges close segments, adds padding
Uses FFmpeg trim+concat to join segments in single pass
Hardware encodes with hevc_videotoolbox (H.265, 17Mbps, 30fps)

CLI Arguments

Argument	Default	Description
`--min-silence`	0.5	Min silence duration to cut (seconds)
`--min-speech`	0.25	Min speech duration to keep (seconds)
`--padding`	100	Padding around speech (ms)
`--merge-gap`	0.3	Merge segments closer than this (seconds)
`--keep-start`	true	Always start from 0:00

Usage

python3 ./scripts/jump_cut_vad_singlepass.py input.mp4 output.mp4
python3 ./scripts/jump_cut_vad_singlepass.py input.mp4 output.mp4 --min-silence 1.0 --padding 200

Processing Time

~8 minutes for a 49-min 4K video

Script 2: Auto-Captions

File: ./scripts/auto_captions.py

How It Works

Transcribes video using faster-whisper (word-level timestamps)
Generates SRT subtitles (segment-level or word-level)
Burns styled captions into video with FFmpeg

CLI Arguments

Argument	Default	Description
`--model`	base	Whisper model: tiny, base, small, medium, large-v3
`--language`	auto	Force language (en, es, hi, etc.)
`--word-level`	false	Short punchy captions (1-3 words at a time)
`--max-words`	3	Max words per caption in word-level mode
`--font-size`	22	Caption font size (auto-scales for vertical)
`--font-color`	white	Text color: white, yellow, cyan, green, red, orange, pink
`--outline-color`	black	Outline color
`--position`	bottom	Caption position: bottom, top, middle
`--srt-only`	false	Only generate SRT, don't burn
`--srt-path`	auto	Custom SRT output path
`--bold`	true	Bold text

Usage

# Basic captions
python3 ./scripts/auto_captions.py input.mp4 output.mp4

# Word-level (CapCut/Submagic style)
python3 ./scripts/auto_captions.py input.mp4 output.mp4 --word-level --max-words 2

# Yellow captions at top
python3 ./scripts/auto_captions.py input.mp4 output.mp4 --font-color yellow --position top

# SRT only (no burn)
python3 ./scripts/auto_captions.py input.mp4 --srt-only

# High accuracy
python3 ./scripts/auto_captions.py input.mp4 output.mp4 --model large-v3

Dependencies

pip install faster-whisper

Script 3: Vertical Crop

File: ./scripts/vertical_crop.py

How It Works

Samples frames throughout the video
Detects faces using OpenCV Haar cascade
Smooths face positions to avoid jitter
Crops 16:9 → 9:16 centered on the speaker
For moving subjects: segments video into 2s chunks with per-segment tracking

CLI Arguments

Argument	Default	Description
`--ratio`	9:16	Target aspect ratio (e.g., 9:16, 4:5, 1:1)
`--position`	auto	Crop position: auto (face tracking), left, center, right
`--smoothing`	30	Smoothing window in frames
`--sample-interval`	5	Sample every N frames for detection

Usage

# Auto face-tracking crop
python3 ./scripts/vertical_crop.py input.mp4 output.mp4

# Square crop (Instagram post)
python3 ./scripts/vertical_crop.py input.mp4 output.mp4 --ratio 1:1

# 4:5 crop (Instagram feed)
python3 ./scripts/vertical_crop.py input.mp4 output.mp4 --ratio 4:5

# Center crop (no tracking)
python3 ./scripts/vertical_crop.py input.mp4 output.mp4 --position center

Dependencies

pip install opencv-python numpy

Script 4: YouTube Clip

File: ./scripts/youtube_clip.py

How It Works

Downloads video via yt-dlp (up to 4K)
Extracts YouTube chapters if available
Falls back to AI chapter generation (Whisper + Claude)
Clips video into individual chapter files

CLI Arguments

Argument	Default	Description
`--download-only`	false	Only download, don't clip
`--max-quality`	1080	Max quality: 480, 720, 1080, 1440, 2160
`--audio-only`	false	Download audio only (MP3)
`--start`	none	Manual clip start (seconds)
`--end`	none	Manual clip end (seconds)
`--auto-clip`	false	AI-powered chapter detection and clipping
`--use-yt-chapters`	true	Use YouTube chapters if available
`--max-clips`	none	Limit number of clips
`--whisper-model`	base	Whisper model for transcription
`--reencode`	false	Re-encode clips (precise cuts)
`--output-dir`	.tmp/clips	Output directory

Usage

# Download only
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --download-only

# Auto-clip into chapters
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --auto-clip

# Extract specific range
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --start 60 --end 180 -o clip.mp4

# Download audio only
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --audio-only

# Top 3 clips only
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --auto-clip --max-clips 3

Dependencies

pip install yt-dlp faster-whisper anthropic

Script 5: Video Compressor

File: ./scripts/compress_video.py

Platform Presets

Preset	Resolution	CRF	Max Duration	Max Size	Notes
`youtube`	1080p	18	none	none	High quality
`youtube-shorts`	1080x1920	20	60s	none	Vertical
`instagram-reel`	1080x1920	23	90s	none	Vertical
`instagram-post`	1080x1080	23	60s	none	Square/vertical
`tiktok`	1080x1920	23	180s	none	Vertical
`twitter`	1080p	23	140s	512MB	Auto-bitrate
`linkedin`	1080p	23	600s	200MB	Auto-bitrate
`telegram`	720p	28	none	50MB	For bots
`whatsapp`	720p	28	120s	16MB	Aggressive
`small`	480p	30	none	none	Quick sharing

CLI Arguments

Argument	Default	Description
`--preset`	none	Platform preset (see table above)
`--resolution`	1080	Max height in pixels
`--crf`	23	Quality (0=lossless, 51=worst)
`--audio-bitrate`	128k	Audio bitrate
`--target-size`	none	Target file size in MB
`--max-duration`	none	Max duration in seconds
`--list-presets`	false	Show all presets

Usage

# Platform preset
python3 ./scripts/compress_video.py input.mp4 output.mp4 --preset instagram-reel
python3 ./scripts/compress_video.py input.mp4 output.mp4 --preset whatsapp

# Target file size
python3 ./scripts/compress_video.py input.mp4 output.mp4 --target-size 25

# Custom
python3 ./scripts/compress_video.py input.mp4 output.mp4 --resolution 720 --crf 28

# List all presets
python3 ./scripts/compress_video.py input.mp4 output.mp4 --list-presets

Script 6: Simple Video Edit (Full Pipeline)

File: ./scripts/simple_video_edit.py

How It Works

FFmpeg silence detection + cutting
Audio normalization (loudnorm)
Whisper transcription
Claude-generated YouTube metadata (summary + chapters)
Auphonic upload → YouTube (private draft)

CLI Arguments

Argument	Default	Description
`--video`	required	Input video path
`--title`	required	YouTube title
`--thumbnail`	none	Thumbnail image path
`--no-upload`	false	Skip Auphonic upload
`--no-normalize`	false	Skip audio normalization
`--upload-only`	false	Skip editing, just upload
`--silence-threshold`	-35	Silence threshold in dB
`--silence-duration`	3.0	Min silence duration (seconds)

Usage

# Full pipeline
python3 ./scripts/simple_video_edit.py --video input.mp4 --title "My Video"

# Local only
python3 ./scripts/simple_video_edit.py --video input.mp4 --title "Test" --no-upload

Dependencies

pip install anthropic faster-whisper requests python-dotenv
# Requires: ANTHROPIC_API_KEY, AUPHONIC_API_KEY in .env

Script 7: 3D Swivel Teaser

File: ./scripts/insert_3d_transition.py

How It Works

Extracts frames from later in video (default: 60s onwards)
Creates 3D rotating "swivel" animation via Remotion
Splits video: intro → transition → main content
Re-encodes and concatenates with audio preserved

CLI Arguments

Argument	Default	Description
`--insert-at`	3	Where to insert teaser (seconds)
`--duration`	5	Teaser duration (seconds)
`--teaser-start`	60	Where to sample content from (seconds)
`--bg-color`	#2d3436	Background color (hex)
`--bg-image`	none	Background image path

Final Timeline

[0-3s intro] [3-8s swivel teaser @ 100x] [8s onwards: edited content]
Audio: Original audio plays continuously

Dependencies

pip install torch  # For Silero VAD (jump_cut script)
brew install ffmpeg node  # macOS
cd video_effects && npm install  # For Remotion 3D rendering

All Dependencies (Install Once)

# Core (required for all scripts)
brew install ffmpeg

# Silence removal
pip install torch

# Auto-captions + YouTube clip
pip install faster-whisper

# Vertical crop
pip install opencv-python numpy

# YouTube download
pip install yt-dlp

# AI features (chapters, metadata)
pip install anthropic

# Full pipeline (simple_video_edit)
pip install requests python-dotenv

# 3D transitions
brew install node
cd scripts/video_effects && npm install

Troubleshooting

Issue	Solution
Cuts feel abrupt	`--padding 200` in jump_cut
Too much cut	`--min-silence 1.0` in jump_cut
Captions too small	`--font-size 32` in auto_captions
Vertical video detected	Font auto-scales 1.3x
Won't play in QuickTime	Ensure hvc1 codec tag
Face not detected	Try `--position center` in vertical_crop
YouTube download fails	Update yt-dlp: `pip install -U yt-dlp`
File too large	Use `--preset whatsapp` or `--target-size 25`
Hardware encoder fails	Auto-falls back to software (libx264)

Technical Details

macOS: Hardware encoding (hevc_videotoolbox / h264_videotoolbox)
Fallback: libx264/libx265 with CRF
Audio: AAC 128-192kbps
Uses hvc1 codec tag for QuickTime compatibility
All scripts support --help for full argument list

Schema

Inputs

Name	Type	Required	Description
`input_video`	file_path	Yes	Input video file path
`recipe`	string	No	Pipeline recipe: full-social, youtube-shorts, quick-edit
`preset`	string	No	Compression preset: youtube, instagram-reel, tiktok, whatsapp, etc.

Outputs

Name	Type	Description
`output_video`	file_path	Processed video file path

Credentials

Name	Source
`ANTHROPIC_API_KEY`	.env (for AI chapters)
`AUPHONIC_API_KEY`	.env (for upload)

Composable With

Skills that chain well with this one: pan-3d-transition, recreate-thumbnails

Cost

Free locally (FFmpeg + Silero VAD)

video-edit

Safety Notice

Copy this and send it to your AI assistant to learn

Video Editing Toolkit

Goal

Scripts (7 total)

Quick Start Recipes

Recipe 1: Full Social Media Pipeline

Recipe 2: YouTube → Shorts Pipeline

Recipe 3: Quick Edit + Upload

Script 1: VAD Silence Removal

How It Works

CLI Arguments

Usage

Processing Time

Script 2: Auto-Captions

How It Works

CLI Arguments

Usage

Dependencies

Script 3: Vertical Crop

How It Works

CLI Arguments

Usage

Dependencies

Script 4: YouTube Clip

How It Works

CLI Arguments

Usage

Dependencies

Script 5: Video Compressor

Platform Presets

CLI Arguments

Usage

Script 6: Simple Video Edit (Full Pipeline)

How It Works

CLI Arguments

Usage

Dependencies

Script 7: 3D Swivel Teaser

How It Works

CLI Arguments

Final Timeline

Dependencies

All Dependencies (Install Once)

Troubleshooting

Technical Details

Schema

Inputs

Outputs

Credentials

Composable With

Cost

Source Transparency

Related Skills

image-to-video

excalidraw-visuals

gmaps-leads

whisper-voice