video-edit

Complete video editing toolkit - silence removal, auto-captions, vertical crop, YouTube clipping, 3D transitions, and social media compression. Use when user asks to edit video, remove silences, add captions/subtitles, crop to vertical/shorts, download YouTube clips, compress video, or create video teasers.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "video-edit" with this command: npx skills add aiagentwithdhruv/skills/aiagentwithdhruv-skills-video-edit

Video Editing Toolkit

Goal

Complete video production pipeline: silence removal, auto-captions, vertical cropping, YouTube clipping, compression, and 3D transitions.

Scripts (7 total)

| Script | Purpose |
| --- | --- |
| jump_cut_vad_singlepass.py | Remove silences with neural VAD (Silero) |
| auto_captions.py | Generate + burn styled subtitles (Whisper + FFmpeg) |
| vertical_crop.py | Auto-crop 16:9 → 9:16 with face tracking |
| youtube_clip.py | Download YouTube + AI chapter clipping |
| compress_video.py | Social media compression with platform presets |
| simple_video_edit.py | Full pipeline: silence removal + transcription + metadata + upload |
| insert_3d_transition.py | 3D swivel teaser insertion |

Quick Start Recipes

Recipe 1: Full Social Media Pipeline

# 1. Remove silences
python3 ./scripts/jump_cut_vad_singlepass.py input.mp4 .tmp/edited.mp4

# 2. Add captions
python3 ./scripts/auto_captions.py .tmp/edited.mp4 .tmp/captioned.mp4 --word-level --max-words 2

# 3. Crop to vertical for Shorts/Reels
python3 ./scripts/vertical_crop.py .tmp/captioned.mp4 .tmp/vertical.mp4

# 4. Compress for platform
python3 ./scripts/compress_video.py .tmp/vertical.mp4 output.mp4 --preset instagram-reel
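
The four steps of Recipe 1 can also be chained from Python. The following is only a sketch: `full_social_commands` and `run_pipeline` are hypothetical helper names, not part of the toolkit, and the script paths and flags are copied from the commands above.

```python
import subprocess
from pathlib import PurePosixPath
from typing import List


def full_social_commands(
    input_video: str,
    output_video: str,
    preset: str = "instagram-reel",
    tmp_dir: str = ".tmp",
    scripts_dir: str = "./scripts",
) -> List[List[str]]:
    """Build the Recipe 1 commands; each step reads the previous step's output."""
    tmp = PurePosixPath(tmp_dir)
    edited = str(tmp / "edited.mp4")
    captioned = str(tmp / "captioned.mp4")
    vertical = str(tmp / "vertical.mp4")
    return [
        ["python3", f"{scripts_dir}/jump_cut_vad_singlepass.py", input_video, edited],
        ["python3", f"{scripts_dir}/auto_captions.py", edited, captioned,
         "--word-level", "--max-words", "2"],
        ["python3", f"{scripts_dir}/vertical_crop.py", captioned, vertical],
        ["python3", f"{scripts_dir}/compress_video.py", vertical, output_video,
         "--preset", preset],
    ]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run the steps in order and stop at the first one that fails."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_pipeline(full_social_commands("input.mp4", "output.mp4"))` would run the same sequence as the shell recipe, assuming the `.tmp` directory already exists.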

Recipe 2: YouTube → Shorts Pipeline

# 1. Download + auto-clip YouTube video
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --auto-clip --output-dir clips/

# 2. Add captions to best clip
python3 ./scripts/auto_captions.py clips/chapters/01_intro.mp4 .tmp/captioned.mp4 --word-level

# 3. Crop to vertical
python3 ./scripts/vertical_crop.py .tmp/captioned.mp4 .tmp/vertical.mp4

# 4. Compress for YouTube Shorts
python3 ./scripts/compress_video.py .tmp/vertical.mp4 short.mp4 --preset youtube-shorts

Recipe 3: Quick Edit + Upload

# All-in-one: silence removal + transcription + metadata + Auphonic upload
python3 ./scripts/simple_video_edit.py --video input.mp4 --title "My Video"

Script 1: VAD Silence Removal

File: ./scripts/jump_cut_vad_singlepass.py

How It Works

  1. Extracts audio as WAV (16kHz mono)
  2. Runs Silero VAD to detect speech segments
  3. Merges close segments, adds padding
  4. Uses FFmpeg trim+concat to join segments in single pass
  5. Hardware encodes with hevc_videotoolbox (H.265, 17Mbps, 30fps)
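
Step 3 (merge close segments, then pad) can be sketched as below. This is an illustration under assumptions, not the script's actual code: `merge_and_pad` is a hypothetical name, its defaults mirror the documented `--merge-gap`, `--padding`, and `--keep-start` values, and treating `--keep-start` as "extend the first segment back to 0:00" is a guess at its behavior.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def merge_and_pad(
    segments: List[Segment],
    merge_gap: float = 0.3,
    padding_ms: float = 100.0,
    total_duration: float = float("inf"),
    keep_start: bool = True,
) -> List[Segment]:
    """Merge speech segments closer than merge_gap, then pad each result."""
    if not segments:
        return []
    ordered = sorted(segments)
    merged = [ordered[0]]
    for start, end in ordered[1:]:
        prev_start, prev_end = merged[-1]
        if start - prev_end < merge_gap:
            # Gap is too short to be worth cutting: extend the previous segment.
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))

    pad = padding_ms / 1000.0
    padded: List[Segment] = []
    for start, end in merged:
        start = max(0.0, start - pad)
        end = min(total_duration, end + pad)
        # Padding can make neighbours overlap; fold them together.
        if padded and start <= padded[-1][1]:
            padded[-1] = (padded[-1][0], max(padded[-1][1], end))
        else:
            padded.append((start, end))

    if keep_start:
        # Assumed meaning of --keep-start: the output always begins at 0:00.
        padded[0] = (0.0, padded[0][1])
    return padded
```

The resulting list is the set of time ranges that a trim+concat filter graph would keep.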

CLI Arguments

| Argument | Default | Description |
| --- | --- | --- |
| --min-silence | 0.5 | Min silence duration to cut (seconds) |
| --min-speech | 0.25 | Min speech duration to keep (seconds) |
| --padding | 100 | Padding around speech (ms) |
| --merge-gap | 0.3 | Merge segments closer than this (seconds) |
| --keep-start | true | Always start from 0:00 |

Usage

python3 ./scripts/jump_cut_vad_singlepass.py input.mp4 output.mp4
python3 ./scripts/jump_cut_vad_singlepass.py input.mp4 output.mp4 --min-silence 1.0 --padding 200

Processing Time

About 8 minutes for a 49-minute 4K video.


Script 2: Auto-Captions

File: ./scripts/auto_captions.py

How It Works

  1. Transcribes video using faster-whisper (word-level timestamps)
  2. Generates SRT subtitles (segment-level or word-level)
  3. Burns styled captions into video with FFmpeg
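
Step 2 in word-level mode amounts to grouping timed words into short cues and writing them in SRT format. A minimal sketch follows, assuming the transcription has already been reduced to `(start, end, text)` tuples; `words_to_srt` and `srt_timestamp` are hypothetical names, and `max_words` mirrors the `--max-words` flag.

```python
from typing import List, Tuple

Word = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: List[Word], max_words: int = 3) -> str:
    """Group consecutive words into cues of at most max_words and emit SRT text."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i : i + max_words]
        start, end = chunk[0][0], chunk[-1][1]
        text = " ".join(word[2].strip() for word in chunk)
        index = len(cues) + 1
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # SRT cues are separated by a blank line.
    return "\n".join(cues)
```

For example, `words_to_srt([(0.0, 0.4, "Hello"), (0.4, 0.9, "world")], max_words=2)` yields a single cue spanning 00:00:00,000 to 00:00:00,900.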

CLI Arguments

| Argument | Default | Description |
| --- | --- | --- |
| --model | base | Whisper model: tiny, base, small, medium, large-v3 |
| --language | auto | Force language (en, es, hi, etc.) |
| --word-level | false | Short punchy captions (1-3 words at a time) |
| --max-words | 3 | Max words per caption in word-level mode |
| --font-size | 22 | Caption font size (auto-scales for vertical) |
| --font-color | white | Text color: white, yellow, cyan, green, red, orange, pink |
| --outline-color | black | Outline color |
| --position | bottom | Caption position: bottom, top, middle |
| --srt-only | false | Only generate SRT, don't burn |
| --srt-path | auto | Custom SRT output path |
| --bold | true | Bold text |

Usage

# Basic captions
python3 ./scripts/auto_captions.py input.mp4 output.mp4

# Word-level (CapCut/Submagic style)
python3 ./scripts/auto_captions.py input.mp4 output.mp4 --word-level --max-words 2

# Yellow captions at top
python3 ./scripts/auto_captions.py input.mp4 output.mp4 --font-color yellow --position top

# SRT only (no burn)
python3 ./scripts/auto_captions.py input.mp4 --srt-only

# High accuracy
python3 ./scripts/auto_captions.py input.mp4 output.mp4 --model large-v3

Dependencies

pip install faster-whisper

Script 3: Vertical Crop

File: ./scripts/vertical_crop.py

How It Works

  1. Samples frames throughout the video
  2. Detects faces using OpenCV Haar cascade
  3. Smooths face positions to avoid jitter
  4. Crops 16:9 → 9:16 centered on the speaker
  5. For moving subjects: segments video into 2s chunks with per-segment tracking
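
Steps 3 and 4 come down to smoothing the detected face positions and then clamping a full-height crop window around them. The sketch below shows that geometry only; `smooth`, `crop_window`, and `crop_filter` are hypothetical names, the moving average is one possible smoothing choice rather than the script's confirmed method, and the even-width rounding is an assumption made for encoder compatibility.

```python
from typing import List, Tuple


def smooth(values: List[float], window: int = 30) -> List[float]:
    """Centered moving average; the window shrinks at both ends of the list."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def crop_window(
    src_w: int, src_h: int, face_cx: float, ratio: Tuple[int, int] = (9, 16)
) -> Tuple[int, int, int, int]:
    """Return (width, height, x, y) of a full-height crop centered on face_cx."""
    ratio_w, ratio_h = ratio
    crop_w = int(src_h * ratio_w / ratio_h)
    crop_w -= crop_w % 2            # keep the width even (assumption)
    crop_w = min(crop_w, src_w)
    x = int(round(face_cx - crop_w / 2))
    x = max(0, min(x, src_w - crop_w))  # never crop outside the frame
    return crop_w, src_h, x, 0


def crop_filter(window: Tuple[int, int, int, int]) -> str:
    """Render the window as an FFmpeg crop filter: crop=w:h:x:y."""
    w, h, x, y = window
    return f"crop={w}:{h}:{x}:{y}"
```

For a 1920x1080 source with the face at x=960, this gives a 606x1080 window starting at x=657; a face near either edge is clamped so the window stays inside the frame.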

CLI Arguments

| Argument | Default | Description |
| --- | --- | --- |
| --ratio | 9:16 | Target aspect ratio (e.g., 9:16, 4:5, 1:1) |
| --position | auto | Crop position: auto (face tracking), left, center, right |
| --smoothing | 30 | Smoothing window in frames |
| --sample-interval | 5 | Sample every N frames for detection |

Usage

# Auto face-tracking crop
python3 ./scripts/vertical_crop.py input.mp4 output.mp4

# Square crop (Instagram post)
python3 ./scripts/vertical_crop.py input.mp4 output.mp4 --ratio 1:1

# 4:5 crop (Instagram feed)
python3 ./scripts/vertical_crop.py input.mp4 output.mp4 --ratio 4:5

# Center crop (no tracking)
python3 ./scripts/vertical_crop.py input.mp4 output.mp4 --position center

Dependencies

pip install opencv-python numpy

Script 4: YouTube Clip

File: ./scripts/youtube_clip.py

How It Works

  1. Downloads video via yt-dlp (up to 4K)
  2. Extracts YouTube chapters if available
  3. Falls back to AI chapter generation (Whisper + Claude)
  4. Clips video into individual chapter files
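
Step 4 can be pictured as turning a chapter list into one FFmpeg command per clip. This sketch assumes chapters shaped like the `chapters` entries in yt-dlp's info dict (`title`, `start_time`, `end_time`); `slugify` and `clip_commands` are hypothetical names, the numbered file naming is inferred from the `01_intro.mp4` path used in Recipe 2, and the re-encode codecs are placeholders.

```python
import re
from typing import Dict, List


def slugify(title: str) -> str:
    """Lowercase the title and replace runs of other characters with '_'."""
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return slug or "clip"


def clip_commands(
    source: str, chapters: List[Dict], out_dir: str, reencode: bool = False
) -> List[List[str]]:
    """Build one ffmpeg command per chapter."""
    commands = []
    for number, chapter in enumerate(chapters, start=1):
        out_path = f"{out_dir}/{number:02d}_{slugify(chapter['title'])}.mp4"
        # Stream copy is fast but cuts on keyframes; re-encoding allows exact cuts.
        codec = ["-c:v", "libx264", "-c:a", "aac"] if reencode else ["-c", "copy"]
        commands.append(
            ["ffmpeg", "-y",
             "-ss", str(chapter["start_time"]),
             "-to", str(chapter["end_time"]),
             "-i", source, *codec, out_path]
        )
    return commands
```

The keyframe limitation of stream copy is the usual reason a flag like `--reencode` exists for precise cuts.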

CLI Arguments

| Argument | Default | Description |
| --- | --- | --- |
| --download-only | false | Only download, don't clip |
| --max-quality | 1080 | Max quality: 480, 720, 1080, 1440, 2160 |
| --audio-only | false | Download audio only (MP3) |
| --start | none | Manual clip start (seconds) |
| --end | none | Manual clip end (seconds) |
| --auto-clip | false | AI-powered chapter detection and clipping |
| --use-yt-chapters | true | Use YouTube chapters if available |
| --max-clips | none | Limit number of clips |
| --whisper-model | base | Whisper model for transcription |
| --reencode | false | Re-encode clips (precise cuts) |
| --output-dir | .tmp/clips | Output directory |

Usage

# Download only
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --download-only

# Auto-clip into chapters
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --auto-clip

# Extract specific range
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --start 60 --end 180 -o clip.mp4

# Download audio only
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --audio-only

# Top 3 clips only
python3 ./scripts/youtube_clip.py "https://youtube.com/watch?v=..." --auto-clip --max-clips 3

Dependencies

pip install yt-dlp faster-whisper anthropic

Script 5: Video Compressor

File: ./scripts/compress_video.py

Platform Presets

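| Preset | Resolution | CRF | Max Duration | Max Size | Notes |
| --- | --- | --- | --- | --- | --- |
| youtube | 1080p | 18 | none | none | High quality |
| youtube-shorts | 1080x1920 | 20 | 60s | none | Vertical |
| instagram-reel | 1080x1920 | 23 | 90s | none | Vertical |
| instagram-post | 1080x1080 | 23 | 60s | none | Square/vertical |
| tiktok | 1080x1920 | 23 | 180s | none | Vertical |
| twitter | 1080p | 23 | 140s | 512MB | Auto-bitrate |
| linkedin | 1080p | 23 | 600s | 200MB | Auto-bitrate |
| telegram | 720p | 28 | none | 50MB | For bots |
| whatsapp | 720p | 28 | 120s | 16MB | Aggressive |
| small | 480p | 30 | none | none | Quick sharing |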

CLI Arguments

| Argument | Default | Description |
| --- | --- | --- |
| --preset | none | Platform preset (see table above) |
| --resolution | 1080 | Max height in pixels |
| --crf | 23 | Quality (0=lossless, 51=worst) |
| --audio-bitrate | 128k | Audio bitrate |
| --target-size | none | Target file size in MB |
| --max-duration | none | Max duration in seconds |
| --list-presets | false | Show all presets |
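
Hitting a target file size generally means deriving a video bitrate from the size budget and the clip duration. The arithmetic below is a sketch of that general approach, not the script's confirmed formula: `video_bitrate_kbps` and `parse_kbps` are hypothetical names, 1 MB is taken as 8000 kilobits, and the 2% container overhead is an assumed safety margin.

```python
def parse_kbps(bitrate: str) -> float:
    """Parse an FFmpeg-style bitrate such as '128k' or '1M' into kbit/s."""
    text = bitrate.strip().lower()
    if text.endswith("k"):
        return float(text[:-1])
    if text.endswith("m"):
        return float(text[:-1]) * 1000.0
    return float(text) / 1000.0  # a bare number is bits per second


def video_bitrate_kbps(
    target_mb: float,
    duration_s: float,
    audio_bitrate: str = "128k",
    overhead: float = 0.02,
) -> float:
    """Video bitrate (kbit/s) that fits target_mb once audio is accounted for."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    total_kbps = target_mb * 8000.0 * (1.0 - overhead) / duration_s
    video_kbps = total_kbps - parse_kbps(audio_bitrate)
    if video_kbps <= 0:
        raise ValueError("target size is too small for this duration")
    return video_kbps
```

As a worked case with no overhead margin: 25 MB over 120 s is 25 × 8000 / 120 ≈ 1666.7 kbit/s in total, which leaves about 1538.7 kbit/s for video after 128 kbit/s of audio.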

Usage

# Platform preset
python3 ./scripts/compress_video.py input.mp4 output.mp4 --preset instagram-reel
python3 ./scripts/compress_video.py input.mp4 output.mp4 --preset whatsapp

# Target file size
python3 ./scripts/compress_video.py input.mp4 output.mp4 --target-size 25

# Custom
python3 ./scripts/compress_video.py input.mp4 output.mp4 --resolution 720 --crf 28

# List all presets
python3 ./scripts/compress_video.py input.mp4 output.mp4 --list-presets

Script 6: Simple Video Edit (Full Pipeline)

File: ./scripts/simple_video_edit.py

How It Works

  1. FFmpeg silence detection + cutting
  2. Audio normalization (loudnorm)
  3. Whisper transcription
  4. Claude-generated YouTube metadata (summary + chapters)
  5. Auphonic upload → YouTube (private draft)
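
Step 1 can be sketched with FFmpeg's `silencedetect` filter, which logs `silence_start` and `silence_end` lines that are then inverted into the ranges to keep. The function names here are hypothetical and the script's real parsing may differ; the defaults mirror the documented `--silence-threshold` and `--silence-duration` values.

```python
import re
from typing import List, Tuple

Span = Tuple[float, float]


def silence_filter(threshold_db: float = -35, min_duration: float = 3.0) -> str:
    """Build the silencedetect filter string passed to ffmpeg -af."""
    return f"silencedetect=noise={threshold_db}dB:d={min_duration}"


def parse_silences(log: str) -> List[Span]:
    """Pair up silence_start / silence_end values found in ffmpeg's log output."""
    starts = [float(v) for v in re.findall(r"silence_start:\s*(-?\d+(?:\.\d+)?)", log)]
    ends = [float(v) for v in re.findall(r"silence_end:\s*(-?\d+(?:\.\d+)?)", log)]
    # A silence still open at end of file has no silence_end and is dropped by zip.
    return [(max(0.0, s), e) for s, e in zip(starts, ends)]


def keep_segments(silences: List[Span], total_duration: float) -> List[Span]:
    """Invert the silence spans into the spans of content to keep."""
    keep = []
    cursor = 0.0
    for start, end in silences:
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_duration:
        keep.append((cursor, total_duration))
    return keep
```

The kept spans can then be fed to a trim and concat step in the same way as the VAD-based script.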

CLI Arguments

| Argument | Default | Description |
| --- | --- | --- |
| --video | required | Input video path |
| --title | required | YouTube title |
| --thumbnail | none | Thumbnail image path |
| --no-upload | false | Skip Auphonic upload |
| --no-normalize | false | Skip audio normalization |
| --upload-only | false | Skip editing, just upload |
| --silence-threshold | -35 | Silence threshold in dB |
| --silence-duration | 3.0 | Min silence duration (seconds) |

Usage

# Full pipeline
python3 ./scripts/simple_video_edit.py --video input.mp4 --title "My Video"

# Local only
python3 ./scripts/simple_video_edit.py --video input.mp4 --title "Test" --no-upload

Dependencies

pip install anthropic faster-whisper requests python-dotenv
# Requires: ANTHROPIC_API_KEY, AUPHONIC_API_KEY in .env

Script 7: 3D Swivel Teaser

File: ./scripts/insert_3d_transition.py

How It Works

  1. Extracts frames from later in video (default: 60s onwards)
  2. Creates 3D rotating "swivel" animation via Remotion
  3. Splits video: intro → transition → main content
  4. Re-encodes and concatenates with audio preserved

CLI Arguments

| Argument | Default | Description |
| --- | --- | --- |
| --insert-at | 3 | Where to insert teaser (seconds) |
| --duration | 5 | Teaser duration (seconds) |
| --teaser-start | 60 | Where to sample content from (seconds) |
| --bg-color | #2d3436 | Background color (hex) |
| --bg-image | none | Background image path |

Final Timeline

[0-3s intro] [3-8s swivel teaser @ 100x] [8s onwards: edited content]
Audio: Original audio plays continuously
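
The timeline above follows directly from the three CLI defaults. The sketch below reproduces that arithmetic; `teaser_timeline` is a hypothetical helper, and reading "@ 100x" as a playback-speed multiplier (so the teaser samples `duration * speed` seconds of source) is an assumption, not something the source confirms.

```python
from typing import Dict, Tuple, Union

Span = Tuple[float, float]


def teaser_timeline(
    insert_at: float = 3.0,
    duration: float = 5.0,
    teaser_start: float = 60.0,
    speed: float = 100.0,  # assumed meaning of "@ 100x"
) -> Dict[str, Union[Span, float]]:
    """Compute output-time spans for intro, teaser, and main content."""
    teaser_end = insert_at + duration
    return {
        "intro": (0.0, insert_at),
        "teaser": (insert_at, teaser_end),
        "main_from": teaser_end,
        # Source range the teaser would cover if played at `speed`.
        "sampled_source": (teaser_start, teaser_start + duration * speed),
    }
```

With the defaults this gives an intro of 0-3s, a teaser of 3-8s, and main content from 8s, matching the timeline shown.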

Dependencies

pip install torch  # For Silero VAD (jump_cut script)
brew install ffmpeg node  # macOS
cd scripts/video_effects && npm install  # For Remotion 3D rendering

All Dependencies (Install Once)

# Core (required for all scripts)
brew install ffmpeg

# Silence removal
pip install torch

# Auto-captions + YouTube clip
pip install faster-whisper

# Vertical crop
pip install opencv-python numpy

# YouTube download
pip install yt-dlp

# AI features (chapters, metadata)
pip install anthropic

# Full pipeline (simple_video_edit)
pip install requests python-dotenv

# 3D transitions
brew install node
cd scripts/video_effects && npm install

Troubleshooting

| Issue | Solution |
| --- | --- |
| Cuts feel abrupt | --padding 200 in jump_cut |
| Too much cut | --min-silence 1.0 in jump_cut |
| Captions too small | --font-size 32 in auto_captions |
| Vertical video detected | Font auto-scales 1.3x |
| Won't play in QuickTime | Ensure hvc1 codec tag |
| Face not detected | Try --position center in vertical_crop |
| YouTube download fails | Update yt-dlp: pip install -U yt-dlp |
| File too large | Use --preset whatsapp or --target-size 25 |
| Hardware encoder fails | Auto-falls back to software (libx264) |

Technical Details

  • macOS: Hardware encoding (hevc_videotoolbox / h264_videotoolbox)
  • Fallback: libx264/libx265 with CRF
  • Audio: AAC 128-192kbps
  • Uses hvc1 codec tag for QuickTime compatibility
  • All scripts support --help for full argument list
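
The hardware-then-software fallback described above can be sketched as a choice of FFmpeg codec arguments. Both function names are hypothetical, the preference order is an assumption, and the parsing of `ffmpeg -encoders` output is a best-effort reading of its usual layout rather than a documented contract.

```python
import shutil
import subprocess
from typing import Iterable, List, Set


def probe_encoders() -> Set[str]:
    """Return encoder names reported by `ffmpeg -encoders` (empty if ffmpeg is missing)."""
    if shutil.which("ffmpeg") is None:
        return set()
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=False,
    )
    names = set()
    for line in result.stdout.splitlines():
        parts = line.split()
        # Encoder rows look like "V....D libx264 <description>"; skip legend rows.
        if len(parts) >= 2 and len(parts[0]) == 6 and parts[1] != "=":
            names.add(parts[1])
    return names


def video_codec_args(
    available: Iterable[str], bitrate: str = "17M", crf: int = 23
) -> List[str]:
    """Prefer VideoToolbox HEVC, then libx265, then libx264."""
    encoders = set(available)
    if "hevc_videotoolbox" in encoders:
        # hvc1 tag keeps HEVC output playable in QuickTime.
        return ["-c:v", "hevc_videotoolbox", "-b:v", bitrate, "-tag:v", "hvc1"]
    if "libx265" in encoders:
        return ["-c:v", "libx265", "-crf", str(crf), "-tag:v", "hvc1"]
    return ["-c:v", "libx264", "-crf", str(crf)]
```

Passing the encoder set in as a parameter keeps the selection logic testable on machines without FFmpeg installed; `video_codec_args(probe_encoders())` combines the two.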

Schema

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| input_video | file_path | Yes | Input video file path |
| recipe | string | No | Pipeline recipe: full-social, youtube-shorts, quick-edit |
| preset | string | No | Compression preset: youtube, instagram-reel, tiktok, whatsapp, etc. |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| output_video | file_path | Processed video file path |

Credentials

| Name | Source |
| --- | --- |
| ANTHROPIC_API_KEY | .env (for AI chapters) |
| AUPHONIC_API_KEY | .env (for upload) |

Composable With

Skills that chain well with this one: pan-3d-transition, recreate-thumbnails

Cost

Free locally (FFmpeg + Silero VAD)

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
