YouTube Transcript Extraction
Extract subtitles and transcripts from YouTube videos.
Methods
Method Tool When to Use
CLI yt-dlp Fast, reliable, preferred
Browser Chrome automation Fallback when CLI fails
API youtube-transcript-api Python programmatic access
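
For the API route, a minimal sketch might look like the following. It assumes the 1.x interface of youtube-transcript-api (an instance method `fetch()` returning snippets with a `.text` attribute); older releases exposed a static `get_transcript()` instead, so check the installed version. The helper names `video_id_from_url` and `fetch_transcript_text` are illustrative and not part of the library.

```python
from urllib.parse import urlparse, parse_qs


def video_id_from_url(url: str) -> str:
    """Return the video ID from a watch or youtu.be URL (other URL shapes are not handled)."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    ids = parse_qs(parsed.query).get("v")
    if not ids:
        raise ValueError(f"no video ID found in {url!r}")
    return ids[0]


def fetch_transcript_text(url: str, languages=("en",)) -> str:
    """Fetch a transcript and join its snippets into plain text."""
    # Imported lazily so the URL helper works without the package installed.
    from youtube_transcript_api import YouTubeTranscriptApi

    # Assumption: 1.x interface. Older releases used the static
    # YouTubeTranscriptApi.get_transcript(video_id) instead.
    fetched = YouTubeTranscriptApi().fetch(
        video_id_from_url(url), languages=list(languages)
    )
    return "\n".join(snippet.text for snippet in fetched)
```

The library works on video IDs rather than full URLs, which is why the sketch extracts the ID first.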
yt-dlp Method (Preferred)
Basic Command
yt-dlp --write-auto-sub --write-sub --sub-lang en --skip-download -o "%(title)s.%(ext)s" "VIDEO_URL"
Key Flags
Flag Purpose
--write-sub Download manual subtitles
--write-auto-sub Download auto-generated subtitles
--sub-lang LANG Specify language (en, zh-Hans, etc.)
--skip-download Don't download video
--cookies-from-browser chrome Use browser cookies for restricted videos
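
When the command needs to be driven from a script, the flags above can be assembled into an argument list. The sketch below is one way to do that, assuming `yt-dlp` is on the PATH; the function names `build_subtitle_command` and `download_subtitles` are hypothetical helpers, not part of yt-dlp.

```python
import subprocess


def build_subtitle_command(url, langs=("en",), cookies_browser=None):
    """Assemble the yt-dlp argument list for a subtitles-only download."""
    cmd = [
        "yt-dlp",
        "--write-sub",                   # manual subtitles
        "--write-auto-sub",              # auto-generated subtitles
        "--sub-lang", ",".join(langs),   # comma-separated language codes
        "--skip-download",               # subtitles only, no video
        "-o", "%(title)s.%(ext)s",
    ]
    if cookies_browser:
        # Only needed for sign-in or age-restricted videos.
        cmd += ["--cookies-from-browser", cookies_browser]
    cmd.append(url)
    return cmd


def download_subtitles(url, **kwargs):
    """Run yt-dlp; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(
        build_subtitle_command(url, **kwargs),
        check=True, capture_output=True, text=True,
    )
```

Passing the arguments as a list avoids shell quoting problems with the `%(title)s` output template and with URLs that contain `&`.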
Common Issues
Issue Solution
Sign-in required Add --cookies-from-browser chrome
No subtitles found Video has no captions available; run yt-dlp --list-subs "VIDEO_URL" to confirm
Age-restricted Use cookies from logged-in browser
Browser Automation Fallback
When CLI fails, use browser automation:
- Open video page: navigate to the YouTube URL
- Expand description: click the "...more" button
- Open transcript: click the "Show transcript" button
- Extract text: query the DOM for transcript segments
DOM Selectors
Element Selector
Transcript segments ytd-transcript-segment-renderer
Timestamp .segment-timestamp
Text .segment-text
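
Once the automation tool has captured the transcript panel's HTML, the selectors above can be applied offline. The following is a rough sketch using only the standard library; it assumes each segment contains one `.segment-timestamp` element followed by one `.segment-text` element, which may not hold if YouTube changes its markup. `TranscriptSegmentParser` and `parse_transcript_html` are illustrative names.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "wbr"}  # tags with no closing tag


class TranscriptSegmentParser(HTMLParser):
    """Collect (timestamp, text) pairs from transcript panel HTML."""

    def __init__(self):
        super().__init__()
        self.segments = []    # finished (timestamp, text) pairs
        self._field = None    # "timestamp" or "text" while inside one
        self._depth = 0       # nesting depth inside the current field
        self._buffer = []
        self._timestamp = ""

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if "segment-timestamp" in classes:
            self._field = "timestamp"
        elif "segment-text" in classes:
            self._field = "text"
        if self._field:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if not self._field or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace left over from the markup.
            value = " ".join("".join(self._buffer).split())
            if self._field == "timestamp":
                self._timestamp = value
            else:
                self.segments.append((self._timestamp, value))
                self._timestamp = ""
            self._field = None

    def handle_data(self, data):
        if self._field:
            self._buffer.append(data)


def parse_transcript_html(html):
    parser = TranscriptSegmentParser()
    parser.feed(html)
    parser.close()
    return parser.segments
```

Inside a live browser session the same selectors would normally be queried directly; parsing captured HTML is mainly useful for testing or for working from a saved page.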
Output Formats
Format Extension Use Case
VTT .vtt Web standard, includes timing
SRT .srt Video editing, media players
TXT .txt Plain text, no timing
Convert VTT to Plain Text
Strip timing and formatting
sed -E 's/<[^>]*>//g; /^WEBVTT/d; /^(Kind|Language):/d; /-->/d; /^[0-9]+$/d; /^[[:space:]]*$/d' video.vtt > video.txt
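
A one-line filter cannot tell a cue identifier from caption text that happens to be a number, and auto-generated tracks tend to repeat the previous line in each new cue. Where that matters, a block-based conversion is one option. The sketch below assumes cue blocks are separated by truly empty lines, as in the WebVTT format; `vtt_to_text` is a hypothetical helper name.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")  # inline cue tags such as <c> or <00:00:01.000>


def vtt_to_text(vtt: str) -> str:
    """Return cue text only, with consecutive repeated lines dropped."""
    out = []
    blocks = re.split(r"\n{2,}", vtt.replace("\r\n", "\n"))
    for block in blocks:
        lines = block.split("\n")
        # Locate the timing line; blocks without one (header, NOTE, STYLE) are skipped.
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue
        # Everything after the timing line is caption text; anything before
        # it is a cue identifier and is ignored.
        for raw in lines[timing + 1:]:
            text = TAG_RE.sub("", raw).strip()
            if text and (not out or out[-1] != text):
                out.append(text)
    return "\n".join(out)
```

Usage: `open("video.txt", "w", encoding="utf-8").write(vtt_to_text(open("video.vtt", encoding="utf-8").read()))`. HTML entities such as `&amp;` are left as-is in this sketch.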
Language Codes
Language Code
English en
Chinese (Simplified) zh-Hans
Chinese (Traditional) zh-Hant
Spanish es
Multiple (comma-separated) en,es,zh-Hans
Best Practices
Practice Why
Try manual subs first Higher quality than auto-generated
Use cookies for restricted Avoids sign-in errors
Check multiple languages Some videos have better subs in other languages
Verify transcript exists Not all videos have captions
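
The first, third, and fourth practices can be combined into a single check before downloading. The sketch below assumes the video metadata has been loaded from `yt-dlp -J "VIDEO_URL"` into a dict, and relies on its `subtitles` and `automatic_captions` keys; `pick_subtitle_track` is an illustrative name.

```python
def pick_subtitle_track(info, preferred=("en",)):
    """Return (kind, lang) for the best available track, or None if there are no usable captions."""
    manual = info.get("subtitles") or {}
    auto = info.get("automatic_captions") or {}
    # A manual track in any preferred language beats an auto-generated one.
    for kind, tracks in (("manual", manual), ("auto", auto)):
        for lang in preferred:
            if lang in tracks:
                return kind, lang
    return None
```

With `preferred=("en", "es")`, a video that has manual Spanish subtitles but only auto-generated English ones yields `("manual", "es")`, reflecting the preference for manual tracks over language order.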