# YouTube Transcript Skill

Production-grade YouTube transcript extraction with comprehensive format support, intelligent caching, and resilient networking.
## When to Use

✅ USE this skill when:

- Extracting transcripts from YouTube videos
- Converting YouTube captions to SRT/VTT subtitle files
- Analyzing video content via transcripts
- Creating subtitles for downloaded videos
- Batch processing multiple video transcripts
- Needing transcripts in specific languages
- Processing auto-generated captions

❌ DON'T use this skill when:

- The transcript is not available (disabled by the creator)
- The video is private or age-restricted
- The video is a livestream that hasn't ended
- You need speech-to-text from audio → use transcribe
- You need video frames → use video-frames
## Prerequisites

Requires Node.js (already available):

```bash
node --version
```

No additional dependencies are required.
## Commands

### Basic Usage

```bash
# Extract transcript with a video ID
{baseDir}/youtube-transcript.js VIDEO_ID

# Extract with a full URL
{baseDir}/youtube-transcript.js "https://www.youtube.com/watch?v=VIDEO_ID"

# Extract with a short URL
{baseDir}/youtube-transcript.js "https://youtu.be/VIDEO_ID"
```
### Output Formats

Plain text with timestamps (default):

```bash
{baseDir}/youtube-transcript.js VIDEO_ID --format text
```

```text
[0:00:00.00] Here is the transcript text
[0:00:05.32] More transcript content
```

Plain text without timestamps:

```bash
{baseDir}/youtube-transcript.js VIDEO_ID --format plain
```

```text
Here is the transcript text
More transcript content
```

JSON with metadata:

```bash
{baseDir}/youtube-transcript.js VIDEO_ID --format json
```

```json
{
  "title": "Video Title",
  "author": "Channel Name",
  "language": "en",
  "isAutoGenerated": false,
  "transcript": [...]
}
```

SRT subtitle format:

```bash
{baseDir}/youtube-transcript.js VIDEO_ID --format srt > video.srt
```

```text
1
00:00:00,000 --> 00:00:05,320
Here is the transcript text

2
00:00:05,320 --> 00:00:08,150
More transcript content
```

VTT subtitle format:

```bash
{baseDir}/youtube-transcript.js VIDEO_ID --format vtt > video.vtt
```

```text
WEBVTT

1
00:00.000 --> 00:05.320
Here is the transcript text
```

TSV (tab-separated values):

```bash
{baseDir}/youtube-transcript.js VIDEO_ID --format tsv
```

```text
start	duration	text
0.000	5.320	Here is the transcript text
```

CSV (comma-separated values):

```bash
{baseDir}/youtube-transcript.js VIDEO_ID --format csv
```

```text
start,duration,text
0.000,5.320,"Here is the transcript text"
```
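The TSV output is convenient for quick command-line analysis. As an illustrative sketch (the `sum_duration` helper and the sample data are hypothetical, not part of the script), the duration column can be summed with awk:

```shell
# Sum the "duration" column of TSV transcript output.
# Skips the header row (NR > 1) and prints the total seconds of speech.
sum_duration() {
  awk -F'\t' 'NR > 1 { total += $2 } END { printf "%.3f\n", total }'
}
```

Piping the tool's TSV output into `sum_duration` would print the total spoken seconds for the video.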
### Language Selection

```bash
# Auto-select best available (default)
{baseDir}/youtube-transcript.js VIDEO_ID

# Specific language by code
{baseDir}/youtube-transcript.js VIDEO_ID --language en
{baseDir}/youtube-transcript.js VIDEO_ID --language es
{baseDir}/youtube-transcript.js VIDEO_ID --language fr

# Partial matches work too
{baseDir}/youtube-transcript.js VIDEO_ID --language zh  # Matches zh-CN, zh-TW, etc.

# Language selection combined with an output format
{baseDir}/youtube-transcript.js VIDEO_ID --language ja --format srt
```
**Common Language Codes:**

| Code | Language |
|------|----------|
| en | English |
| es | Spanish |
| fr | French |
| de | German |
| ja | Japanese |
| ko | Korean |
| zh | Chinese |
| pt | Portuguese |
| ru | Russian |
| hi | Hindi |
| ar | Arabic |
| it | Italian |
### Save to File

```bash
# Save transcript directly to a file
{baseDir}/youtube-transcript.js VIDEO_ID --output transcript.txt
{baseDir}/youtube-transcript.js VIDEO_ID --format srt --output subtitles.srt
{baseDir}/youtube-transcript.js VIDEO_ID --format json --output data.json

# Shell redirection (equivalent)
{baseDir}/youtube-transcript.js VIDEO_ID --format vtt > captions.vtt
```
### Advanced Options

```bash
# Skip cache (force a fresh fetch)
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache

# Verbose debugging output
DEBUG=1 {baseDir}/youtube-transcript.js VIDEO_ID

# Combine options
{baseDir}/youtube-transcript.js VIDEO_ID --language es --format srt --output spanish.srt --no-cache
```
## Features

### Format Comparison

| Format | Use Case | Human Readable | Machine Readable |
|--------|----------|----------------|------------------|
| text | Default viewing | ✅ | ⚠️ |
| plain | Content only | ✅ | ⚠️ |
| json | API integration | ⚠️ | ✅ |
| srt | Subtitle files | ✅ | ✅ |
| vtt | Web captions | ✅ | ✅ |
| tsv | Spreadsheet import | ⚠️ | ✅ |
| csv | Database import | ⚠️ | ✅ |
### Supported Video URL Formats

```text
# Plain video ID (11 characters)
EBw7gsDPAYQ

# Standard YouTube URL
https://www.youtube.com/watch?v=EBw7gsDPAYQ

# Short youtu.be URL
https://youtu.be/EBw7gsDPAYQ

# Embed URL
https://www.youtube.com/embed/EBw7gsDPAYQ

# YouTube Live URL
https://www.youtube.com/live/EBw7gsDPAYQ

# URLs with additional parameters (automatically handled)
https://www.youtube.com/watch?v=EBw7gsDPAYQ&t=120s
https://www.youtube.com/watch?v=EBw7gsDPAYQ&index=2

# Playlist URLs (extracts the first video)
https://www.youtube.com/watch?v=EBw7gsDPAYQ&list=...
```
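The script's internal URL parser isn't shown here, but the extraction rule it implements can be sketched in shell (the `extract_video_id` helper is hypothetical, for illustration only):

```shell
# Extract an 11-character YouTube video ID from a URL or bare ID.
# Strips everything up to a known marker (youtu.be/, v=, embed/, live/),
# then keeps the leading run of valid ID characters.
extract_video_id() {
  printf '%s\n' "$1" \
    | sed -E 's#.*(youtu\.be/|v=|embed/|live/)##' \
    | grep -oE '^[A-Za-z0-9_-]{11}'
}
```

Trailing parameters such as `&t=120s` are dropped automatically because they are not valid ID characters.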
### Intelligent Caching

The skill implements intelligent caching to improve performance:

- **Cache Location**: `/tmp/youtube-transcript-cache/`
- **TTL**: 24 hours per entry
- **Max Entries**: 100 videos
- **Benefits**:
  - Instant retrieval of previously fetched transcripts
  - Reduced load on YouTube servers
  - Better performance for repeated operations

**Cache Bypass:**

```bash
# Force a fresh fetch
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache
```
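Assuming the documented cache location and TTL, the cache can also be inspected or cleared by hand; these maintenance commands operate directly on the directory and are not part of the script:

```shell
# Documented cache location (24-hour TTL, up to 100 entries).
CACHE_DIR=/tmp/youtube-transcript-cache

# List cached entries with sizes, if the cache exists yet.
[ -d "$CACHE_DIR" ] && ls -lh "$CACHE_DIR"

# Manually expire entries older than 24 hours (1440 minutes),
# mirroring the TTL the tool enforces on read.
[ -d "$CACHE_DIR" ] && find "$CACHE_DIR" -type f -mmin +1440 -delete

# Wipe the cache entirely (like --no-cache, but for every video at once).
rm -rf "$CACHE_DIR"
```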
### Rate Limiting

To avoid being blocked by YouTube:

- Max 60 requests per minute
- Minimum 1-second delay between requests
- Exponential backoff on retries
### Retry Logic

When a request fails:

1. First attempt
2. Wait 2 seconds, retry
3. Wait 4 seconds, retry
4. Wait 6 seconds, retry
5. Final error reported
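These retries happen inside the script itself. If you want the same schedule around a whole pipeline (fetch plus post-processing), an external wrapper can be sketched like this (the `retry` helper is hypothetical):

```shell
# Retry a command with the documented 2s/4s/6s waits.
# The first attempt runs immediately (sleep 0); each retry waits first.
retry() {
  for delay in 0 2 4 6; do
    sleep "$delay"
    if "$@"; then
      return 0
    fi
  done
  echo "All retries failed: $*" >&2
  return 1
}
```

For example, `retry {baseDir}/youtube-transcript.js VIDEO_ID --format srt > video.srt` would re-run the whole fetch on failure.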
## Error Handling

### Error Codes

| Code | Name | Description | Resolution |
|------|------|-------------|------------|
| 0 | SUCCESS | Transcript fetched | None needed |
| 1 | INVALID_VIDEO_ID | Bad URL/ID format | Double-check the video ID |
| 2 | VIDEO_NOT_FOUND | Video doesn't exist | Verify the video exists |
| 3 | TRANSCRIPT_DISABLED | Creator disabled captions | Contact the creator |
| 4 | NO_TRANSCRIPT | No captions available | Wait for a transcript |
| 5 | VIDEO_UNAVAILABLE | Can't access | Check restrictions |
| 6 | PRIVATE_VIDEO | Video is private | Get access/permission |
| 7 | RATE_LIMITED | Too many requests | Wait before retrying |
| 8 | NETWORK_ERROR | Connection issue | Check your internet connection |
| 9 | PARSE_ERROR | Data extraction failed | Try again |
| 99 | UNKNOWN | Unexpected error | Report the issue |
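In batch scripts, these exit codes can drive a retry/skip decision. A sketch (the `classify_exit` helper is hypothetical; the groupings follow the table above):

```shell
# Map the documented exit codes to an action:
#   "retry" - transient failure, worth another attempt
#   "skip"  - permanent for this video (bad ID, no captions, no access)
#   "abort" - unexpected error, stop and investigate
classify_exit() {
  case "$1" in
    0) echo "ok" ;;                 # SUCCESS
    7|8|9) echo "retry" ;;          # RATE_LIMITED, NETWORK_ERROR, PARSE_ERROR
    1|2|3|4|5|6) echo "skip" ;;     # invalid/missing video, captions unavailable
    *) echo "abort" ;;              # UNKNOWN (99) or anything unexpected
  esac
}
```

A loop over video IDs could then call `classify_exit $?` after each fetch and queue "retry" videos for a later pass.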
### Common Errors and Solutions

**"Could not extract player data"**

- YouTube may have changed their page structure
- The video may be age-restricted
- The video may require login
- Solution: Try again later, or check whether the video is publicly accessible

**"No captions available for this video"**

- The creator hasn't added captions
- Auto-generated captions aren't ready yet (this may take a few hours after upload)
- The video is too new
- Solution: Wait for YouTube to generate captions, or check whether manual captions exist

**"Rate limited by YouTube"**

- Too many requests in a short period
- Solution: Wait 1-2 minutes before retrying

**"Transcript too long"**

- The video exceeds 500K characters
- Solution: Use --format json, which handles large transcripts better

**"Video unavailable or not found"**

- The video was removed or never existed
- The video is region-restricted
- Solution: Verify the video ID/URL is correct
## Technical Architecture

### Data Flow

```text
Video ID/URL
    ↓
Extract Video ID     ← URL parser (7+ formats)
    ↓
Check Cache          ← 24hr TTL store
    ↓ [cache miss]
Fetch YouTube Page   ← HTTP with retry logic
    ↓
Extract Player Data  ← ytInitialPlayerResponse
    ↓
Parse Caption Tracks ← Language selection
    ↓
Fetch Transcript     ← Select appropriate URL
    ↓
Parse Entries        ← XML/JSON parsing
    ↓
Format Output        ← 7 output formats
    ↓
Cache & Return       ← Store for 24hr
```
### Player Data Extraction

The script checks multiple potential sources:

- The `ytInitialPlayerResponse` JavaScript variable
- `playerResponse` JSON in script tags
- Caption tracks from various locations

### Transcript Parsing

Supports multiple formats:

- **JSON API Response**: modern format
- **Timed Text XML**: legacy format
- **Alternative XML**: older structure
- Special handling for auto-generated vs. manual captions
### Data Unescaping

Properly handles HTML entity unescaping:

- `&amp;` → `&`
- `&lt;` → `<`
- `&gt;` → `>`
- `&quot;` → `"`
- `&#39;` / `&apos;` / `&#x27;` → `'`
- Whitespace normalization
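The same unescaping can be reproduced in shell for post-processing raw caption XML. This sed sketch (the `unescape` helper is hypothetical) mirrors the mapping above; `&amp;` must be replaced last so that double-escaped input such as `&amp;lt;` decodes to `&lt;` rather than `<`:

```shell
# Decode the HTML entities that appear in YouTube caption data.
unescape() {
  sed -e 's/&lt;/</g' \
      -e 's/&gt;/>/g' \
      -e 's/&quot;/"/g' \
      -e "s/&#39;/'/g" \
      -e "s/&apos;/'/g" \
      -e 's/&amp;/\&/g'   # last, so already-decoded text is not touched twice
}
```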
## Sample Output

### JSON Format (Full)

```json
{
  "title": "How Artificial Intelligence Works",
  "author": "Example Channel",
  "duration": "PT10M32S",
  "language": "en",
  "isAutoGenerated": true,
  "transcript": [
    {
      "start": 0.000,
      "duration": 5.320,
      "text": "In this video, we'll explore how AI systems learn and adapt"
    },
    {
      "start": 5.320,
      "duration": 4.180,
      "text": "to perform tasks that traditionally required human intelligence"
    }
  ],
  "word_count": 2847,
  "total_entries": 156
}
```
### SRT Format (SubRip)

```text
1
00:00:00,000 --> 00:00:05,320
In this video, we'll explore how AI systems learn and adapt

2
00:00:05,320 --> 00:00:09,500
to perform tasks that traditionally required human intelligence

3
00:00:09,500 --> 00:00:13,240
This process is called machine learning

...
```

### VTT Format (WebVTT)

```text
WEBVTT

1
00:00.000 --> 00:05.320
In this video, we'll explore how AI systems learn and adapt

2
00:05.320 --> 00:09.500
to perform tasks that traditionally required human intelligence

...
```
## Examples

### Download Transcripts for a Playlist

```bash
#!/bin/bash
# Process multiple videos from an IDs file
for video_id in $(cat video_ids.txt); do
  echo "Processing: $video_id"
  {baseDir}/youtube-transcript.js "$video_id" --format srt --output "transcripts/${video_id}.srt" 2>/dev/null
  if [ $? -eq 0 ]; then
    echo "  ✓ Success"
  else
    echo "  ✗ Failed"
  fi
  # Sleep to respect rate limits
  sleep 2
done
```
### Convert to PDF for Reading

```bash
#!/bin/bash
VIDEO_ID="EBw7gsDPAYQ"

# Get the transcript
{baseDir}/youtube-transcript.js "$VIDEO_ID" --format plain > transcript.txt

# Convert to PDF (requires pandoc)
pandoc transcript.txt -o transcript.pdf
echo "PDF created: transcript.pdf"
```
### Analyze Word Counts

```bash
#!/bin/bash
VIDEO_ID="EBw7gsDPAYQ"

# Get JSON format and summarize with jq
{baseDir}/youtube-transcript.js "$VIDEO_ID" --format json | jq -r '
  "Title: \(.title)",
  "Author: \(.author)",
  "Words: \(.word_count)",
  "Entries: \(.total_entries)",
  "Language: \(.language)\(.isAutoGenerated ? " (auto)" : "")"
'
```
### Batch Download with Progress

```bash
#!/bin/bash
VIDEOS=("VIDEO1" "VIDEO2" "VIDEO3")
TOTAL=${#VIDEOS[@]}

for i in "${!VIDEOS[@]}"; do
  id="${VIDEOS[$i]}"
  echo "[$((i+1))/$TOTAL] Processing $id..."
  {baseDir}/youtube-transcript.js "$id" --format json --output "data/${id}.json" 2>/dev/null
  sleep 1  # Rate limit protection
done
```
### Create Bilingual Subtitles

```bash
#!/bin/bash
VIDEO_ID="your-video-id"

# Get English and Spanish tracks
{baseDir}/youtube-transcript.js "$VIDEO_ID" --language en --format srt > english.srt
echo "English ✓"
{baseDir}/youtube-transcript.js "$VIDEO_ID" --language es --format srt > spanish.srt
echo "Spanish ✓"

# Mux both subtitle tracks into the video (requires ffmpeg)
ffmpeg -i video.mp4 -i english.srt -i spanish.srt \
  -map 0:v -map 0:a -map 1:s:0 -map 2:s:0 \
  -c:v copy -c:a copy -c:s mov_text \
  "${VIDEO_ID}_bilingual.mp4"
echo "Bilingual video created ✓"
```
## Performance Tips

### Use Caching

First fetch: ~2-5 seconds. Cached fetch: ~100ms.

```bash
# First time (slow)
{baseDir}/youtube-transcript.js VIDEO_ID

# Second time (fast - served from cache)
{baseDir}/youtube-transcript.js VIDEO_ID

# Force refresh (slow)
{baseDir}/youtube-transcript.js VIDEO_ID --no-cache
```

### Batch Processing with Delays

```bash
# Bad - might hit rate limits
for id in $IDS; do
  {baseDir}/youtube-transcript.js "$id"
done

# Good - respects rate limits
for id in $IDS; do
  {baseDir}/youtube-transcript.js "$id"
  sleep 2
done
```

### Parallel Processing (Limited)

```bash
# Process 2-3 at a time (don't exceed the rate limit)
{baseDir}/youtube-transcript.js VIDEO1 &
{baseDir}/youtube-transcript.js VIDEO2 &
{baseDir}/youtube-transcript.js VIDEO3 &
wait
```

### Output Format Selection

- Fastest: plain (smallest output, fastest write)
- Recommended: text or json (balanced)
- For subtitles: srt or vtt (industry standard)
## Limitations

- **No Private Videos**: requires public access
- **No Age-Restricted Videos**: some videos are unavailable
- **No Members-Only Videos**: requires a YouTube membership
- **Livestream Lag**: captions may be delayed
- **New Videos**: auto-generated captions take time
- **Rate Limits**: max 60 requests/minute
- **Large Transcripts**: limited to 500K characters
## Notes

- Cached transcripts expire after 24 hours
- Auto-generated captions may have errors
- Manual captions are preferred when available
- Language codes follow YouTube's internal format
- SRT format uses a comma for milliseconds (WebVTT uses a period)
- TSV and CSV formats are UTF-8 encoded
- JSON output includes metadata for programmatic use
- The script is network-resilient with automatic retries
- Use --output to save directly to a file (handles special characters)
- STDERR carries progress messages and metadata
- STDOUT carries the actual transcript data
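Because SRT and VTT differ mainly in the header and the millisecond separator, an already-saved SRT file can be converted to VTT without re-fetching. A minimal sketch (the `srt_to_vtt` helper is hypothetical and assumes well-formed SRT timestamps):

```shell
# Minimal SRT -> VTT conversion: emit the WEBVTT header, then swap the
# millisecond separator from comma to period. The digit-count pattern
# leaves ordinary commas in cue text alone, though it would also rewrite
# a number formatted like "12,345" inside a cue.
srt_to_vtt() {
  printf 'WEBVTT\n\n'
  sed -E 's/([0-9]{2}),([0-9]{3})/\1.\2/g' "$1"
}
```

For example, `srt_to_vtt video.srt > video.vtt` converts a saved subtitle file in place of a second fetch.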