ReelTalk
Process a video link: download audio, transcribe, and summarize. Falls back to image text extraction when no speech is present.
⚠️ On CPU, transcription takes roughly 1 minute of processing per minute of video. For videos longer than 5 minutes, warn the user before starting.
Steps
0. Fresh start
rm -rf /tmp/reeltalk_*
mkdir -p /tmp/reeltalk_work
1. Get metadata
yt-dlp --print title --print description --print uploader "<url>" \
2>/dev/null > /tmp/reeltalk_work/metadata.txt
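Because stderr is discarded above, a failed lookup leaves an empty metadata file and no visible error. A minimal sketch of a guard follows; require_nonempty is a hypothetical helper name, not part of yt-dlp.

```shell
# Hypothetical helper: succeed only if the file exists and has content.
require_nonempty() {
    if [ ! -s "$1" ]; then
        echo "error: $1 is missing or empty" >&2
        return 1
    fi
}
```

For example, run require_nonempty /tmp/reeltalk_work/metadata.txt and stop if it fails, rather than continuing to step 2 with no metadata.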
2. Audio
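Extract to m4a so that later steps can rely on the audio.m4a path (bestaudio alone may yield another container such as webm):
yt-dlp -f "bestaudio/best" -x --audio-format m4a -o "/tmp/reeltalk_work/audio.%(ext)s" "<url>"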
3. Check length & split
ffprobe -v error -show_entries format=duration -of \
default=noprint_wrappers=1:nokey=1 /tmp/reeltalk_work/audio.m4a
If the duration exceeds 300 seconds, warn the user, then split the audio into 5-minute chunks:
ffmpeg -i /tmp/reeltalk_work/audio.m4a -f segment -segment_time 300 \
-acodec pcm_s16le -ac 1 -ar 16000 /tmp/reeltalk_work/chunk_%03d.wav
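ffprobe prints the duration as a fractional number (for example 312.456000), which plain shell integer tests cannot compare. The sketch below does the comparison in awk and also turns the 1-minute-per-minute rule of thumb into an estimate for the warning. Both helper names are hypothetical.

```shell
# Hypothetical helper: succeed when a duration in seconds (possibly
# fractional, as ffprobe prints it) exceeds a limit, 300 by default.
exceeds_limit() {
    awk -v d="$1" -v limit="${2:-300}" 'BEGIN { exit !(d + 0 > limit + 0) }'
}

# Hypothetical helper: rough CPU processing estimate in whole minutes,
# using the 1-minute-per-minute rule of thumb and rounding up.
estimate_minutes() {
    awk -v d="$1" 'BEGIN { d += 0; m = int(d / 60); if (m * 60 < d) m++; print m }'
}
```

With the ffprobe output stored in a variable named duration, exceeds_limit "$duration" decides whether to warn and split, and estimate_minutes "$duration" supplies the number to quote in the warning.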
4. Transcribe
Use the base Whisper model (memory-efficient on 8GB machines).
for chunk in /tmp/reeltalk_work/chunk_*.wav; do
    base=$(basename "$chunk" .wav)
    whisper "$chunk" --model base --language en --task transcribe \
        2>/dev/null > "/tmp/reeltalk_work/transcript_${base}.txt"
done
For short videos (300 seconds or less), skip the loop and transcribe directly:
whisper /tmp/reeltalk_work/audio.m4a --model base --language en --task transcribe \
2>/dev/null > /tmp/reeltalk_work/full_transcript.txt
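When whisper's console output is captured this way, each line may carry a leading timestamp range, which would inflate the word count checked in step 6. Below is a minimal sketch of a filter. It assumes the console format is "[MM:SS.mmm --> MM:SS.mmm] text", which should be verified against the installed Whisper build. strip_timestamps is a hypothetical helper.

```shell
# Hypothetical helper: drop a leading "[start --> end]" timestamp range
# from each line on stdin. The timestamp format is an assumption.
strip_timestamps() {
    sed -E 's/^\[[0-9:.]+ --> [0-9:.]+\][[:space:]]*//'
}
```

Lines without a timestamp prefix pass through unchanged, so the filter is safe to apply even if the output format differs.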
5. Assemble
Chunked videos only. Skip this step if you transcribed directly in step 4, because the first command below truncates full_transcript.txt.
: > /tmp/reeltalk_work/full_transcript.txt
for f in /tmp/reeltalk_work/transcript_chunk_*.txt; do
    echo "=== $(basename "$f" .txt) ===" >> /tmp/reeltalk_work/full_transcript.txt
    cat "$f" >> /tmp/reeltalk_work/full_transcript.txt
    echo "" >> /tmp/reeltalk_work/full_transcript.txt
done
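The assembly loop can also be wrapped in a function that tolerates a directory with no chunk transcripts. This is a sketch for POSIX sh or bash, where an unmatched glob stays literal; assemble_transcript is a hypothetical helper name.

```shell
# Hypothetical helper: concatenate transcript_chunk_*.txt from a work
# directory into full_transcript.txt, labelling each chunk.
assemble_transcript() {
    dir="$1"
    out="$dir/full_transcript.txt"
    : > "$out"
    for f in "$dir"/transcript_chunk_*.txt; do
        # With no matches the glob stays literal, so skip non-files.
        [ -f "$f" ] || continue
        {
            echo "=== $(basename "$f" .txt) ==="
            cat "$f"
            echo ""
        } >> "$out"
    done
}
```

Calling assemble_transcript /tmp/reeltalk_work produces the same file as the loop above when chunks exist, and an empty file when they do not.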
6. OCR fallback
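If the transcript is empty or has fewer than 20 words, download the video, sample one frame per second, and run OCR on each frame:
yt-dlp -f "bv*+ba/b" --merge-output-format mp4 -o "/tmp/reeltalk_work/video.mp4" "<url>"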
mkdir -p /tmp/reeltalk_work/frames
ffmpeg -i /tmp/reeltalk_work/video.mp4 -vf "fps=1" -vsync vfr \
-q:v 2 /tmp/reeltalk_work/frames/frame_%04d.jpg
for f in /tmp/reeltalk_work/frames/frame_*.jpg; do
    tesseract "$f" stdout --psm 6 2>/dev/null
done > /tmp/reeltalk_work/ocr_output.txt
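The word-count condition that gates this step can be sketched as a small check. needs_ocr is a hypothetical helper, and the default of 20 words mirrors the threshold stated above.

```shell
# Hypothetical helper: succeed when the file is missing or has fewer
# than min_words words (20 by default).
needs_ocr() {
    file="$1"
    min_words="${2:-20}"
    [ -f "$file" ] || return 0
    # Strip the padding some wc implementations add around the count.
    count=$(wc -w < "$file" | tr -d '[:space:]')
    [ "$count" -lt "$min_words" ]
}
```

Run the OCR commands only when needs_ocr /tmp/reeltalk_work/full_transcript.txt succeeds.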
7. Summarize
Combine the metadata with the transcript (or the OCR output, if the fallback ran) into a plain-English summary.
Platform notes
- X/Twitter: use fxtwitter API (yt-dlp fails). Get video URL from api.fxtwitter.com/<user>/status/<id>, download with curl.
- TikTok: yt-dlp handles fine.
- Instagram: yt-dlp handles fine.
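For the X/Twitter case, the API address can be derived from the post link. The sketch below assumes the link has the shape https://<host>/<user>/status/<id> and that the API is served over https. fxtwitter_api_url is a hypothetical helper.

```shell
# Hypothetical helper: turn an X/Twitter status URL into the matching
# api.fxtwitter.com URL. Prints nothing and fails if the URL does not
# look like https://<host>/<user>/status/<id>.
fxtwitter_api_url() {
    path=$(printf '%s\n' "$1" |
        sed -nE 's#^https?://[^/]+/([^/]+)/status/([0-9]+).*#\1/status/\2#p')
    [ -n "$path" ] || return 1
    printf 'https://api.fxtwitter.com/%s\n' "$path"
}
```

The response layout is not specified here, so inspect what curl returns for the resulting URL before scripting the video download.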
Requirements
- yt-dlp (brew), whisper (brew), tesseract (brew), ffmpeg (brew)
- Whisper cache: ~/.cache/whisper/ (base model ~142MB)
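A quick preflight check can confirm the tools are on PATH before step 0. This is a minimal sketch; missing_tools is a hypothetical helper name.

```shell
# Hypothetical helper: print each named command that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, missing_tools yt-dlp whisper tesseract ffmpeg ffprobe curl prints nothing when everything is installed, and otherwise lists what still needs installing.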