ReelTalk

Process a video link: download audio, transcribe, and summarize. Falls back to image text extraction when no speech is present.

⚠️ Transcription on CPU takes roughly 1 minute per minute of video. For videos longer than 5 minutes, warn the user before starting.

Steps

0. Fresh start

rm -rf /tmp/reeltalk_*
mkdir -p /tmp/reeltalk_work

1. Get metadata

yt-dlp --print title --print description --print uploader "<url>" \
  2>/dev/null > /tmp/reeltalk_work/metadata.txt

2. Audio

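# Convert to m4a so later steps can rely on the audio.m4a filename
# (bestaudio alone may produce .webm or .opus)
yt-dlp -f "bestaudio" -x --audio-format m4a \
  -o "/tmp/reeltalk_work/audio.%(ext)s" "<url>"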

3. Check length & split

ffprobe -v error -show_entries format=duration -of \
  default=noprint_wrappers=1:nokey=1 /tmp/reeltalk_work/audio.m4a

If the duration exceeds 300 seconds, warn the user, then split into 5-minute chunks:

ffmpeg -i /tmp/reeltalk_work/audio.m4a -f segment -segment_time 300 \
  -acodec pcm_s16le -ac 1 -ar 16000 /tmp/reeltalk_work/chunk_%03d.wav
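
The duration check above can be sketched as a small helper. This is a minimal sketch, not part of the original workflow: the function name needs_split is hypothetical, and it assumes ffprobe prints the duration as a decimal number of seconds, which is why the comparison goes through awk instead of the integer-only [ -gt ] test.

```shell
# Hypothetical helper: exits 0 when the duration (seconds, may be a decimal)
# is strictly greater than the limit (default 300).
needs_split() {
  duration=$1
  limit=${2:-300}
  # awk handles decimals; exit status 0 means "split needed"
  awk -v d="$duration" -v l="$limit" 'BEGIN { exit !(d + 0 > l + 0) }'
}

# Example use (commented out so the sketch has no side effects):
# duration=$(ffprobe -v error -show_entries format=duration -of \
#   default=noprint_wrappers=1:nokey=1 /tmp/reeltalk_work/audio.m4a)
# if needs_split "$duration"; then
#   echo "Video is longer than 5 minutes; transcription will take a while."
# fi
```

Keeping the threshold as an optional second argument makes it easy to try a different chunk length without editing the helper.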

4. Transcribe

Use the base Whisper model (memory-efficient on 8GB machines).

for chunk in /tmp/reeltalk_work/chunk_*.wav; do
  base=$(basename "$chunk" .wav)
  whisper "$chunk" --model base --language en --task transcribe \
    2>/dev/null > "/tmp/reeltalk_work/transcript_${base}.txt"
done

For short videos (300 seconds or less), skip the split and transcribe the audio file directly:

whisper /tmp/reeltalk_work/audio.m4a --model base --language en --task transcribe \
  2>/dev/null > /tmp/reeltalk_work/full_transcript.txt

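5. Assemble

Only needed when step 3 split the audio. For short videos, full_transcript.txt already exists from step 4 and must not be overwritten.

# Skip when no chunk transcripts exist, so a short video's transcript is not truncated
if ls /tmp/reeltalk_work/transcript_chunk_*.txt >/dev/null 2>&1; then
  > /tmp/reeltalk_work/full_transcript.txt
  for f in /tmp/reeltalk_work/transcript_chunk_*.txt; do
    echo "=== $(basename "$f" .txt) ===" >> /tmp/reeltalk_work/full_transcript.txt
    cat "$f" >> /tmp/reeltalk_work/full_transcript.txt
    echo "" >> /tmp/reeltalk_work/full_transcript.txt
  done
fi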

6. OCR fallback

If the transcript is empty or has fewer than 20 words, download the video and extract on-screen text from one frame per second:

yt-dlp -f "bv*+ba/b" -o "/tmp/reeltalk_work/video.mp4" "<url>"
mkdir -p /tmp/reeltalk_work/frames
ffmpeg -i /tmp/reeltalk_work/video.mp4 -vf "fps=1" -vsync vfr \
  -q:v 2 /tmp/reeltalk_work/frames/frame_%04d.jpg
for f in /tmp/reeltalk_work/frames/frame_*.jpg; do
  tesseract "$f" stdout --psm 6 2>/dev/null
done > /tmp/reeltalk_work/ocr_output.txt
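
The "empty or under 20 words" test that triggers this fallback could look something like the sketch below. The function names are hypothetical. It also assumes the transcript was captured from Whisper's stdout, where each line starts with a "[start --> end]" timestamp; those prefixes, and the "=== ... ===" chunk headers from step 5, are stripped first so they do not inflate the word count.

```shell
# Hypothetical helper: print the number of spoken words in a transcript file,
# ignoring chunk header lines and leading "[00:00.000 --> 00:05.000]" stamps.
transcript_word_count() {
  sed -e '/^=== .* ===$/d' -e 's/^\[[^]]*\][[:space:]]*//' "$1" \
    | wc -w | tr -d '[:space:]'
}

# Hypothetical helper: exits 0 when the transcript is missing, empty,
# or has fewer than min_words words (default 20).
needs_ocr_fallback() {
  file=$1
  min_words=${2:-20}
  [ -s "$file" ] || return 0
  [ "$(transcript_word_count "$file")" -lt "$min_words" ]
}

# Example use (commented out so the sketch has no side effects):
# if needs_ocr_fallback /tmp/reeltalk_work/full_transcript.txt; then
#   echo "Little or no speech found; falling back to OCR."
# fi
```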

7. Summarize

Combine the metadata and the transcript (or the OCR output) into a plain-English summary.
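
One way to prepare the material for summarizing is to gather everything into a single file first. This is only a sketch under stated assumptions: the function name build_summary_input, the output file summary_input.txt, and the section labels are all hypothetical and not part of the original steps. It includes whichever of the transcript and OCR output exist and are non-empty.

```shell
# Hypothetical helper: write metadata plus any non-empty transcript / OCR text
# into $work/summary_input.txt (file name is an assumption for illustration).
build_summary_input() {
  work=${1:-/tmp/reeltalk_work}
  out=$work/summary_input.txt
  {
    echo "=== METADATA ==="
    cat "$work/metadata.txt" 2>/dev/null
    if [ -s "$work/full_transcript.txt" ]; then
      echo
      echo "=== TRANSCRIPT ==="
      cat "$work/full_transcript.txt"
    fi
    if [ -s "$work/ocr_output.txt" ]; then
      echo
      echo "=== ON-SCREEN TEXT (OCR) ==="
      cat "$work/ocr_output.txt"
    fi
  } > "$out"
}

# Example use (commented out so the sketch has no side effects):
# build_summary_input /tmp/reeltalk_work
# cat /tmp/reeltalk_work/summary_input.txt
```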

Platform notes

X/Twitter: yt-dlp fails, so use the fxtwitter API. Get the video URL from api.fxtwitter.com/<user>/status/<id>, then download it with curl.
TikTok: yt-dlp handles fine.
Instagram: yt-dlp handles fine.
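
The platform split above could be automated roughly as follows. Both function names are hypothetical, the list of X/Twitter hostnames is an assumption, and so is the https scheme on the API address (the note above gives only the host and path). The sketch stops at building the API URL; it does not parse the API response.

```shell
# Hypothetical helper: decide which download path a link should take.
downloader_for_url() {
  case $1 in
    *://x.com/*|*://www.x.com/*|*://twitter.com/*|*://www.twitter.com/*|*://mobile.twitter.com/*)
      echo fxtwitter ;;
    *)
      echo yt-dlp ;;
  esac
}

# Hypothetical helper: rewrite an X/Twitter status link into the
# api.fxtwitter.com/<user>/status/<id> form, dropping any query string.
fxtwitter_api_url() {
  printf '%s\n' "$1" | sed -E \
    's#^https?://(www\.|mobile\.)?(x|twitter)\.com/([^/]+)/status/([0-9]+).*#https://api.fxtwitter.com/\3/status/\4#'
}

# Example use (commented out so the sketch has no side effects):
# if [ "$(downloader_for_url "$url")" = fxtwitter ]; then
#   api=$(fxtwitter_api_url "$url")
# fi
```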

Requirements

yt-dlp (brew), whisper (brew, formula openai-whisper), tesseract (brew), ffmpeg (brew, includes ffprobe)
Whisper cache: ~/.cache/whisper/ (base model ~142MB)
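
Before starting, it may help to confirm the tools are on PATH. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: print the name of each command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example use (commented out so the sketch has no side effects):
# missing=$(missing_tools yt-dlp whisper tesseract ffmpeg ffprobe)
# [ -z "$missing" ] || echo "Install first: $missing"
```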
