hn-podcast-transcribe

Download, transcribe, and archive Hacker News podcast episodes (e.g. "Hacker News Recap" by Wondercraft). Use when: (1) user wants to transcribe HN podcast episodes, (2) user asks to archive HN podcast content, (3) user wants searchable text from HN audio podcasts, (4) user mentions "HN podcast" + "transcribe" or "archive". Supports incremental processing — only new episodes are downloaded/transcribed on each run. Works with any podcast RSS feed, not just HN.

Install skill "hn-podcast-transcribe" with this command: npx skills add terrycarter1985/hn-podcast-transcribe

HN Podcast Transcribe & Archive

Automatically download, transcribe, and archive Hacker News podcast episodes into a searchable local archive.

Default Podcast

Hacker News Recap by Wondercraft.ai — daily AI-generated recap of top HN posts.

  • RSS: https://rss.buzzsprout.com/2170103.rss

To use a different podcast, set the HN_PODCAST_RSS environment variable to its RSS feed URL.

Workflow

1. Fetch new episodes

python3 scripts/fetch_episodes.py [--rss URL] [--archive DIR] [--limit N] [--no-download]
  • Parses podcast RSS feed
  • Compares against existing archive to skip already-processed episodes
  • Downloads audio (mp3/m4a/wav) for each new episode
  • Saves metadata as JSON alongside audio
  • Default archive: ./hn-podcast-archive/
  • --no-download: save metadata only, skip audio download
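
The feed parsing and skip-if-known steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of fetch_episodes.py: the function names, the use of the RSS guid as the episode key, and the dictionary fields are all assumptions.

```python
import xml.etree.ElementTree as ET


def parse_episodes(rss_xml: str) -> list:
    """Extract basic episode fields from an RSS 2.0 document (hypothetical helper)."""
    root = ET.fromstring(rss_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        episodes.append({
            "guid": item.findtext("guid", default="").strip(),
            "title": item.findtext("title", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
            # The audio URL lives in the <enclosure> element, if present.
            "audio_url": enclosure.get("url") if enclosure is not None else None,
        })
    return episodes


def select_new(episodes: list, known_guids: set, limit=None) -> list:
    """Keep only episodes whose guid is not already archived (assumed key)."""
    fresh = [e for e in episodes if e["guid"] not in known_guids]
    return fresh[:limit] if limit is not None else fresh
```

Here `known_guids` would be collected from the episode metadata already in the archive, and `limit` mirrors the --limit flag.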

Download strategies (tried in order):

  1. Direct HTTP download — works for most podcast CDNs
  2. yt-dlp fallback — handles some Cloudflare-protected hosts
  3. If both fail, the episode directory is created with metadata; place audio manually

Cloudflare note: Some hosts (e.g. Buzzsprout) block automated downloads. If both download strategies fail:

  • Use --no-download to create the directory structure
  • Download audio manually via browser or podcast app
  • Place the file as audio.mp3 in the episode directory
  • Re-run the transcribe step
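
The ordered download strategies described above amount to a try-in-sequence fallback chain. The sketch below shows that pattern only; `download_with_fallback` and the strategy callables are hypothetical names, and the real script may structure this differently.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional, Tuple

# A strategy takes (url, destination path) and writes the file or raises.
Downloader = Callable[[str, Path], None]


def download_with_fallback(
    url: str,
    dest: Path,
    strategies: Iterable[Tuple[str, Downloader]],
) -> Optional[str]:
    """Try each strategy in order; return the name of the one that worked."""
    for name, strategy in strategies:
        try:
            strategy(url, dest)
        except Exception as exc:  # a blocked or failed download moves on to the next
            print(f"{name} failed: {exc}")
            continue
        # Treat a missing or empty file as a failure too.
        if dest.exists() and dest.stat().st_size > 0:
            return name
    # Every strategy failed: the caller keeps the metadata and waits for manual audio.
    return None
```

In this arrangement the first callable would wrap a plain HTTP download and the second would invoke yt-dlp; a `None` result corresponds to the manual-placement case.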

2. Transcribe audio

python3 scripts/transcribe_episodes.py [--archive DIR] [--model MODEL] [--format FORMAT]
  • Finds episodes with audio but no transcript
  • Runs Whisper locally (no API key needed)
  • Outputs: txt, srt, vtt, or json (default: txt)
  • Default model: turbo (fast, good accuracy)
  • Supports audio formats: mp3, m4a, wav, ogg, flac
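
As a rough sketch of the "audio but no transcript" check and the local Whisper call, assuming the archive layout shown under Archive Structure: `pending_episodes` and `transcribe` are hypothetical helpers, not the script's actual functions. The `whisper.load_model` and `model.transcribe` calls are from the openai-whisper package.

```python
from pathlib import Path

AUDIO_EXTS = {".mp3", ".m4a", ".wav", ".ogg", ".flac"}


def pending_episodes(archive: Path, fmt: str = "txt") -> list:
    """Return audio files whose episode directory has no transcript yet."""
    pending = []
    for episode_dir in sorted(p for p in archive.iterdir() if p.is_dir()):
        audio = next(
            (f for f in sorted(episode_dir.iterdir()) if f.suffix.lower() in AUDIO_EXTS),
            None,
        )
        if audio is not None and not (episode_dir / f"transcript.{fmt}").exists():
            pending.append(audio)
    return pending


def transcribe(audio: Path, model_name: str = "turbo") -> str:
    """Transcribe one file locally; the import is lazy because the package is heavy."""
    import whisper  # provided by openai-whisper

    model = whisper.load_model(model_name)
    return model.transcribe(str(audio))["text"]
```

Loading the model once and reusing it across episodes would be faster than the per-file load shown here; it is kept simple for illustration.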

3. Generate archive index

python3 scripts/build_index.py [--archive DIR]
  • Creates archive_index.json with all episodes, dates, titles, and transcript paths
  • Enables fast search across the archive
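
One way the index could be assembled is sketched below. The entry fields, the assumption that episode.json carries a "title" key, and taking the date from the directory name prefix are illustrative guesses, not the documented schema of archive_index.json.

```python
import json
from pathlib import Path


def build_index(archive: Path) -> list:
    """Collect one entry per episode directory and write archive_index.json."""
    entries = []
    for episode_dir in sorted(p for p in archive.iterdir() if p.is_dir()):
        meta_path = episode_dir / "episode.json"
        if not meta_path.exists():
            continue  # not an episode directory
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        transcript = episode_dir / "transcript.txt"
        entries.append({
            "id": episode_dir.name,
            # Assumes directory names start with YYYY-MM-DD followed by "_".
            "date": episode_dir.name.split("_", 1)[0],
            "title": meta.get("title", ""),
            "transcript": (
                transcript.relative_to(archive).as_posix() if transcript.exists() else None
            ),
        })
    (archive / "archive_index.json").write_text(
        json.dumps(entries, indent=2), encoding="utf-8"
    )
    return entries
```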

4. Search archive

python3 scripts/search_archive.py [--archive DIR] "search query"
  • Full-text search across all transcribed episodes
  • Returns matching episodes with context snippets
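
The context snippets could be produced with a simple case-insensitive substring scan like the one below. This is an assumed approach for illustration; the actual script may tokenize or rank results differently, and `find_snippets` is a hypothetical name.

```python
def find_snippets(text: str, query: str, width: int = 40) -> list:
    """Return each match of query in text with up to `width` characters of context."""
    needle = query.lower()
    if not needle:
        return []  # an empty query would otherwise match everywhere
    # Note: lower() can change string length for a few Unicode characters,
    # which would shift offsets; plain English transcripts are unaffected.
    lowered = text.lower()
    snippets = []
    start = lowered.find(needle)
    while start != -1:
        lo = max(0, start - width)
        hi = min(len(text), start + len(needle) + width)
        snippets.append(text[lo:hi].strip())
        start = lowered.find(needle, start + len(needle))
    return snippets
```

Running this over every transcript path listed in the index, and reporting the episodes with at least one snippet, gives the behavior described above.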

One-shot: Full Pipeline

python3 scripts/pipeline.py [--rss URL] [--archive DIR] [--model MODEL] [--limit N]

Runs fetch → transcribe → index in sequence.
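
A minimal sketch of how the three steps might be chained, assuming pipeline.py simply invokes the other scripts as subprocesses and stops at the first failure. Whether it does so or imports them directly is not stated, and `pipeline_commands` and `run_pipeline` are hypothetical names.

```python
import subprocess
import sys


def pipeline_commands(archive: str, rss=None, model=None, limit=None) -> list:
    """Build the argv lists for fetch, transcribe, and index, in that order."""
    fetch = [sys.executable, "scripts/fetch_episodes.py", "--archive", archive]
    if rss:
        fetch += ["--rss", rss]
    if limit is not None:
        fetch += ["--limit", str(limit)]

    transcribe = [sys.executable, "scripts/transcribe_episodes.py", "--archive", archive]
    if model:
        transcribe += ["--model", model]

    index = [sys.executable, "scripts/build_index.py", "--archive", archive]
    return [fetch, transcribe, index]


def run_pipeline(commands: list) -> None:
    """Run each step in sequence; check=True raises if a step exits non-zero."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```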

Cron Integration

Set up periodic processing with OpenClaw cron:

# Daily at 6am — process new HN Recap episodes
cron add --name "hn-podcast-digest" --schedule "0 6 * * *" --payload '{"kind":"agentTurn","message":"Run the HN podcast transcription pipeline: python3 scripts/pipeline.py --limit 3"}'

Archive Structure

hn-podcast-archive/
├── archive_index.json
├── 2026-05-10_hardware-attestation-as-monopoly-enabler/
│   ├── episode.json
│   ├── audio.mp3
│   └── transcript.txt
├── 2026-05-09_a-recent-experience-with-chatgpt-5-5-pro/
│   ├── episode.json
│   ├── audio.mp3
│   └── transcript.txt
└── ...
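
The directory names above look like a publication date joined to a slug of the episode title. A sketch of one rule that reproduces them follows; the exact rule the scripts use is an assumption, and `episode_dir_name` is a hypothetical helper.

```python
import re
from datetime import date


def episode_dir_name(published: date, title: str) -> str:
    """Build 'YYYY-MM-DD_slug' by lowercasing and hyphenating the title."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{published.isoformat()}_{slug}"
```

Under this rule a title such as "A recent experience with ChatGPT 5.5 Pro" would map to the second directory name shown in the tree.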

Configuration

Env Var          Default                    Description
HN_PODCAST_RSS   Buzzsprout HN Recap feed   Podcast RSS feed URL
HN_ARCHIVE_DIR   ./hn-podcast-archive       Archive directory
WHISPER_MODEL    turbo                      Whisper model name
WHISPER_FORMAT   txt                        Transcript output format
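
A sketch of how these settings might be resolved at runtime. The precedence shown here (command-line flag, then environment variable, then default) is an assumption rather than documented behavior, and `resolve` is a hypothetical helper.

```python
import os

DEFAULTS = {
    "HN_PODCAST_RSS": "https://rss.buzzsprout.com/2170103.rss",
    "HN_ARCHIVE_DIR": "./hn-podcast-archive",
    "WHISPER_MODEL": "turbo",
    "WHISPER_FORMAT": "txt",
}


def resolve(name: str, cli_value=None, env=None) -> str:
    """Pick a setting: explicit CLI value, else env var, else the default."""
    env = os.environ if env is None else env
    if cli_value is not None:
        return cli_value
    # An unset or empty environment variable falls through to the default.
    return env.get(name) or DEFAULTS[name]
```

For example, `resolve("WHISPER_MODEL", cli_value=args.model)` would honor a --model flag when one is given and otherwise consult WHISPER_MODEL.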

Requirements

  • Python 3.10+
  • openai-whisper (pip install openai-whisper)
  • requests (pip install requests)
  • static-ffmpeg (pip install static-ffmpeg) — auto-provides ffmpeg
  • yt-dlp (optional, for fallback downloads)
  • Whisper models auto-download to ~/.cache/whisper on first use
