whisper-transcribe

Transcribe audio files to text using OpenAI Whisper. Supports automatic language detection, multiple output formats (txt, srt, vtt, json), batch processing, and model selection (tiny to large). Use it to transcribe audio recordings, podcasts, voice messages, lectures, meetings, or any other audio/video file. Handles mp3, wav, m4a, ogg, flac, webm, opus, and aac input.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "whisper-transcribe" with this command: npx skills add josunlp/whisper-transcribe

Whisper Transcribe

Transcribe audio with scripts/transcribe.sh:

# Basic (auto-detect language, base model)
scripts/transcribe.sh recording.mp3

# German, small model, SRT subtitles
scripts/transcribe.sh --model small --language de --format srt lecture.wav

# Batch process, all formats
scripts/transcribe.sh --format all --output-dir ./transcripts/ *.mp3

# Word-level timestamps
scripts/transcribe.sh --timestamps interview.m4a

Models

Model	RAM	Speed	Accuracy	Best for
tiny	~1GB	⚡⚡⚡	★★	Quick drafts, known language
base	~1GB	⚡⚡	★★★	General use (default)
small	~2GB	⚡	★★★★	Good accuracy
medium	~5GB	🐢	★★★★★	High accuracy
large	~10GB	🐌	★★★★★	Best accuracy (slow on a Raspberry Pi)
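
The RAM column above can be turned into a simple selection rule. The following is a minimal sketch, not part of the skill: `pick_model` is a hypothetical helper that takes the available memory in whole gigabytes and prints the largest model whose approximate requirement from the table fits.

```shell
# Hypothetical helper (not shipped with the skill): print the largest model
# whose approximate RAM requirement from the table fits in the given
# number of whole gigabytes.
pick_model() {
  ram_gb=$1
  if [ "$ram_gb" -ge 10 ]; then
    echo large
  elif [ "$ram_gb" -ge 5 ]; then
    echo medium
  elif [ "$ram_gb" -ge 2 ]; then
    echo small
  elif [ "$ram_gb" -ge 1 ]; then
    echo base
  else
    echo tiny
  fi
}
```

It could then be combined with the documented `--model` option, for example `scripts/transcribe.sh --model "$(pick_model 4)" recording.mp3`. The thresholds are the table's rough figures, so leaving some headroom is advisable.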

Output Formats

txt — Plain text transcript
srt — SubRip subtitles (for video)
vtt — WebVTT subtitles
json — Detailed JSON with timestamps and confidence
all — Generate all formats at once
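
The format names above are also values accepted by the `--output_format` option of the `whisper` CLI itself. As a rough illustration, the sketch below prints the kind of `whisper` command a wrapper might run for one file. It is an assumption that `scripts/transcribe.sh` forwards its options this way; `whisper_cmd` is a hypothetical helper, and it only prints the command (a dry run) rather than executing it.

```shell
# Hypothetical dry-run helper: print a whisper command line for one file.
# Arguments: file, model (default: base), format (default: txt),
# output directory (default: current directory).
# The flags --model, --output_format and --output_dir belong to the
# openai-whisper CLI; how the skill's script maps onto them is assumed.
whisper_cmd() {
  file=$1
  model=${2:-base}
  format=${3:-txt}
  outdir=${4:-.}
  echo "whisper $file --model $model --output_format $format --output_dir $outdir"
}

whisper_cmd lecture.wav small srt ./transcripts
```

Because the command is only printed, file names containing spaces are not quoted here. A real invocation would need to quote them.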

Requirements

whisper CLI (pip install openai-whisper)
ffmpeg (for audio decoding)
First run downloads the model (~150MB for base)
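
Before the first run, it can help to confirm that both tools are on the PATH. This is a minimal sketch using only standard shell features; `missing_deps` is a hypothetical helper and is not part of the skill.

```shell
# Hypothetical helper: print the name of each listed command that is
# not found on PATH, one per line.
missing_deps() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Check the two tools named in the requirements above.
missing=$(missing_deps whisper ffmpeg)
if [ -n "$missing" ]; then
  # Left unquoted on purpose so several names are joined on one line.
  echo "Missing required tools:" $missing >&2
fi
```

If nothing is printed, both `whisper` and `ffmpeg` were found.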

