whisper-transcribe

Transcribe audio files to text using OpenAI Whisper, with automatic language detection, multiple output formats (txt, srt, vtt, json), batch processing, and model selection (tiny through large). Use it to transcribe audio recordings, podcasts, voice messages, lectures, meetings, or any other audio or video file. Accepts mp3, wav, m4a, ogg, flac, webm, opus, and aac input.


Install the "whisper-transcribe" skill with: npx skills add josunlp/whisper-transcribe

Whisper Transcribe

Transcribe audio with scripts/transcribe.sh:

# Basic (auto-detect language, base model)
scripts/transcribe.sh recording.mp3

# German, small model, SRT subtitles
scripts/transcribe.sh --model small --language de --format srt lecture.wav

# Batch process, all formats
scripts/transcribe.sh --format all --output-dir ./transcripts/ *.mp3

# Word-level timestamps
scripts/transcribe.sh --timestamps interview.m4a
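The contents of scripts/transcribe.sh are not reproduced in this listing. If, as the Requirements suggest, it wraps the whisper CLI from openai-whisper, the examples above correspond roughly to direct whisper calls. The sketch below uses whisper_cmd, a hypothetical dry-run helper that only prints the command so you can compare it with what the script actually runs. It assumes base and txt as defaults; the listing states the base default but not the default output format.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of the skill): prints the whisper CLI
# command that roughly corresponds to scripts/transcribe.sh options.
# Assumptions: the script wraps the openai-whisper CLI and defaults to the
# base model and txt output. Filenames are not quoted, so the output is for
# reading, not for eval.
whisper_cmd() {
  local model=base language= format=txt outdir=. words=False
  while [ $# -gt 0 ]; do
    case "$1" in
      --model) model=$2; shift 2 ;;
      --language) language=$2; shift 2 ;;
      --format) format=$2; shift 2 ;;
      --output-dir) outdir=$2; shift 2 ;;
      --timestamps) words=True; shift ;;
      *) break ;;
    esac
  done
  # Everything left is audio input; the whisper CLI accepts several files.
  local cmd="whisper $* --model $model --output_format $format --output_dir $outdir --word_timestamps $words"
  # Without --language, whisper detects the language itself.
  if [ -n "$language" ]; then
    cmd="$cmd --language $language"
  fi
  printf '%s\n' "$cmd"
}

# Usage: whisper_cmd --model small --language de --format srt lecture.wav
```

Passing --model explicitly matters in a wrapper like this, because the whisper CLI has its own default model, which is not necessarily the base default described here.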

Models

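| Model  | RAM    | Speed | Accuracy | Best for                               |
|--------|--------|-------|----------|----------------------------------------|
| tiny   | ~1 GB  | ⚡⚡⚡ | ★★       | Quick drafts, known language           |
| base   | ~1 GB  | ⚡⚡   | ★★★      | General use (default)                  |
| small  | ~2 GB  |       | ★★★★     | Good accuracy                          |
| medium | ~5 GB  | 🐢    | ★★★★★    | High accuracy                          |
| large  | ~10 GB | 🐌    | ★★★★★    | Best accuracy (slow on a Raspberry Pi) |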
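The RAM column can drive an automatic model choice. This is a minimal sketch assuming free memory is the only constraint; on slow hardware speed may matter more. pick_model and available_gb are hypothetical helpers, and available_gb reads /proc/meminfo, so it only works on Linux.

```shell
#!/bin/sh
# Hypothetical helper: pick the largest model whose approximate RAM figure
# from the table fits in the given amount of free memory (whole GB).
pick_model() {
  local gb=$1
  if [ "$gb" -ge 10 ]; then echo large
  elif [ "$gb" -ge 5 ]; then echo medium
  elif [ "$gb" -ge 2 ]; then echo small
  elif [ "$gb" -ge 1 ]; then echo base
  else echo tiny  # nothing fits comfortably; tiny is the smallest option
  fi
}

# Hypothetical helper, Linux only: available memory in whole GiB.
# /proc/meminfo reports MemAvailable in kB.
available_gb() {
  awk '/^MemAvailable:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

# Usage: scripts/transcribe.sh --model "$(pick_model "$(available_gb)")" recording.mp3
```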

Output Formats

  • txt — Plain text transcript
  • srt — SubRip subtitles (for video)
  • vtt — WebVTT subtitles
  • json — Detailed JSON with timestamps and confidence
  • all — Generate all formats at once
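For batch runs it helps to know which files to expect. This sketch assumes the script keeps the whisper CLI's naming convention, where each output file is named after the input with its extension replaced. expected_outputs is a hypothetical helper. The whisper CLI's own --output_format all also writes a .tsv file; the helper lists only the four formats named above.

```shell
#!/bin/sh
# Hypothetical helper: list the transcript files expected for one input,
# assuming whisper-style naming (<input name without extension>.<format>).
expected_outputs() {
  local outdir=$1 format=$2 audio=$3 base ext
  base=$(basename "$audio")
  base=${base%.*}  # drop the last extension, e.g. lecture.wav -> lecture
  if [ "$format" = all ]; then
    set -- txt srt vtt json
  else
    set -- "$format"
  fi
  for ext in "$@"; do
    printf '%s/%s.%s\n' "${outdir%/}" "$base" "$ext"
  done
}

# Usage: expected_outputs ./transcripts/ all lecture.wav
```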

Requirements

  • whisper CLI (pip install openai-whisper)
  • ffmpeg (for audio decoding)
  • Each model is downloaded on first use (~150 MB for base) and cached, by default under ~/.cache/whisper
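A quick preflight check can save a failed batch run. check_deps is a hypothetical helper that uses the POSIX command -v builtin to report any required tool that is missing from PATH.

```shell
#!/bin/sh
# Hypothetical helper: report every listed command that is not on PATH.
# Returns 0 if all are present, 1 otherwise.
check_deps() {
  local status=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$tool" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_deps whisper ffmpeg || echo "install the missing tools first"
```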
