whisper-voice-transcription

Build and use whisper.cpp for local speech-to-text workflows, with optional cloud fallback when local transcription is not practical.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "whisper-voice-transcription" with this command: npx skills add xuxuclassmate/whisper-voice-transcription

Whisper Voice Transcription with whisper.cpp

When to use

  • You want local speech-to-text without sending audio to a third party.
  • You need a fallback workflow when a built-in transcription tool fails.
  • You want an operator guide for compiling and running whisper.cpp.

Prerequisites

  • git
  • cmake
  • a C and C++ compiler toolchain (for example gcc and g++, or clang)
  • ffmpeg
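
Before building, it can help to confirm that the prerequisites are actually on PATH. The following is a minimal sketch, not part of the original skill: missing_tools is a hypothetical helper name, and the compiler names cc and c++ in the usage comment are assumptions that may differ on your system.

```shell
#!/usr/bin/env bash
# missing_tools TOOL...: print the name of every listed tool that is not
# found on PATH. Prints nothing when all tools are available.
# (Hypothetical helper, not part of whisper.cpp or the skill itself.)
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage for the list above. "cc" and "c++" are assumed compiler names;
# substitute gcc/g++ or clang/clang++ if that is what your system provides.
#   missing_tools git cmake cc c++ ffmpeg
```

Any name the function prints should be installed with your platform's package manager before continuing.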

Build steps

git clone --depth 1 https://github.com/ggerganov/whisper.cpp.git ~/whisper.cpp
cd ~/whisper.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j4

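Download a ggml model file (for example ggml-large-v3.bin, which the transcription flow below expects) from the official ggerganov/whisper.cpp releases or Hugging Face repository, or with the models/download-ggml-model.sh script that ships in the repository, and place it under ~/whisper.cpp/models/.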

Standard transcription flow

ffmpeg -y -i input_audio.ogg -ar 16000 -ac 1 -f wav /tmp/voice.wav
~/whisper.cpp/build/bin/whisper-cli \
  -m ~/whisper.cpp/models/ggml-large-v3.bin \
  -f /tmp/voice.wav \
  -l auto \
  --no-timestamps
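
The two commands above can be wrapped in one function that uses a private temporary directory instead of a fixed /tmp/voice.wav path and cleans up afterwards. This is a sketch under stated assumptions: transcribe, WHISPER_BIN, and WHISPER_MODEL are hypothetical names introduced here, with defaults matching the paths used above.

```shell
#!/usr/bin/env bash
# transcribe FILE: convert FILE to 16 kHz mono WAV in a private temp
# directory, then run whisper-cli on the result.
# WHISPER_BIN and WHISPER_MODEL are hypothetical override variables; their
# defaults are the paths from the build and download steps.
transcribe() {
  local input=$1
  local bin=${WHISPER_BIN:-$HOME/whisper.cpp/build/bin/whisper-cli}
  local model=${WHISPER_MODEL:-$HOME/whisper.cpp/models/ggml-large-v3.bin}
  local tmpdir status=0

  [ -f "$input" ] || { echo "transcribe: no such file: $input" >&2; return 1; }
  tmpdir=$(mktemp -d) || return 1

  # Run the transcription only if the conversion succeeded.
  {
    ffmpeg -nostdin -loglevel error -y -i "$input" \
      -ar 16000 -ac 1 -f wav "$tmpdir/voice.wav" &&
    "$bin" -m "$model" -f "$tmpdir/voice.wav" -l auto --no-timestamps
  } || status=$?

  rm -rf "$tmpdir"   # always remove the temporary WAV
  return "$status"
}
```

With the function loaded, transcribe input_audio.ogg prints the transcript to standard output and returns a non-zero status if either step fails.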

Fallback workflow

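If a higher-level transcription tool fails, first locate the exact cache or upload path that tool used for the audio file. Search only within that application's expected cache directory instead of scanning the entire home directory, then run the file you find through the standard transcription flow above.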
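
A scoped search of this kind could look like the sketch below. find_recent_audio is a hypothetical helper, and the extension list and 60-minute default are assumptions to adjust for the tool in question. It refuses to scan the home directory or the filesystem root.

```shell
#!/usr/bin/env bash
# find_recent_audio DIR [MINUTES]: list audio files under DIR that were
# modified within the last MINUTES minutes (default 60).
# DIR should be the specific cache directory of the failing tool.
# (Hypothetical helper; the extension list is an assumption.)
find_recent_audio() {
  local dir=${1:?usage: find_recent_audio DIR [MINUTES]}
  local minutes=${2:-60}

  # Refuse overly broad search roots: "/" (empty after trimming) and $HOME.
  case ${dir%/} in
    ""|"${HOME%/}")
      echo "find_recent_audio: refusing to scan $dir" >&2
      return 1
      ;;
  esac

  find "$dir" -type f -mmin "-$minutes" \
    \( -name '*.ogg' -o -name '*.oga' -o -name '*.opus' \
       -o -name '*.mp3' -o -name '*.m4a' -o -name '*.wav' \)
}
```

Pass only the directory you identified for that application; the function deliberately has no default search root.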

Cloud fallback

If local transcription is too slow or unavailable, use an approved speech API and tell the user that audio will leave the machine.
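
One way to make that disclosure explicit is a consent prompt that must be answered before any upload happens. The sketch below is illustrative only: confirm_cloud_upload is a hypothetical helper, and the actual upload call depends on whichever speech API is approved in your environment, so it is not shown.

```shell
#!/usr/bin/env bash
# confirm_cloud_upload PROVIDER: warn that audio will leave the machine and
# read an answer from standard input. Only an explicit "yes" succeeds.
# (Hypothetical helper; PROVIDER is just a label shown to the user.)
confirm_cloud_upload() {
  local provider=$1 answer
  printf 'Audio will be uploaded to %s and will leave this machine. Type "yes" to continue: ' \
    "$provider" >&2
  IFS= read -r answer || return 1   # no input counts as refusal
  [ "$answer" = "yes" ]
}
```

A caller would run the upload only when the function returns success, and fall back to reporting the failure otherwise.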

Guardrails

  • Download binaries and models only from official sources.
  • Verify hashes when possible.
  • Do not search unrelated directories for audio files.
  • Be explicit when using a cloud provider because that changes the privacy model.
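
Hash verification can be sketched as a small function. This assumes GNU coreutils sha256sum is installed (on macOS, shasum -a 256 prints the same "digest filename" format), and the expected digest must come from the value published by the official source. verify_sha256 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_HEX: succeed only if the SHA-256 digest of
# FILE equals EXPECTED_HEX (compared case-insensitively).
# Assumes sha256sum from GNU coreutils is available on PATH.
verify_sha256() {
  local file=$1 expected actual
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  actual=$(sha256sum "$file" 2>/dev/null | awk '{print $1}')
  # An empty digest means sha256sum failed (for example, missing file).
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

For example, verify_sha256 ~/whisper.cpp/models/ggml-large-v3.bin followed by the published digest returns success only on an exact match, so a mismatch can stop the workflow before the model is used.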

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
