speech-to-text

Transcribe or translate audio files to text using a public Hugging Face Whisper Space accessed over Gradio. Use when the user sends voice notes, audio attachments, meeting clips, podcasts, interviews, or any other local audio file (.ogg, .mp3, .wav, .m4a, etc.) and wants a transcript, rough captions, or an English translation without paying for an API first.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

To install the skill, run:

npx skills add shu-hari/hf-whisper-speech-to-text

Speech to Text

Use this skill to turn local audio files into text with a public Whisper-based endpoint.

Quick start

Run:

python3 scripts/transcribe.py /path/to/file.ogg

Return the transcript as plain text. By default, the script also applies lightweight Chinese punctuation and sentence-breaking cleanup.

For machine-readable output:

python3 scripts/transcribe.py /path/to/file.ogg --json

To disable cleanup and keep the raw model text:

python3 scripts/transcribe.py /path/to/file.ogg --format raw

To force Chinese punctuation cleanup:

python3 scripts/transcribe.py /path/to/file.ogg --format zh
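
The cleanup rules themselves are not spelled out here. As a rough illustration only, a lightweight Chinese punctuation pass might look like the sketch below. The function name `cleanup_zh` and every rule in it are assumptions made for this example, not the actual logic of scripts/transcribe.py.

```python
import re

# Basic CJK Unified Ideographs range; an assumption that this is "Chinese enough".
CJK = r"\u4e00-\u9fff"
ASCII_TO_FULLWIDTH = {",": "，", "?": "？", "!": "！", ":": "：", ";": "；"}


def cleanup_zh(text: str) -> str:
    """Hypothetical cleanup: pauses become commas, ASCII marks become full-width."""
    text = text.strip()
    # Whisper often leaves a space where the speaker paused; treat it as a comma.
    text = re.sub(rf"(?<=[{CJK}])\s+(?=[{CJK}])", "，", text)
    # Swap ASCII punctuation that directly follows a Chinese character.
    text = re.sub(
        rf"(?<=[{CJK}])([,?!:;])",
        lambda m: ASCII_TO_FULLWIDTH[m.group(1)],
        text,
    )
    # Close the text with a full stop if it ends on a bare Chinese character.
    if re.search(rf"[{CJK}]$", text):
        text += "。"
    return text
```

Text without Chinese characters passes through unchanged, which is one plausible way a default mode could stay safe for other languages.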

For English translation instead of same-language transcription:

python3 scripts/transcribe.py /path/to/file.ogg --task translate
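
When the helper is called from another Python program, the flags shown above can be assembled programmatically. This is a minimal sketch that assumes the working directory is the skill root; `build_command` and `run_transcribe` are hypothetical names and are not part of the skill.

```python
import subprocess
from typing import List, Optional


def build_command(
    audio_path: str,
    *,
    task: Optional[str] = None,   # e.g. "translate"
    fmt: Optional[str] = None,    # e.g. "raw" or "zh"
    as_json: bool = False,
) -> List[str]:
    """Build the argv list for scripts/transcribe.py using only documented flags."""
    cmd = ["python3", "scripts/transcribe.py", audio_path]
    if task:
        cmd += ["--task", task]
    if fmt:
        cmd += ["--format", fmt]
    if as_json:
        cmd.append("--json")
    return cmd


def run_transcribe(audio_path: str, **options) -> str:
    """Run the helper and return whatever it printed to stdout."""
    result = subprocess.run(
        build_command(audio_path, **options),
        check=True,            # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.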

Workflow

  1. Confirm the input is a local audio file.
  2. Run scripts/transcribe.py on it.
  3. If the transcript looks imperfect, tell the user it came from a public Whisper endpoint and may need cleanup.
  4. If helpful, post-process into:
    • cleaned transcript
    • summary
    • action items
    • bilingual output
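
Step 1 of the workflow can be sketched as a small pre-flight check. The suffix set below contains only the extensions named in the description; the "etc." there is left unexpanded, and `is_local_audio` is a hypothetical helper name.

```python
from pathlib import Path

# Only the extensions the description lists explicitly; extend as needed.
AUDIO_SUFFIXES = {".ogg", ".mp3", ".wav", ".m4a"}


def is_local_audio(path: str) -> bool:
    """Return True if path points at an existing file with a known audio suffix."""
    p = Path(path).expanduser()
    return p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
```

A suffix check is only a heuristic. It does not confirm that the file actually decodes as audio.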

What the script does

The script:

  • uploads the local file to a public Gradio-backed Hugging Face Space
  • submits a Whisper transcription job
  • waits for completion via the Gradio event stream
  • prints the resulting text
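
The "wait for completion" step can be pictured as reading a server-sent event stream until a terminal event arrives. The sketch below assumes the stream uses `event:` / `data:` line pairs with `complete` and `error` as terminal event names, which matches how recent Gradio versions describe their HTTP call API. The real script may speak a different protocol version, and `read_final_event` is a hypothetical name.

```python
import json
from typing import Iterable, Optional


def read_final_event(lines: Iterable[str]) -> Optional[list]:
    """Scan SSE lines and return the JSON payload of the first 'complete' event.

    Returns None if the stream ends without a terminal event.
    """
    event = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if event == "complete":
                return json.loads(payload)
            if event == "error":
                raise RuntimeError(f"Space reported an error: {payload}")
            # Any other event (assumed: heartbeats, progress updates) is skipped.
    return None
```

Because the function takes any iterable of strings, it can be fed from a streaming HTTP response or from a list of canned lines in a test.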

Default endpoint:

  • https://hf-audio-whisper-large-v3-turbo.hf.space

Override it with:

python3 scripts/transcribe.py input.ogg --space https://your-space.hf.space

or set:

export HF_WHISPER_SPACE=https://your-space.hf.space
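
One plausible way to combine the two overrides with the default is shown below. The precedence (flag first, then environment variable, then default) is an assumption for this sketch, since the order is not stated here, and `resolve_space` is a hypothetical name.

```python
import os
from typing import Mapping, Optional

DEFAULT_SPACE = "https://hf-audio-whisper-large-v3-turbo.hf.space"


def resolve_space(
    cli_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the Space URL: --space value, then HF_WHISPER_SPACE, then the default."""
    env = os.environ if env is None else env
    chosen = cli_value or env.get("HF_WHISPER_SPACE") or DEFAULT_SPACE
    # Drop any trailing slash so later path joins do not produce "//".
    return chosen.rstrip("/")
```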

Guardrails

  • Treat this as a best-effort, free public path, not a privacy-preserving one.
  • Do not use it for highly sensitive audio unless the user explicitly accepts processing by a public third party.
  • Expect rate limits, queueing, and occasional outages.
  • If the public endpoint fails, explain that the free backend is unavailable and offer alternatives.
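
Since rate limits and outages are expected, a caller may want a few spaced-out retries before reporting that the free backend is unavailable. This is a generic sketch and not something the skill is known to ship; `try_with_backoff` and its default values are assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def try_with_backoff(
    job: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call job() up to `attempts` times, doubling the wait after each failure.

    Returns None when every attempt fails, so the caller can explain that the
    public endpoint is unavailable and offer alternatives.
    """
    for attempt in range(attempts):
        try:
            return job()
        except Exception:
            # Broad on purpose for this sketch; narrow it to network errors in real use.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None
```

Keeping the attempt count small avoids adding load to a shared public Space.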

Output handling

Prefer to return:

  • the raw transcript when the user asked for plain speech-to-text or dictation ("转文字/听写")
  • a cleaned version when punctuation is poor
  • a short note about uncertainty if names, numbers, or jargon may be wrong

Script

  • scripts/transcribe.py — public Whisper transcription helper

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
