whisper-gpu-transcribe

Convert audio to SRT subtitles using OpenAI Whisper with automatic GPU acceleration for Intel XPU / NVIDIA CUDA / AMD ROCm / Apple Metal. Ideal for content creators as a free alternative to paid subtitle generation.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

To install, copy the following and send it to your AI assistant:

Install skill "whisper-gpu-transcribe" with this command: npx skills add allanmeng/whisper-gpu-transcriber-skill

🎙️ Whisper GPU Audio Transcriber

Convert audio files to SRT subtitles using local Whisper models: completely free, GPU accelerated, and offline once the model file has been downloaded.


Use Cases

  • Content creation: a free alternative to paid subtitle features (e.g., CapCut/剪映)
  • Turning meeting recordings into text
  • Subtitles for podcasts and courses

Supported GPU Acceleration

| Device | Acceleration | FP16 |
| --- | --- | --- |
| Intel Arc Series | XPU | ❌ Auto disabled |
| NVIDIA GPUs | CUDA | ✅ Auto enabled |
| AMD GPUs | ROCm | ✅ Auto enabled |
| Apple M Series | Metal | ✅ Auto enabled |
| No GPU | CPU | ❌ Auto disabled |
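
The device-to-FP16 mapping in the table can be sketched as follows. This is a minimal illustration, not the code in scripts/transcribe.py: the function names `choose_backend` and `detect_backend` are hypothetical, and the priority order (CUDA/ROCm, then XPU, then Metal, then CPU) is an assumption. ROCm builds of PyTorch report their GPUs through `torch.cuda`, and PyTorch's Metal backend is exposed as the `mps` device.

```python
from typing import Tuple


def choose_backend(has_cuda: bool, has_xpu: bool, has_mps: bool) -> Tuple[str, bool]:
    """Return (device, use_fp16) following the table above.

    The priority order is an assumption made for this sketch.
    """
    if has_cuda:
        # Covers NVIDIA CUDA and AMD ROCm (ROCm builds report via torch.cuda).
        return "cuda", True
    if has_xpu:
        # Intel Arc: FP16 is disabled, as in the table.
        return "xpu", False
    if has_mps:
        # Apple M series (Metal backend).
        return "mps", True
    return "cpu", False


def detect_backend() -> Tuple[str, bool]:
    """Probe PyTorch for accelerators; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return choose_backend(False, False, False)
    has_cuda = torch.cuda.is_available()
    # torch.xpu only exists in recent PyTorch builds, so guard the lookup.
    has_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    mps = getattr(torch.backends, "mps", None)
    has_mps = mps is not None and mps.is_available()
    return choose_backend(has_cuda, has_xpu, has_mps)
```

Keeping the decision in a pure function makes the table easy to check without any GPU present.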

Usage

Basic Usage

Place the audio file in your current working directory and tell the AI:

Convert xxx.mp3 to SRT subtitles

Or specify the full path directly:

Convert /path/to/audio.mp3 to SRT subtitles

Advanced Usage

Convert xxx.mp3 to English subtitles using the large-v3-turbo model

Convert xxx.mp3 to subtitles; the language is Japanese
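
Requests like these presumably end up as a model name and a language code passed to openai-whisper. The sketch below shows one way that could look; `build_transcribe_kwargs` and `transcribe` are hypothetical helpers, not the script's actual interface. `whisper.load_model` and `model.transcribe` are real openai-whisper calls, and leaving `language` out lets Whisper detect the language itself.

```python
from typing import Any, Dict, Optional


def build_transcribe_kwargs(language: Optional[str] = None, fp16: bool = False) -> Dict[str, Any]:
    """Collect keyword arguments for model.transcribe().

    The language is only passed when the user named one (e.g. "ja", "en");
    otherwise Whisper auto-detects it.
    """
    kwargs: Dict[str, Any] = {"fp16": fp16}
    if language:
        kwargs["language"] = language
    return kwargs


def transcribe(audio_path: str, model_name: str = "turbo", language: Optional[str] = None,
               device: str = "cpu", fp16: bool = False) -> Dict[str, Any]:
    """Load the requested model and transcribe one file."""
    import whisper  # imported lazily so the helper above works without it installed

    model = whisper.load_model(model_name, device=device)
    return model.transcribe(audio_path, **build_transcribe_kwargs(language, fp16))
```

For the Japanese example, this would be called as `transcribe("xxx.mp3", language="ja")`.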

Execution

The AI runs scripts/transcribe.py, which:

  1. Detects the available GPU and selects the matching acceleration backend
  2. Loads the Whisper model (default: turbo)
  3. Transcribes the audio to SRT format
  4. Saves the .srt file in the same directory as the audio file
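
Steps 3 and 4 can be sketched with the standard library alone. This is an illustration under assumptions, not the script's own code: the helper names are hypothetical, and the segments are assumed to be dictionaries with `start`, `end` and `text` keys, which is the shape of the `segments` list in a Whisper transcription result.

```python
from pathlib import Path
from typing import Any, Dict, Iterable


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Dict[str, Any]]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


def srt_path_for(audio_path: str) -> Path:
    """Place the subtitle file next to the audio, swapping the extension."""
    return Path(audio_path).with_suffix(".srt")
```

Writing the result is then a single call such as `srt_path_for(audio).write_text(segments_to_srt(result["segments"]), encoding="utf-8")`.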

Requirements

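  • Python 3.8+
  • PyTorch (the build must match your hardware)
    • Intel GPU: pip install torch==2.10.0+xpu --index-url https://download.pytorch.org/whl/xpu
    • NVIDIA GPU: pip install torch --index-url https://download.pytorch.org/whl/cu121
    • CPU: pip install torch
  • openai-whisper: installed automatically by ClawHub via pip install openai-whisper
  • ffmpeg available on your PATH (openai-whisper uses it to decode audio files)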
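
A quick pre-flight check of these requirements might look like the sketch below. `check_environment` is a hypothetical helper, not part of the skill. It only tests whether the packages can be found, not whether the PyTorch build matches your GPU. The ffmpeg entry is included because openai-whisper calls the ffmpeg command-line tool to decode audio.

```python
import importlib.util
import shutil
import sys
from typing import Dict, Tuple


def check_environment(min_python: Tuple[int, int] = (3, 8)) -> Dict[str, bool]:
    """Report which prerequisites are present, without importing heavy packages."""
    return {
        "python": sys.version_info >= min_python,
        "torch": importlib.util.find_spec("torch") is not None,
        "whisper": importlib.util.find_spec("whisper") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```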

Notes

  • The first run downloads the model file automatically (turbo is about 1.5 GB)
  • Models are cached in ~/.cache/whisper by default; to keep them on another disk, replace that directory with a symlink (or a directory junction on Windows)
  • Intel XPU requires an Intel Arc GPU and a matching PyTorch XPU build

Tip for users in China: if the model download fails, download the model file manually from a mirror site and place it in ~/.cache/whisper/
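
As an alternative to a symlink or junction, the cache location can also be chosen in code. The sketch below is an assumption about how one might do this and is not taken from scripts/transcribe.py. openai-whisper's `load_model` accepts a `download_root` argument, and when none is given it derives the default from `XDG_CACHE_HOME`, falling back to `~/.cache`. The helper names here are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def whisper_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Mirror Whisper's default cache location: $XDG_CACHE_HOME/whisper or ~/.cache/whisper."""
    env = os.environ if env is None else env
    default_cache = Path.home() / ".cache"
    return Path(env.get("XDG_CACHE_HOME", default_cache)) / "whisper"


def load_from(download_root: str, model_name: str = "turbo"):
    """Load a model from (or download it into) a directory of your choice."""
    import whisper  # imported lazily so whisper_cache_dir works without it installed

    return whisper.load_model(model_name, download_root=download_root)
```

A manually downloaded model file would go into the directory returned by `whisper_cache_dir()`, or into whatever directory is passed as `download_root`.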


Supported Models

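| Model | Parameters | Speed | Accuracy |
| --- | --- | --- | --- |
| tiny | 39M | Fastest | Low |
| base | 74M | Fast | Medium |
| small | 244M | Medium | Medium |
| medium | 769M | Slow | High |
| turbo | 809M | Medium | High ✅ Recommended |
| large-v3 | 1550M | Slowest | Highest |
| large-v3-turbo | 809M | Medium | High |

In openai-whisper, turbo is an alias for large-v3-turbo, so both names load the same 809M-parameter model.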


Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
