qwen-asr

Speech-to-text using the Qwen3-ASR-0.6B-4bit MLX model via a local FastAPI service. Transcribes audio files and URLs. Optimized for Apple Silicon. Use when the user sends voice messages or other audio that needs transcription.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "qwen-asr" with this command: npx skills add stvlynn/skills/stvlynn-skills-qwen-asr

Qwen ASR Skill

Speech-to-text using the Qwen3-ASR model, running locally on Apple Silicon via a FastAPI service.

Overview

  • Model: mlx-community/Qwen3-ASR-0.6B-4bit (4-bit quantized, ~400MB)
  • Runtime: MLX (Apple Silicon GPU acceleration via Metal)
  • Service: FastAPI on http://localhost:8100
  • Languages: Chinese, English, Japanese, Korean, and more

First-time Deployment

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.10+
  • Docker is not required (runs natively)

1. Create virtual environment

cd /path/to/skills/skills/qwen-asr
python3 -m venv venv
source venv/bin/activate

2. Install dependencies

The model is downloaded from a ModelScope/HuggingFace mirror (faster within mainland China):

pip install -r service/requirements.txt

3. Start the service

bash service/start.sh

On first start, the model (about 400MB) is downloaded automatically from hf-mirror.com; later starts use the local cache.

4. Verify

# Check service health
curl http://localhost:8100/health

# Show model information
curl http://localhost:8100/info

# Test transcription (using an online audio file)
curl -X POST "http://localhost:8100/transcribe_url?audio_url=https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"
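
Because the model takes some time to load on first start, it can be useful to poll the service before sending audio. The following is a minimal sketch using only the Python standard library. It assumes that `/health` answers with HTTP 200 once the service is usable and with an error status or a refused connection before that; this behavior is an assumption, not something confirmed by the service code. The names `probe_health` and `wait_until_ready` are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8100/health"  # endpoint from the Verify step


def probe_health(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP errors such as 503.
        return False


def wait_until_ready(
    probe: Callable[[], bool] = probe_health,
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

With the defaults, `wait_until_ready()` polls roughly once per second for up to 30 attempts. The probe is injectable so the retry logic can be exercised without a running service.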

5. (Optional) Run as a system service

On macOS, launchd can start the service automatically at login:

# Create the plist (adjust the paths for your setup)
cat > ~/Library/LaunchAgents/com.qwen.asr.plist << 'PLIST'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.qwen.asr</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>-c</string>
        <string>cd /path/to/skills/skills/qwen-asr && bash service/start.sh</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/qwen-asr.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/qwen-asr.err</string>
</dict>
</plist>
PLIST

launchctl load ~/Library/LaunchAgents/com.qwen.asr.plist
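
The same property list can also be generated programmatically, which avoids hand-editing XML when the install path changes. This is a sketch using Python's standard `plistlib` module. The helper names `build_launch_agent` and `write_launch_agent` are hypothetical; the keys and values mirror the plist above, and the skill directory is shell-quoted in case it contains spaces.

```python
import plistlib
import shlex
from pathlib import Path


def build_launch_agent(skill_dir: str, label: str = "com.qwen.asr") -> bytes:
    """Return launchd plist XML that runs service/start.sh from `skill_dir`."""
    job = {
        "Label": label,
        "ProgramArguments": [
            "/bin/bash",
            "-c",
            f"cd {shlex.quote(skill_dir)} && bash service/start.sh",
        ],
        "RunAtLoad": True,
        "KeepAlive": True,
        "StandardOutPath": "/tmp/qwen-asr.log",
        "StandardErrorPath": "/tmp/qwen-asr.err",
    }
    return plistlib.dumps(job)  # XML is the default output format


def write_launch_agent(skill_dir: str, target: Path) -> None:
    """Write the plist to `target`, e.g. ~/Library/LaunchAgents/com.qwen.asr.plist."""
    target.write_bytes(build_launch_agent(skill_dir))
```

After writing the file, it still has to be loaded with the `launchctl load` command shown above.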

Troubleshooting

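| Problem | Solution |
| --- | --- |
| Port 8100 already in use | Run lsof -i :8100 to find the process using the port, or change the port number in start.sh |
| Slow model download | The script already sets HF_ENDPOINT=https://hf-mirror.com automatically |
| 503 Model not loaded | The model is still loading; the first load takes about 10-30 seconds |
| ModuleNotFoundError: mlx | Confirm you are running on an Apple Silicon Mac |
| Empty transcription result | Check the audio format (wav/mp3/ogg/flac are supported); the audio may also be too short |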
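
For the port-conflict case, a quick check from Python can complement `lsof`. This is a minimal sketch that assumes the service listens on the IPv4 loopback address; `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Calling `port_in_use(8100)` before starting the service shows whether another process already holds the port. It does not identify that process, so `lsof` is still needed for that.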

Service Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Health check |
| /info | GET | Model information |
| /transcribe | POST | Transcribe uploaded audio file |
| /transcribe_url | POST | Transcribe audio from URL |
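
When calling /transcribe_url, the audio URL is itself passed as a query parameter, so percent-encoding it is the safer choice, especially if it contains `&` or `?`. The sketch below builds the request URL with the standard library. `build_transcribe_url` is a hypothetical helper; the `audio_url` and `language` parameter names are taken from the usage examples in this document.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:8100"


def build_transcribe_url(
    audio_url: str,
    language: Optional[str] = None,
    base_url: str = BASE_URL,
) -> str:
    """Return the /transcribe_url request URL with encoded query parameters."""
    params = {"audio_url": audio_url}
    if language is not None:
        params["language"] = language
    # urlencode percent-encodes reserved characters inside each value.
    return f"{base_url}/transcribe_url?{urlencode(params)}"
```

The resulting string can be passed directly to `curl -X POST` or to any HTTP client.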

Usage

CLI Client

# Transcribe local file
python3 scripts/asr.py audio.wav

# Transcribe from URL
python3 scripts/asr.py --url "https://example.com/audio.wav"

# Specify language
python3 scripts/asr.py audio.wav --lang zh

# Check service status
python3 scripts/asr.py --check

curl

# Upload file
curl -X POST "http://localhost:8100/transcribe" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav"

# From URL with language
curl -X POST "http://localhost:8100/transcribe_url?audio_url=<URL>&language=zh"

Python

import requests

# From URL
response = requests.post(
    "http://localhost:8100/transcribe_url",
    params={"audio_url": "https://example.com/audio.wav", "language": "zh"}
)
text = response.json()["text"]

# From file
with open("audio.wav", "rb") as f:
    response = requests.post("http://localhost:8100/transcribe", files={"file": f})
text = response.json()["text"]

Output Format

{
  "text": "transcribed text here",
  "chunks": [],
  "processing_time": 0.123
}
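
For downstream code, it can help to parse this response into a typed object. This is a minimal sketch: the `Transcription` class is hypothetical, `text` is treated as required, and missing `chunks` or `processing_time` fall back to defaults, which is an assumption. The structure of items inside `chunks` is not documented here, so they are kept as opaque values.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Transcription:
    text: str
    chunks: List[Any] = field(default_factory=list)
    processing_time: float = 0.0

    @classmethod
    def from_json(cls, payload: str) -> "Transcription":
        """Parse a JSON response body shaped like the example above."""
        data = json.loads(payload)
        return cls(
            text=data["text"],
            chunks=list(data.get("chunks", [])),
            processing_time=float(data.get("processing_time", 0.0)),
        )


sample = '{"text": "transcribed text here", "chunks": [], "processing_time": 0.123}'
result = Transcription.from_json(sample)
print(result.text)  # prints: transcribed text here
```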

Service Management

# Start
bash service/start.sh

# Check status
python3 scripts/asr.py --check

# Stop (find and kill process)
lsof -ti :8100 | xargs kill
