Qwen ASR Skill

Speech-to-text using Qwen3-ASR model, running locally on Apple Silicon via a FastAPI service.

Overview

Model: mlx-community/Qwen3-ASR-0.6B-4bit (4-bit quantized, ~400MB)
Runtime: MLX (Apple Silicon GPU acceleration via MPS)
Service: FastAPI on http://localhost:8100
Languages: Chinese, English, Japanese, Korean, and more

First-time Deployment

Prerequisites

macOS with Apple Silicon (M1/M2/M3/M4)
Python 3.10+
Docker is not required (runs natively)

1. Create virtual environment

cd /path/to/skills/skills/qwen-asr
python3 -m venv venv
source venv/bin/activate

2. Install dependencies

模型从 ModelScope/HuggingFace 镜像下载（国内更快）：

pip install -r service/requirements.txt

3. Start the service

bash service/start.sh

首次启动时会自动从 hf-mirror.com 下载模型（约 400MB），后续启动使用本地缓存。

4. Verify

# 检查服务健康状态
curl http://localhost:8100/health

# 查看模型信息
curl http://localhost:8100/info

# 测试转录（使用在线音频）
curl -X POST "http://localhost:8100/transcribe_url?audio_url=https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"

5. (Optional) 设为系统服务

macOS 上可以使用 launchd 设置开机自启：

# 创建 plist（自行修改路径）
cat > ~/Library/LaunchAgents/com.qwen.asr.plist << 'PLIST'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.qwen.asr</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>-c</string>
        <string>cd /path/to/skills/skills/qwen-asr && bash service/start.sh</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/qwen-asr.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/qwen-asr.err</string>
</dict>
</plist>
PLIST

launchctl load ~/Library/LaunchAgents/com.qwen.asr.plist

Troubleshooting

问题	解决方法
端口 8100 被占用	`lsof -i :8100` 找到占用进程，或修改 `start.sh` 中端口号
模型下载缓慢	脚本已自动设置 `HF_ENDPOINT=https://hf-mirror.com`
503 Model not loaded	模型仍在加载中，首次约需 10-30 秒
`ModuleNotFoundError: mlx`	确认使用 Apple Silicon Mac
转录结果为空	检查音频格式（支持 wav/mp3/ogg/flac），或音频可能太短

Service Endpoints

Endpoint	Method	Description
`/health`	GET	Health check
`/info`	GET	Model information
`/transcribe`	POST	Transcribe uploaded audio file
`/transcribe_url`	POST	Transcribe audio from URL

Usage

CLI Client

# Transcribe local file
python3 scripts/asr.py audio.wav

# Transcribe from URL
python3 scripts/asr.py --url "https://example.com/audio.wav"

# Specify language
python3 scripts/asr.py audio.wav --lang zh

# Check service status
python3 scripts/asr.py --check

curl

# Upload file
curl -X POST "http://localhost:8100/transcribe" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav"

# From URL with language
curl -X POST "http://localhost:8100/transcribe_url?audio_url=<URL>&language=zh"

Python

import requests

# From URL
response = requests.post(
    "http://localhost:8100/transcribe_url",
    params={"audio_url": "https://example.com/audio.wav", "language": "zh"}
)
text = response.json()["text"]

# From file
with open("audio.wav", "rb") as f:
    response = requests.post("http://localhost:8100/transcribe", files={"file": f})
text = response.json()["text"]

Output Format

{
  "text": "transcribed text here",
  "chunks": [],
  "processing_time": 0.123
}

Service Management

# Start
bash service/start.sh

# Check status
python3 scripts/asr.py --check

# Stop (find and kill process)
lsof -ti :8100 | xargs kill

qwen-asr

Safety Notice

Copy this and send it to your AI assistant to learn