qwen-voice

Use Qwen (DashScope/百炼) for two speech tasks: (1) ASR: transcribe user audio or voice messages (Telegram .ogg Opus, WAV, MP3) to text with qwen3-asr-flash, optionally with coarse timestamps produced by chunking; (2) TTS: synthesize a voice reply with qwen3-tts-flash using a selectable voice (default: Cherry), output as an .ogg voice note for Telegram.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "qwen-voice" with this command: npx skills add ada20204/qwen-voice/ada20204-qwen-voice-qwen-voice

Qwen Voice (ASR + TTS)

Use the bundled scripts. Configure DASHSCOPE_API_KEY in one of:

  • ~/.config/qwen-voice/.env (recommended)
  • <repo>/.qwen-voice/.env (dev/testing)
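
As a rough illustration of how the key lookup could work, the sketch below checks the two .env locations listed above in order. The helper names (parse_env_file, candidate_env_files, find_api_key), the rule that an already-set environment variable wins, and the simple KEY=VALUE parsing are all assumptions for illustration; the bundled scripts may resolve the key differently.

```python
import os
from pathlib import Path


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def candidate_env_files(repo_root="."):
    """Return the two documented .env locations, recommended one first."""
    return [
        Path.home() / ".config" / "qwen-voice" / ".env",
        Path(repo_root) / ".qwen-voice" / ".env",
    ]


def find_api_key(paths, environ=None):
    """Return DASHSCOPE_API_KEY from the environment or the first file that has it.

    Assumption: a variable already set in the environment takes precedence.
    """
    environ = os.environ if environ is None else environ
    if environ.get("DASHSCOPE_API_KEY"):
        return environ["DASHSCOPE_API_KEY"]
    for path in paths:
        if Path(path).is_file():
            key = parse_env_file(path).get("DASHSCOPE_API_KEY")
            if key:
                return key
    return None
```

A call such as find_api_key(candidate_env_files("/path/to/repo")) returns None when no key is configured, which a caller can turn into a clear error message before any request is made.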

ASR (speech → text)

Without timestamps (default)

python3 skills/qwen-voice/scripts/qwen_asr.py --in /path/to/audio.ogg

With timestamps (chunk-based)

python3 skills/qwen-voice/scripts/qwen_asr.py --in /path/to/audio.ogg --timestamps --chunk-sec 3

Notes:

  • Timestamps are generated by fixed-length chunking (not word-level alignment).
  • Input audio is converted to mono 16kHz WAV before sending.
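
To make the two notes above concrete, here is a minimal sketch of fixed-length chunking and of a mono 16 kHz conversion command. It assumes ffmpeg is the conversion tool, which the source does not state, and the function names are hypothetical. Each chunk's transcript would simply be labeled with its span, which is why the timestamps are only as fine as the --chunk-sec value.

```python
def ffmpeg_to_wav_cmd(src, dst):
    """Build an ffmpeg command that writes mono (-ac 1), 16 kHz (-ar 16000) audio.

    Assumption: ffmpeg is installed; the bundled script may convert differently.
    """
    return ["ffmpeg", "-y", "-i", str(src), "-ac", "1", "-ar", "16000", str(dst)]


def chunk_spans(duration_sec, chunk_sec=3.0):
    """Split [0, duration_sec) into consecutive (start, end) spans of chunk_sec."""
    if chunk_sec <= 0:
        raise ValueError("chunk_sec must be positive")
    spans = []
    start = 0.0
    while start < duration_sec:
        end = min(start + chunk_sec, duration_sec)  # last chunk may be shorter
        spans.append((start, end))
        start = end
    return spans


def format_ts(seconds):
    """Format seconds as MM:SS.mmm for display next to a chunk's transcript."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


if __name__ == "__main__":
    # A 7-second clip with 3-second chunks yields three spans.
    for start, end in chunk_spans(7, 3):
        print(f"[{format_ts(start)} - {format_ts(end)}]")
```

A smaller chunk length gives finer timestamps at the cost of more requests and less context per chunk, so words cut at a chunk boundary may be transcribed less reliably.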

TTS (text → speech)

Preset voice (default: Cherry)

python3 skills/qwen-voice/scripts/qwen_tts.py --text '你好,我是 Pi。' --voice Cherry --out /tmp/out.ogg

Clone voice (create once, reuse)

  1. Create a voice profile from a sample audio:
python3 skills/qwen-voice/scripts/qwen_voice_clone.py --in ./voice_sample.ogg --name george --out work/qwen-voice/george.voice.json
  2. Use the cloned voice to synthesize:
python3 skills/qwen-voice/scripts/qwen_tts.py --text '你好,我是 George。' --voice-profile work/qwen-voice/george.voice.json --out /tmp/out.ogg

Notes:

  • .ogg output is Opus, suitable for Telegram voice messages.
  • Voice cloning uses the DashScope customization endpoint together with a Qwen realtime TTS model.
  • Scripts use a local venv at work/venv-dashscope (auto-created on first run).
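
For readers who need to produce a Telegram-ready file from other audio themselves, the sketch below builds an ffmpeg command that encodes Opus in an Ogg container. It assumes an ffmpeg build with libopus; the function name and the 32k default bitrate are illustrative choices, not values taken from the bundled scripts.

```python
def ffmpeg_to_opus_cmd(src, dst, bitrate="32k"):
    """Build an ffmpeg command that encodes mono Opus audio into an .ogg file.

    Assumptions: ffmpeg with libopus is available; 32k is an illustrative bitrate.
    """
    return [
        "ffmpeg", "-y",
        "-i", str(src),
        "-c:a", "libopus",  # Opus codec, the format noted above for voice messages
        "-b:a", bitrate,
        "-ac", "1",         # mono
        str(dst),
    ]


if __name__ == "__main__":
    print(" ".join(ffmpeg_to_opus_cmd("/tmp/out.wav", "/tmp/out.ogg")))
```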

Typical chat workflow

  • When the user sends a voice message or audio file: run ASR and reply with the transcribed text.
  • When the user explicitly asks for a voice reply: run TTS and send the generated .ogg as a voice note.
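
The workflow above could be wired into a bot roughly as follows. This sketch only builds the command lines, using the script paths and flags shown earlier in this document; the message dictionary keys (audio_path, text, wants_voice_reply) and the function names are hypothetical, and how each script reports its result (for example, whether the transcript is printed to stdout) is not confirmed by the source.

```python
ASR_SCRIPT = "skills/qwen-voice/scripts/qwen_asr.py"
TTS_SCRIPT = "skills/qwen-voice/scripts/qwen_tts.py"


def asr_command(audio_path, timestamps=False, chunk_sec=3):
    """Command line for transcription, optionally with chunk-based timestamps."""
    cmd = ["python3", ASR_SCRIPT, "--in", str(audio_path)]
    if timestamps:
        cmd += ["--timestamps", "--chunk-sec", str(chunk_sec)]
    return cmd


def tts_command(text, out_path, voice="Cherry", voice_profile=None):
    """Command line for synthesis with a preset voice or a cloned voice profile."""
    cmd = ["python3", TTS_SCRIPT, "--text", text]
    if voice_profile:
        cmd += ["--voice-profile", str(voice_profile)]
    else:
        cmd += ["--voice", voice]
    return cmd + ["--out", str(out_path)]


def plan_command(message, reply_out="/tmp/out.ogg"):
    """Pick the command for one incoming message (hypothetical dict shape).

    Audio in -> ASR; explicit voice-reply request -> TTS; otherwise nothing.
    """
    if message.get("audio_path"):
        return asr_command(message["audio_path"])
    if message.get("wants_voice_reply") and message.get("text"):
        return tts_command(message["text"], reply_out)
    return None


if __name__ == "__main__":
    print(plan_command({"audio_path": "/path/to/audio.ogg"}))
    print(plan_command({"text": "hello", "wants_voice_reply": True}))
```

The returned list can be passed to subprocess.run(cmd, check=True); passing arguments as a list rather than a shell string avoids quoting problems with user-supplied text.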

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
