qwen-tts

Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.

Install skill "qwen-tts" with this command: npx skills add paki81/qwen-tts/paki81-qwen-tts-qwen-tts

Qwen TTS

Local text-to-speech using the Qwen3-TTS-12Hz-1.7B-CustomVoice model, hosted on Hugging Face.

Quick Start

Generate speech from text:

scripts/tts.py "Ciao, come va?" -l Italian -o output.wav

With voice instruction (emotion/style):

scripts/tts.py "Sono felice!" -i "Parla con entusiasmo" -l Italian -o happy.wav

Different speaker:

scripts/tts.py "Hello world" -s Ryan -l English -o hello.wav

Installation

First-time setup (one-time):

cd skills/public/qwen-tts
bash scripts/setup.sh

This creates a local virtual environment and installs the qwen-tts package (~500MB).

Note: The first synthesis automatically downloads the ~1.7GB model from Hugging Face.

Usage

scripts/tts.py [options] "Text to speak"

Options

  • -o, --output PATH - Output file path (default: qwen_output.wav)
  • -s, --speaker NAME - Speaker voice (default: Vivian)
  • -l, --language LANG - Language (default: Auto)
  • -i, --instruct TEXT - Voice instruction (emotion, style, tone)
  • --list-speakers - Show available speakers
  • --model NAME - Model name (default: CustomVoice 1.7B)
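
When calling the script from other Python code, the options above can be assembled into an argument list. The following is a minimal sketch, not part of the skill itself: the helper name build_tts_command is hypothetical, and it assumes only the flags and defaults listed above.

```python
from typing import List, Optional


def build_tts_command(
    text: str,
    output: str = "qwen_output.wav",
    speaker: str = "Vivian",
    language: str = "Auto",
    instruct: Optional[str] = None,
    script: str = "scripts/tts.py",
) -> List[str]:
    """Assemble an argv list for scripts/tts.py (hypothetical helper).

    Only the documented flags -o, -s, -l and -i are used.
    """
    cmd = [script, "-o", output, "-s", speaker, "-l", language]
    if instruct:
        # The voice instruction is optional, so add it only when given.
        cmd += ["-i", instruct]
    # The usage line places the text after the options.
    cmd.append(text)
    return cmd


if __name__ == "__main__":
    print(build_tts_command("Ciao, come va?", output="output.wav", language="Italian"))
```

Passing the result to subprocess.run as a list avoids shell quoting problems with text that contains apostrophes or quotes.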

Examples

Basic Italian speech:

scripts/tts.py "Benvenuto nel futuro del text-to-speech" -l Italian -o welcome.wav

With emotion/instruction:

scripts/tts.py "Sono molto felice di vederti!" -i "Parla con entusiasmo e gioia" -l Italian -o happy.wav

Different speaker:

scripts/tts.py "Hello, nice to meet you" -s Ryan -l English -o ryan.wav

List available speakers:

scripts/tts.py --list-speakers

Available Speakers

The CustomVoice model includes 9 premium voices:

Speaker     Language            Description
Vivian      Chinese             Bright, slightly edgy young female
Serena      Chinese             Warm, gentle young female
Uncle_Fu    Chinese             Seasoned male, low mellow timbre
Dylan       Chinese (Beijing)   Youthful Beijing male, clear
Eric        Chinese (Sichuan)   Lively Chengdu male, husky
Ryan        English             Dynamic male, rhythmic
Aiden       English             Sunny American male
Ono_Anna    Japanese            Playful female, light nimble
Sohee       Korean              Warm female, rich emotion

Recommendation: Use each speaker's native language for best quality, though all speakers support all 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian).
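
The recommendation above can be expressed as a small lookup. This is an illustrative sketch only: the names SPEAKER_LANGUAGE and native_speakers are hypothetical, the dialect voices (Dylan, Eric) are grouped under "Chinese" as a simplification, and the fallback to Vivian simply mirrors the documented default speaker.

```python
from typing import Dict, List

# Native language per speaker, taken from the speaker list above.
# Dylan (Beijing) and Eric (Sichuan) are grouped under "Chinese" here.
SPEAKER_LANGUAGE: Dict[str, str] = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese",
    "Eric": "Chinese",
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}


def native_speakers(language: str, default: str = "Vivian") -> List[str]:
    """Return speakers whose native language matches, else the default voice."""
    matches = [name for name, lang in SPEAKER_LANGUAGE.items() if lang == language]
    # Languages such as Italian have no native voice, so fall back to the default.
    return matches or [default]


if __name__ == "__main__":
    print(native_speakers("English"))
    print(native_speakers("Italian"))
```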

Voice Instructions

Use -i, --instruct to control emotion, tone, and style:

Italian examples:

  • "Parla con entusiasmo"
  • "Tono serio e professionale"
  • "Voce calma e rilassante"
  • "Leggi come un narratore"

English examples:

  • "Speak with excitement"
  • "Very happy and energetic"
  • "Calm and soothing voice"
  • "Read like a narrator"
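
To compare how different instructions affect the same sentence, one command per instruction can be generated. The sketch below is an assumption-laden illustration: instruction_variants is a hypothetical helper, and the output file naming scheme is invented for this example. It uses only the documented -i, -l and -o flags.

```python
import re
from typing import List


def instruction_variants(
    text: str,
    instructions: List[str],
    language: str = "Auto",
    script: str = "scripts/tts.py",
) -> List[List[str]]:
    """Build one scripts/tts.py command per voice instruction (hypothetical helper)."""
    commands = []
    for index, instruct in enumerate(instructions):
        # Derive a file name from the instruction; fall back to an index
        # when nothing ASCII-alphanumeric remains.
        slug = re.sub(r"[^a-z0-9]+", "_", instruct.lower()).strip("_")
        output = f"{slug or f'variant_{index}'}.wav"
        commands.append([script, "-i", instruct, "-l", language, "-o", output, text])
    return commands


if __name__ == "__main__":
    for cmd in instruction_variants(
        "Nice to meet you",
        ["Speak with excitement", "Calm and soothing voice"],
        language="English",
    ):
        print(cmd)
```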

Integration with OpenClaw

The script outputs the audio file path to stdout (last line), making it compatible with OpenClaw's TTS workflow:

# OpenClaw captures the output path
cd skills/public/qwen-tts
OUTPUT=$(scripts/tts.py "Ciao" -s Vivian -l Italian -o /tmp/audio.wav 2>/dev/null)
# OUTPUT = /tmp/audio.wav
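
The same capture can be done from Python. This is a minimal sketch under the assumption stated above, namely that the last non-empty stdout line is the audio path; the function names last_stdout_line and synthesize are hypothetical and not part of the skill or of OpenClaw.

```python
import subprocess
from typing import List


def last_stdout_line(stdout: str) -> str:
    """Return the last non-empty line of the script's stdout."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output path found on stdout")
    return lines[-1]


def synthesize(cmd: List[str], cwd: str = "skills/public/qwen-tts") -> str:
    """Run the TTS script and return the reported audio path.

    stderr is discarded, mirroring the 2>/dev/null in the shell example.
    """
    result = subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return last_stdout_line(result.stdout)


if __name__ == "__main__":
    # Parsing demo on sample text; the log line here is made up for illustration.
    print(last_stdout_line("loading...\n/tmp/audio.wav\n"))
```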

Performance

  • GPU (CUDA): ~1-3 seconds for short phrases
  • CPU: ~10-30 seconds for short phrases
  • Model size: ~1.7GB (auto-downloads on first run)
  • Venv size: ~500MB (installed dependencies)

Troubleshooting

Setup fails:

# Ensure Python 3.10-3.12 is available
python3.12 --version

# Re-run setup
cd skills/public/qwen-tts
rm -rf venv
bash scripts/setup.sh
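
The version requirement can also be checked programmatically before re-running setup. A minimal sketch, assuming only the 3.10-3.12 range stated above; is_supported is a hypothetical helper and is not part of setup.sh.

```python
import sys
from typing import Tuple

# Supported range as stated in the troubleshooting note above.
SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 12)


def is_supported(version: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) version falls within 3.10-3.12."""
    return SUPPORTED_MIN <= version <= SUPPORTED_MAX


if __name__ == "__main__":
    current = sys.version_info[:2]
    status = "supported" if is_supported(current) else "outside the 3.10-3.12 range"
    print(f"Python {current[0]}.{current[1]} is {status}")
```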

Model download slow/fails:

# Use mirror (China mainland)
export HF_ENDPOINT=https://hf-mirror.com
scripts/tts.py "Test" -o test.wav
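
When the script is launched from Python rather than a shell, the mirror can be set for the child process only. This sketch assumes nothing beyond the HF_ENDPOINT variable shown above; mirror_env is a hypothetical helper name.

```python
import os
from typing import Dict, Mapping, Optional


def mirror_env(
    endpoint: str = "https://hf-mirror.com",
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with HF_ENDPOINT set to the mirror."""
    # Copy so that neither os.environ nor the caller's mapping is modified.
    env = dict(os.environ if base is None else base)
    env["HF_ENDPOINT"] = endpoint
    return env


if __name__ == "__main__":
    env = mirror_env()
    print(env["HF_ENDPOINT"])
```

The returned dictionary can be passed as the env argument of subprocess.run, which leaves the parent process environment unchanged.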

Out of memory (GPU): The model automatically falls back to CPU if GPU memory is insufficient.

Audio quality issues:

  • Try a different speaker (list them with --list-speakers)
  • Add an instruction: -i "Speak clearly and slowly"
  • Check that the language matches the text: -l Italian for Italian text
