qwenspeak

Text-to-speech generation via Qwen3-TTS over SSH. Preset voices, voice cloning, voice design. Use when the user wants to generate speech audio, clone voices, or work with TTS.

Install skill "qwenspeak" with this command: npx skills add psyb0t/qwenspeak

YAML-driven text-to-speech over SSH using Qwen3-TTS models.

For installation and deployment, see references/setup.md.

SSH Wrapper

Use scripts/qwenspeak.sh for all commands. It handles host, port, and host key acceptance via QWENSPEAK_HOST and QWENSPEAK_PORT env vars.

scripts/qwenspeak.sh <command> [args]
scripts/qwenspeak.sh <command> < input_file
scripts/qwenspeak.sh <command> > output_file

TTS Generation

Submit YAML, get a job UUID back immediately, poll for progress. Jobs run sequentially — one at a time, the rest queue up.

# Get the YAML template
scripts/qwenspeak.sh "tts print-yaml" > job.yaml

# Submit job
scripts/qwenspeak.sh "tts" < job.yaml
# {"id": "550e8400-...", "status": "queued", "total_steps": 3, "total_generations": 7}

# Check progress
scripts/qwenspeak.sh "tts get-job 550e8400"

# Follow job log
scripts/qwenspeak.sh "tts get-job-log 550e8400 -f"

# Download result
scripts/qwenspeak.sh "get hello.wav" > hello.wav
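The submit-then-poll flow above can be scripted. The following is a minimal sketch, not part of the skill itself: the helper names (json_field, submit_job, wait_for_job) are hypothetical, and it assumes that "tts get-job" prints JSON containing a "status" field like the submit response does. Check the real output before relying on it.

```shell
# Wrapper path from this document; override QWENSPEAK to point elsewhere.
QWENSPEAK="${QWENSPEAK:-scripts/qwenspeak.sh}"

# Print the first string value of key $1 from flat JSON on stdin (no jq needed).
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Submit the YAML file $1 and print the job id from the JSON response.
submit_job() {
  "$QWENSPEAK" "tts" < "$1" | json_field id
}

# Poll job $1 every $2 seconds (default 5) until it reaches a terminal status.
# Returns 0 on completed, 1 on failed/cancelled, 2 if no status could be read.
wait_for_job() {
  local id="$1" interval="${2:-5}" status
  while :; do
    # ASSUMPTION: get-job output is JSON with a "status" field.
    status="$("$QWENSPEAK" "tts get-job $id" | json_field status)"
    case "$status" in
      completed) return 0 ;;
      failed|cancelled) echo "job $id: $status" >&2; return 1 ;;
      "") echo "job $id: no status in response" >&2; return 2 ;;
      *) sleep "$interval" ;;
    esac
  done
}
```

Typical use would be `id="$(submit_job job.yaml)" && wait_for_job "$id"`, followed by the "get" command for each output file.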

YAML Structure

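Global settings plus a list of steps. Each step loads a model, runs all its generations, then unloads it. Settings cascade from global to step to generation, and the most specific level wins: in the example below, the second generation's speaker: Vivian overrides the step's speaker: Ryan.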

steps:
  - mode: custom-voice
    model_size: 1.7b
    speaker: Ryan
    language: English
    generate:
      - text: "Hello world"
        output: hello.wav
      - text: "I cannot believe this!"
        speaker: Vivian
        instruct: "Speak angrily"
        output: angry.wav

  - mode: voice-design
    generate:
      - text: "Welcome to our store."
        instruct: "A warm, friendly young female voice with a cheerful tone"
        output: welcome.wav

  - mode: voice-clone
    model_size: 1.7b
    ref_audio: ref.wav
    ref_text: "Transcript of reference"
    generate:
      - text: "First line in cloned voice"
        output: clone1.wav
      - text: "Second line"
        output: clone2.wav
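To see the cascade in one file, the sketch below writes a small job with a value set at each level. The function name write_cascade_job is hypothetical, and placing temperature at the top level next to steps is an assumption about where global settings go. Compare against the template from "tts print-yaml" before using it.

```shell
# Write a job file to $1 that sets values at global, step, and generation level.
# ASSUMPTION: global settings sit at the top level of the YAML, beside "steps".
write_cascade_job() {
  cat > "$1" <<'YAML'
temperature: 0.7            # global: applies unless a step or generation overrides it
steps:
  - mode: custom-voice
    speaker: Ryan           # step level: default speaker for this step
    generate:
      - text: "Hello world"
        output: hello.wav   # inherits temperature 0.7 and speaker Ryan
      - text: "I cannot believe this!"
        speaker: Vivian     # generation level: overrides the step speaker
        temperature: 1.0    # generation level: overrides the global value
        output: angry.wav
YAML
}
```

Calling `write_cascade_job job.yaml` produces a file that can be submitted with the "tts" command.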

Modes

custom-voice — Pick from 9 preset speakers. 1.7B supports emotion/style via instruct.

voice-design — Describe the voice in natural language via instruct. 1.7B only.

voice-clone — Clone from reference audio. Set ref_audio and ref_text at step level to reuse across generations. x_vector_only: true skips transcript.

Emotion trick for cloned voices

Upload one reference recording per emotion, then use a separate voice-clone step for each so every generation picks up the matching reference:

scripts/qwenspeak.sh "create-dir refs"
scripts/qwenspeak.sh "put refs/happy.wav" < me_happy.wav
scripts/qwenspeak.sh "put refs/angry.wav" < me_angry.wav

steps:
  - mode: voice-clone
    ref_audio: refs/happy.wav
    ref_text: "transcript of happy ref"
    generate:
      - text: "Great news everyone!"
        output: happy1.wav

  - mode: voice-clone
    ref_audio: refs/angry.wav
    ref_text: "transcript of angry ref"
    generate:
      - text: "This is unacceptable"
        output: angry1.wav
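With more than a couple of reference clips, the uploads can be looped. This is a sketch under stated assumptions: upload_refs is a hypothetical helper, file names are assumed to contain no spaces (they are pasted into the remote command string), and a failing create-dir is ignored on the guess that the directory may already exist.

```shell
# Wrapper path from this document; override QWENSPEAK to point elsewhere.
QWENSPEAK="${QWENSPEAK:-scripts/qwenspeak.sh}"

# Upload every .wav file in local directory $1 to refs/ on the server.
upload_refs() {
  local dir="$1" f
  # ASSUMPTION: create-dir may fail when refs already exists, so errors are ignored.
  "$QWENSPEAK" "create-dir refs" || true
  for f in "$dir"/*.wav; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
    "$QWENSPEAK" "put refs/$(basename "$f")" < "$f"
  done
}
```

After `upload_refs ./my_refs`, each clip is available as refs/<name>.wav for use in a step's ref_audio.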

Job Management

scripts/qwenspeak.sh "tts list-jobs"              # list all
scripts/qwenspeak.sh "tts list-jobs --json"        # JSON output
scripts/qwenspeak.sh "tts get-job <id>"            # job details
scripts/qwenspeak.sh "tts get-job-log <id>"        # view log
scripts/qwenspeak.sh "tts get-job-log <id> -f"     # follow log
scripts/qwenspeak.sh "tts cancel-job <id>"         # cancel

Statuses: queued → running → completed | failed | cancelled

Completed jobs are auto-cleaned after 1 day, and all jobs after 1 week. UUID prefixes work in place of full job IDs (e.g. the first 8 characters).
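Two small helpers make the status list and the prefix rule easier to use in scripts. Both names (job_prefix, is_terminal_status) are hypothetical, and the 8-character length simply follows the example given above.

```shell
# Shorten a full job UUID to its first 8 characters.
job_prefix() {
  printf '%s\n' "$1" | cut -c1-8
}

# Succeed when status $1 is one a job cannot leave (completed, failed, cancelled).
is_terminal_status() {
  case "$1" in
    completed|failed|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `scripts/qwenspeak.sh "tts get-job $(job_prefix "$id")"` queries a job by its short form.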

File Operations

All paths are relative to the work directory. Path traversal is blocked.

| Command | Description |
| --- | --- |
| put <path> | Upload file from stdin |
| get <path> | Download file to stdout |
| list-files [--json] | List directory |
| remove-file <path> | Delete a file |
| create-dir <path> | Create directory |
| remove-dir <path> | Remove empty directory |
| move-file <src> <dst> | Move or rename |
| copy-file <src> <dst> | Copy a file |
| file-exists <path> | Check if file exists (true/false) |
| search-files <glob> | Glob search (** recursive) |
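Because "get" writes to stdout, redirecting a missing file would leave an empty or misleading local file. The sketch below checks first. fetch_if_exists is a hypothetical helper, and it assumes file-exists prints exactly true or false on stdout.

```shell
# Wrapper path from this document; override QWENSPEAK to point elsewhere.
QWENSPEAK="${QWENSPEAK:-scripts/qwenspeak.sh}"

# Download remote file $1 to local path $2, but only if it exists on the server.
fetch_if_exists() {
  local remote="$1" local_path="$2"
  # ASSUMPTION: file-exists prints the bare word "true" or "false".
  if [ "$("$QWENSPEAK" "file-exists $remote")" = "true" ]; then
    "$QWENSPEAK" "get $remote" > "$local_path"
  else
    echo "remote file not found: $remote" >&2
    return 1
  fi
}
```

Usage: `fetch_if_exists hello.wav ./hello.wav`. The local file is only created when the remote one is reported present.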

Speakers

| Speaker | Gender | Language | Description |
| --- | --- | --- | --- |
| Vivian | Female | Chinese | Bright, slightly edgy young voice |
| Serena | Female | Chinese | Warm, gentle young voice |
| Uncle_Fu | Male | Chinese | Seasoned, low mellow timbre |
| Dylan | Male | Chinese | Youthful Beijing dialect, clear natural timbre |
| Eric | Male | Chinese | Lively Chengdu/Sichuan dialect, slightly husky |
| Ryan | Male | English | Dynamic with strong rhythmic drive |
| Aiden | Male | English | Sunny American, clear midrange |
| Ono_Anna | Female | Japanese | Playful, light nimble timbre |
| Sohee | Female | Korean | Warm with rich emotion |

YAML Options

All settings cascade from global to step to generation; a value set at a more specific level overrides the broader one.

| Field | Default | Description |
| --- | --- | --- |
| dtype | float32 | float32, float16, bfloat16 (float16/bfloat16 GPU only) |
| flash_attn | auto | FlashAttention-2: auto-detects, auto-switches float32→bfloat16 |
| temperature | 0.9 | Sampling temperature |
| top_k | 50 | Top-k sampling |
| top_p | 1.0 | Top-p / nucleus sampling |
| repetition_penalty | 1.05 | Repetition penalty |
| max_new_tokens | 2048 | Max codec tokens to generate |
| no_sample | false | Greedy decoding |
| streaming | false | Streaming mode (lower latency) |
| mode | required | Step only: custom-voice, voice-design, or voice-clone |
| model_size | 1.7b | Step only: 1.7b or 0.6b |
| text | required | Text to synthesize |
| output | required | Output file path |
| speaker | Vivian | custom-voice: speaker name |
| language | Auto | Language for synthesis |
| instruct | - | custom-voice: emotion/style; voice-design: voice description |
| ref_audio | - | voice-clone: reference audio file path |
| ref_text | - | voice-clone: transcript of reference audio |
| x_vector_only | false | voice-clone: use speaker embedding only |
