text-to-voice

Convert text to speech using Kyutai's Pocket TTS. Use when the user asks to "generate speech", "text to speech", "TTS", "convert text to audio", "voice synthesis", "generate voice", "read aloud", or "create audio from text". Supports voice cloning from audio samples and multiple pre-made voices (alba, marius, javert, jean, fantine, cosette, eponine, azelma).

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

To install, send this command to your AI assistant:

npx skills add kenneropia/text-to-voice/kenneropia-text-to-voice-text-to-voice

Text-to-Voice with Kyutai Pocket TTS

Convert text to natural speech using Kyutai's Pocket TTS, a lightweight 100M-parameter model that runs efficiently on CPU.

Installation

pip install pocket-tts
# or use uvx to run without installing:
uvx pocket-tts generate

Requires Python 3.10+ and PyTorch 2.5+. GPU not required.

CLI Usage

Basic Generation

# Generate with defaults (saves to ./tts_output.wav)
uvx pocket-tts generate

# Specify text
pocket-tts generate --text "Hello, this is my message."

# Specify output file location
pocket-tts generate --text "Hello" --output-path ./audio/greeting.wav

# Full example with all common options
pocket-tts generate \
  --text "Welcome to the demo." \
  --voice alba \
  --output-path ./output/welcome.wav

CLI Options

| Option | Default | Description |
|--------|---------|-------------|
| `--text` | `"Hello world..."` | Text to convert to speech |
| `--voice` | `alba` | Voice name, local file path, or HuggingFace URL |
| `--output-path` | `./tts_output.wav` | Where to save the generated audio file |
| `--temperature` | `0.7` | Generation temperature (higher = more expressive) |
| `--lsd-decode-steps` | `1` | Quality steps (higher = better quality, slower) |
| `--eos-threshold` | `-4.0` | End detection threshold (lower = finish earlier) |
| `--frames-after-eos` | `auto` | Extra frames after end (each frame = 80 ms) |
| `--device` | `cpu` | Device to use (`cpu`/`cuda`) |
| `-q, --quiet` | `false` | Disable logging output |

Voice Selection (CLI)

# Use a pre-made voice by name
pocket-tts generate --voice alba --text "Hello"
pocket-tts generate --voice javert --text "Hello"

# Use a local audio file for voice cloning
pocket-tts generate --voice ./my_voice.wav --text "Hello"

# Use a voice from HuggingFace
pocket-tts generate --voice "hf://kyutai/tts-voices/alba-mackenna/merchant.wav" --text "Hello"

Quality Tuning (CLI)

# Higher quality (more generation steps)
pocket-tts generate --lsd-decode-steps 5 --temperature 0.5 --output-path high_quality.wav

# More expressive/varied output
pocket-tts generate --temperature 1.0 --output-path expressive.wav

# Shorter output (a lower threshold makes speech end earlier)
pocket-tts generate --eos-threshold -5.0 --output-path shorter.wav

Local Web Server

For quick iteration with multiple voices/texts:

uvx pocket-tts serve
# Open http://localhost:8000

Available Voices

Pre-made voices (use name directly with --voice):

| Voice | Gender | License | Description |
|-------|--------|---------|-------------|
| alba | Female | CC BY 4.0 | Casual voice |
| marius | Male | CC0 | Voice donation |
| javert | Male | CC0 | Voice donation |
| jean | Male | CC-NC | EARS dataset |
| fantine | Female | CC BY 4.0 | VCTK dataset |
| cosette | Female | CC-NC | Expresso dataset |
| eponine | Female | CC BY 4.0 | VCTK dataset |
| azelma | Female | CC BY 4.0 | VCTK dataset |

Full voice catalog: https://huggingface.co/kyutai/tts-voices

For detailed voice information, see references/voices.md.

Voice Cloning

Clone any voice from an audio sample. For best results:

  • Use clean audio (minimal background noise)
  • 10+ seconds recommended
  • Consider Adobe Podcast Enhance to clean samples

pocket-tts generate --voice ./my_recording.wav --text "Hello" --output-path cloned.wav

Output Format

  • Sample Rate: 24kHz
  • Channels: Mono
  • Format: 16-bit PCM WAV
  • Default location: ./tts_output.wav
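The documented format can be sanity-checked with Python's standard-library `wave` module. This is a small sketch; the helper name `check_pocket_tts_wav` is hypothetical, and the path is whatever you passed to `--output-path`:

```python
import wave

def check_pocket_tts_wav(path: str) -> float:
    """Verify a WAV matches Pocket TTS's documented output format
    (24 kHz, mono, 16-bit PCM) and return its duration in seconds."""
    with wave.open(path, "rb") as wav:
        assert wav.getframerate() == 24000, "expected 24 kHz"
        assert wav.getnchannels() == 1, "expected mono"
        assert wav.getsampwidth() == 2, "expected 16-bit PCM"
        return wav.getnframes() / wav.getframerate()
```

For example, `check_pocket_tts_wav("tts_output.wav")` after a default CLI run should return the clip's duration without raising.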

Python API

For programmatic use:

from pocket_tts import TTSModel
import scipy.io.wavfile

tts_model = TTSModel.load_model()
voice_state = tts_model.get_state_for_audio_prompt("alba")
audio = tts_model.generate_audio(voice_state, "Hello world!")

# Save to specific location
scipy.io.wavfile.write("./audio/output.wav", tts_model.sample_rate, audio.numpy())

TTSModel.load_model()

model = TTSModel.load_model(
    variant="b6369a24",      # Model variant
    temp=0.7,                # Temperature (0.0-1.0)
    lsd_decode_steps=1,      # Generation steps
    noise_clamp=None,        # Max noise value
    eos_threshold=-4.0       # End-of-sequence threshold
)

Voice State

# Pre-made voice
voice_state = model.get_state_for_audio_prompt("alba")

# Local file
voice_state = model.get_state_for_audio_prompt("./my_voice.wav")

# HuggingFace
voice_state = model.get_state_for_audio_prompt("hf://kyutai/tts-voices/alba-mackenna/casual.wav")

Generate Audio

audio = model.generate_audio(voice_state, "Text to speak")
# Returns: torch.Tensor (1D)

Streaming

for chunk in model.generate_audio_stream(voice_state, "Long text..."):
    # Process each chunk as it's generated
    pass
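One natural use of streaming is writing audio to disk incrementally instead of buffering the whole clip. Below is a hedged sketch: the helper name `stream_to_wav` is hypothetical, and it assumes each chunk is a 1-D sequence of float samples in [-1, 1] (if `generate_audio_stream` yields `torch.Tensor` chunks, they iterate as floats or can be converted with `.numpy()`):

```python
import struct
import wave
from typing import Iterable, Sequence

def stream_to_wav(chunks: Iterable[Sequence[float]], path: str,
                  sample_rate: int = 24000) -> None:
    """Append audio chunks to a 16-bit mono WAV as they arrive,
    without holding the whole clip in memory."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # Clamp float samples to [-1, 1] and scale to int16 PCM.
            frames = b"".join(
                struct.pack("<h", int(max(-1.0, min(1.0, float(s))) * 32767))
                for s in chunk
            )
            wav.writeframes(frames)
```

Usage would then look like `stream_to_wav(model.generate_audio_stream(voice_state, "Long text..."), "out.wav")`.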

Properties

  • model.sample_rate - 24000 Hz
  • model.device - "cpu" or "cuda"

Performance

  • ~200ms latency to first audio chunk
  • ~6x real-time on MacBook Air M4 CPU
  • Uses only 2 CPU cores

Limitations

  • English only
  • No built-in pause/silence control
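Since there is no built-in pause control, one workaround is to generate each sentence separately and splice silence between the resulting sample arrays before saving. A minimal sketch, assuming each segment is a 1-D sequence of float samples (the helper name `join_with_pauses` is hypothetical):

```python
from typing import List, Sequence

def join_with_pauses(segments: List[Sequence[float]],
                     pause_s: float = 0.4,
                     sample_rate: int = 24000) -> List[float]:
    """Concatenate per-sentence sample sequences, inserting
    `pause_s` seconds of silence between consecutive segments."""
    silence = [0.0] * int(pause_s * sample_rate)
    out: List[float] = []
    for i, seg in enumerate(segments):
        if i:
            out.extend(silence)  # pause before every segment after the first
        out.extend(float(s) for s in seg)
    return out
```

Each segment could come from a separate `generate_audio` call, with the joined result written out via `scipy.io.wavfile.write` as in the Python API example above.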

