# Pocket TTS Skill

Install skill "Pocket Tts" with this command: npx skills add sherajdev/pocket-tts

Fully local, offline text-to-speech using Kyutai's Pocket TTS model. Generate high-quality audio from text without any API calls or internet connection. Features 8 built-in voices, voice cloning support, and runs entirely on CPU.

Features

  • 🎯 Fully local - No API calls, runs completely offline
  • 🚀 CPU-only - No GPU required, works on any computer
  • Fast generation - ~2-6x real-time on CPU
  • 🎤 8 built-in voices - alba, marius, javert, jean, fantine, cosette, eponine, azelma
  • 🎭 Voice cloning - Clone any voice from a WAV sample
  • 🔊 Low latency - ~200ms first audio chunk
  • 📚 Simple Python API - Easy integration into any project
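To make the real-time figure concrete: a generator running at N times real-time needs roughly the audio duration divided by N in compute time. The sketch below only illustrates that arithmetic using the ~2-6x range quoted above; the function name is hypothetical and not part of pocket-tts, and actual speed will depend on the machine.

```python
def estimated_generation_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Rough wall-clock time needed to synthesize `audio_seconds` of speech."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# One minute of speech at the slow (2x) and fast (6x) ends of the quoted range.
slow = estimated_generation_seconds(60.0, 2.0)
fast = estimated_generation_seconds(60.0, 6.0)
print(f"60 s of audio: about {fast:.0f}-{slow:.0f} s of compute")
```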

Installation

# 1. Accept the model license on Hugging Face
# https://huggingface.co/kyutai/pocket-tts

# 2. Install the package
pip install pocket-tts

# Or use uv for automatic dependency management
uvx pocket-tts generate "Hello world"

Usage

CLI

# Basic usage
pocket-tts "Hello, I am your AI assistant"

# With specific voice
pocket-tts "Hello" --voice alba --output hello.wav

# With custom voice file (voice cloning)
pocket-tts "Hello" --voice-file myvoice.wav --output output.wav

# Adjust speed
pocket-tts "Hello" --speed 1.2

# Start local server
pocket-tts --serve

# List available voices
pocket-tts --list-voices

Python API

from pocket_tts import TTSModel
import scipy.io.wavfile

# Load model
tts_model = TTSModel.load_model()

# Get voice state
voice_state = tts_model.get_state_for_audio_prompt(
    "hf://kyutai/tts-voices/alba-mackenna/casual.wav"
)

# Generate audio
audio = tts_model.generate_audio(voice_state, "Hello world!")

# Save to WAV
scipy.io.wavfile.write("output.wav", tts_model.sample_rate, audio.numpy())

# Check sample rate
print(f"Sample rate: {tts_model.sample_rate} Hz")

Available Voices

| Voice   | Description         |
|---------|---------------------|
| alba    | Casual female voice |
| marius  | Male voice          |
| javert  | Clear male voice    |
| jean    | Natural male voice  |
| fantine | Female voice        |
| cosette | Female voice        |
| eponine | Female voice        |
| azelma  | Female voice        |

Or use --voice-file /path/to/wav.wav for custom voice cloning.

Options

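| Option        | Description            | Default    |
|---------------|------------------------|------------|
| text          | Text to convert        | Required   |
| -o, --output  | Output WAV file        | output.wav |
| -v, --voice   | Voice preset           | alba       |
| -s, --speed   | Speech speed (0.5-2.0) | 1.0        |
| --voice-file  | Custom WAV for cloning | None       |
| --serve       | Start HTTP server      | False      |
| --list-voices | List all voices        | False      |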
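
When calling the CLI from a script, it can help to assemble and validate the arguments before launching the process. The following is a minimal sketch that uses only the options and voice names listed in this document; `build_command` and `BUILT_IN_VOICES` are hypothetical names, and treating `--voice` and `--voice-file` as mutually exclusive is an assumption rather than documented behavior.

```python
from typing import List, Optional

# Voice presets as listed in this document.
BUILT_IN_VOICES = ("alba", "marius", "javert", "jean",
                   "fantine", "cosette", "eponine", "azelma")


def build_command(text: str,
                  voice: Optional[str] = None,
                  voice_file: Optional[str] = None,
                  output: Optional[str] = None,
                  speed: Optional[float] = None) -> List[str]:
    """Assemble an argv list from the options in the table above."""
    if not text:
        raise ValueError("text is required")
    # Assumption: a preset and a cloning sample are alternatives.
    if voice is not None and voice_file is not None:
        raise ValueError("pass either voice or voice_file, not both")
    if voice is not None and voice not in BUILT_IN_VOICES:
        raise ValueError(f"unknown voice preset: {voice}")
    # The options table gives 0.5-2.0 as the speed range.
    if speed is not None and not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")

    cmd = ["pocket-tts", text]
    if voice is not None:
        cmd += ["--voice", voice]
    if voice_file is not None:
        cmd += ["--voice-file", voice_file]
    if output is not None:
        cmd += ["--output", output]
    if speed is not None:
        cmd += ["--speed", str(speed)]
    return cmd


print(build_command("Hello", voice="alba", output="hello.wav"))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; passing arguments as a list avoids shell quoting problems with the text.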

Requirements

  • Python 3.10-3.14
  • PyTorch 2.5+ (CPU version works)
  • Works on 2 CPU cores
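
A script that depends on this skill could check the interpreter version up front. This is a small sketch based only on the Python 3.10-3.14 range stated above; `python_supported` is a hypothetical helper, not something the package provides.

```python
import sys
from typing import Sequence

# Supported range as stated in the requirements list.
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 14)


def python_supported(version: Sequence[int] = sys.version_info) -> bool:
    """Return True if the (major, minor) version falls in the stated range."""
    major_minor = tuple(version[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


if not python_supported():
    print("Warning: this Python version is outside the stated 3.10-3.14 range")
```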

Notes

  • ⚠️ Model is gated - accept license on Hugging Face first
  • 🌍 English language only (v1)
  • 💾 First run downloads model (~100M parameters)
  • 🔊 Audio is returned as 1D torch tensor (PCM data)
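
If scipy is not available, a 1D sequence of samples can also be written with Python's standard `wave` module. The sketch below assumes the samples are floats in the range [-1.0, 1.0], which this document does not state explicitly; `write_pcm16_wav` is a hypothetical helper, not part of pocket-tts.

```python
import struct
import wave
from typing import Iterable


def write_pcm16_wav(path: str, samples: Iterable[float], sample_rate: int) -> float:
    """Write mono float samples as 16-bit PCM and return the duration in seconds."""
    frames = bytearray()
    count = 0
    for sample in samples:
        # Assumption: samples are floats in [-1.0, 1.0]; clip anything outside.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
        count += 1
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # 1D audio means a single channel
        wav.setsampwidth(2)        # 2 bytes per sample (16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return count / sample_rate
```

With the Python API shown earlier, this might be called as `write_pcm16_wav("output.wav", audio.tolist(), tts_model.sample_rate)`, since a torch tensor can be converted to a plain list with `tolist()`.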
