jarvis-vocal
Uses the authentic J.A.R.V.I.S. voice model from HuggingFace (trained on actual movie lines) via Piper TTS. No audio effects needed — the voice is naturally cinematic and British.
Credit: Voice model by jgkawell — see the discussion for details on training and samples.
Usage
Generate a WAV file:
{baseDir}/bin/jarvis-tts "Text to speak" ./output.wav
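Stream directly to an Android device (if ADB is connected). adb push needs a local file and cannot read from stdin, so the pipe goes through adb shell instead:
{baseDir}/bin/jarvis-tts "Text to speak" - | adb shell -T "cat > /sdcard/Download/temp.wav"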
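For readers curious what a wrapper like jarvis-tts might do internally, here is a minimal sketch, not the actual script. It assumes Piper's usual CLI shape (text on stdin, --model, --output_file) and the voice directory used in the Installation section. The names jarvis_model_path, jarvis_say, and the PIPER_VOICE_DIR override are hypothetical.

```shell
# Hypothetical helper: resolve the model file for a quality level (high or medium).
# PIPER_VOICE_DIR is an assumed override; the real script may not read it.
jarvis_model_path() {
  printf '%s/jarvis-%s.onnx\n' \
    "${PIPER_VOICE_DIR:-$HOME/.local/share/piper/voices/en_GB}" "${1:-high}"
}

# Hypothetical helper: synthesize text ($1) into a WAV file ($2).
# Piper reads the text from stdin and writes the WAV to --output_file.
jarvis_say() {
  printf '%s\n' "$1" |
    piper --model "$(jarvis_model_path "${3:-high}")" --output_file "$2"
}

# Example (requires piper and a downloaded model):
# jarvis_say "Good evening, Sir." ./output.wav
```

Keeping the path lookup in its own function makes it easy to switch between the high and medium models without touching the synthesis step.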
Installation
Prerequisites
pipx install piper-tts
sudo apt install ffmpeg # or equivalent
Install Voice Model
# Create voice directory
mkdir -p ~/.local/share/piper/voices/en_GB
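# Download models via the HuggingFace CLI (the hf command ships with the huggingface_hub package)
cd ~/.local/share/piper/voices/en_GB
hf download jgkawell/jarvis en/en_GB/jarvis/high/jarvis-high.onnx --local-dir .
hf download jgkawell/jarvis en/en_GB/jarvis/high/jarvis-high.onnx.json --local-dir .
# hf keeps the repository's folder layout under --local-dir, so move the files up
mv en/en_GB/jarvis/high/jarvis-high.onnx* .
# Optional: medium quality model
hf download jgkawell/jarvis en/en_GB/jarvis/medium/jarvis-medium.onnx --local-dir .
hf download jgkawell/jarvis en/en_GB/jarvis/medium/jarvis-medium.onnx.json --local-dir .
mv en/en_GB/jarvis/medium/jarvis-medium.onnx* .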
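A quick way to confirm the install is to check that both the .onnx model and its .onnx.json config exist and are non-empty, since a Piper voice needs both. The sketch below assumes the files sit directly in the voice directory as jarvis-high.* or jarvis-medium.*; check_jarvis_model is a hypothetical helper, not part of this project.

```shell
# Hypothetical check: report any missing or empty model file for a quality level.
# $1 = voice directory, $2 = quality (high or medium, defaults to high).
check_jarvis_model() (
  dir="$1"
  quality="${2:-high}"
  status=0
  for f in "$dir/jarvis-$quality.onnx" "$dir/jarvis-$quality.onnx.json"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  exit "$status"   # exits only the function's subshell
)

# Example:
# check_jarvis_model ~/.local/share/piper/voices/en_GB high && echo "model OK"
```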
Integration
Works with OpenClaw Android nodes via ADB over Tailscale. Use the jarvis-speak wrapper for one-command push and play:
jarvis-speak "Systems at your service, Sir."
Or use streaming mode (faster, ephemeral):
jarvis-speak "Message" --stream
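The push and play step can be pictured roughly as follows. This is a sketch under assumptions, not the jarvis-speak source: it assumes the file is pushed to /sdcard/Download/jarvis-current.wav (the path named under Troubleshooting) and that playback is triggered with a generic VIEW intent, which may behave differently across Android versions and media players. push_and_play and the ADB variable are hypothetical names.

```shell
# Assumed override so a different adb binary can be substituted.
ADB="${ADB:-adb}"
REMOTE_WAV=/sdcard/Download/jarvis-current.wav

# Hypothetical helper: copy a local WAV ($1) to the device, then ask Android
# to open it with whichever app handles audio/wav.
push_and_play() {
  "$ADB" push "$1" "$REMOTE_WAV" &&
    "$ADB" shell am start -a android.intent.action.VIEW \
      -d "file://$REMOTE_WAV" -t audio/wav
}

# Example (requires a connected device):
# push_and_play ./output.wav
```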
Configuration
| Setting | Default | Description |
|---|---|---|
| Model | jarvis-high | Voice quality: high (114MB) or medium (63MB) |
| Speed | 1.0 (native) | Piper length-scale; values above 1.0 slow speech down, values below 1.0 speed it up |
| Volume | 1.0 | Post-processing volume boost |
Edit the jarvis-speak script to change defaults.
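To make the three settings concrete, the sketch below shows one plausible way they could map onto Piper and ffmpeg: the model name selects the .onnx file, speed is passed as Piper's length scale, and volume is applied with ffmpeg's volume filter. The JARVIS_* variables, VOICE_DIR, and jarvis_render are hypothetical; the real script may hard-code its defaults differently.

```shell
# Assumed defaults mirroring the table above; override via environment if desired.
VOICE_DIR="${VOICE_DIR:-$HOME/.local/share/piper/voices/en_GB}"
JARVIS_MODEL="${JARVIS_MODEL:-jarvis-high}"
JARVIS_SPEED="${JARVIS_SPEED:-1.0}"     # Piper length scale
JARVIS_VOLUME="${JARVIS_VOLUME:-1.0}"   # ffmpeg volume multiplier

# Hypothetical helper: synthesize text ($1), then apply the volume setting
# while writing the final WAV ($2).
jarvis_render() (
  raw="$(mktemp)" || exit 1
  printf '%s\n' "$1" |
    piper --model "$VOICE_DIR/$JARVIS_MODEL.onnx" \
          --length_scale "$JARVIS_SPEED" --output_file "$raw" &&
    ffmpeg -y -loglevel error -i "$raw" -filter:a "volume=$JARVIS_VOLUME" "$2"
  status=$?
  rm -f "$raw"
  exit "$status"
)

# Example: slightly slower and louder speech
# JARVIS_SPEED=1.1 JARVIS_VOLUME=1.5 jarvis_render "At your service, Sir." ./output.wav
```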
Troubleshooting
"Model not found" → Download models to ~/.local/share/piper/voices/en_GB/jarvis-*
ADB connection refused → Ensure ADB over Wi-Fi is enabled on the phone and that the phone is paired with the laptop (port 5555)
Audio doesn't play → Check that the Android device received the file at /sdcard/Download/jarvis-current.wav and has a WAV-capable media player
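The ADB checks above can be scripted. The sketch below parses the output of adb devices and succeeds only if at least one device is listed in the "device" state (as opposed to "offline" or "unauthorized"). adb_device_ready is a hypothetical helper, and the address in the example is a placeholder.

```shell
# Hypothetical helper: read `adb devices` output on stdin and return 0
# if any listed device is in the "device" (ready) state.
adb_device_ready() {
  awk 'NR > 1 && $2 == "device" { ok = 1 } END { exit ok ? 0 : 1 }'
}

# Example (placeholder address; requires adb):
# adb connect <phone-ip>:5555
# adb devices | adb_device_ready && echo "device ready"
# adb shell ls -l /sdcard/Download/jarvis-current.wav
```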
License
MIT — The voice model is MIT licensed by jgkawell.
Credits
- Voice model: jgkawell/jarvis on HuggingFace — trained on Marvel movie lines
- TTS engine: Piper by Rhasspy
- Integration: OpenClaw by Aidan Park