Local OpenAI-Compatible TTS for OpenClaw
Configure OpenClaw to send TTS requests to a local OpenAI-compatible endpoint, then verify end-to-end delivery.
Quick workflow
- Set OpenAI base URL to local endpoint.
- Configure OpenClaw messages TTS provider/model/voice.
- Verify TTS config with
openclaw config get. - Generate a direct API sample clip to confirm voice mapping.
- Send sample via channel plugin (Telegram/WhatsApp/etc.) if requested.
- If remote access is requested, expose the TTS service port (not necessarily the OpenClaw gateway).
1) Configure OpenClaw to use local TTS backend
Use CLI config commands only.
openclaw config set env.vars.OPENAI_BASE_URL http://127.0.0.1:19000/v1
openclaw config set messages.tts.provider openai
openclaw config set messages.tts.openai.model tts-1-hd
openclaw config set messages.tts.openai.voice me
Verify:
openclaw config get env.vars.OPENAI_BASE_URL
openclaw config get messages.tts
2) Verify cloned voice exists on backend
If using openedai-speech + XTTS voice mapping, cloned voices are commonly available only on tts-1-hd.
Check voice map inside container:
sudo docker exec openedai-speech sh -lc 'sed -n "1,220p" /app/config/voice_to_speaker.yaml'
If voice: me fails with KeyError, check whether:
- wrong model is used (
tts-1instead oftts-1-hd), or - voice key missing from
voice_to_speaker.yaml.
3) Generate a deterministic test clip (direct API)
Use direct POST to validate backend behavior independent of chat surface rendering.
curl -sS -X POST http://127.0.0.1:8880/v1/audio/speech \
-H 'Content-Type: application/json' \
-d '{
"model":"tts-1-hd",
"voice":"me",
"input":"Quick cloned voice check.",
"speed":1.25,
"response_format":"mp3"
}' \
--output /tmp/clone-test.mp3
file /tmp/clone-test.mp3
Expected: MP3 audio file (not JSON error text).
4) Important limitation: speed pinning in OpenClaw config
messages.tts.openai.speed may be rejected by current OpenClaw schema. If so:
- keep model/voice in OpenClaw config,
- set speed per request when generating clips directly,
- or enforce speed with a local proxy layer in front of backend.
Do not claim speed is globally pinned unless schema accepts it.
5) Expose service correctly (LAN/Tailscale)
Distinguish between:
- OpenClaw gateway exposure (
gateway.bind,gateway.tailscale.*), and - TTS backend exposure (container/service port such as
19000or8880).
If user asks to expose local TTS only, do not change gateway bind/mode unless explicitly requested.
For TTS backend reachability:
- Confirm listener/bind:
ss -ltnp | grep ':19000\|:8880' - If bound to
127.0.0.1, rebind service/container to0.0.0.0or tailnet interface. - Restrict access via firewall/Tailscale ACLs.
6) Channel delivery troubleshooting
If webchat does not play voice attachments:
- send as regular file attachment to supported channel (e.g., Telegram),
- verify target id explicitly,
- confirm local file still exists before sending.
If file missing, regenerate clip and resend.
Command safety
- Use
openclaw config set/get(never editopenclaw.jsondirectly). - Avoid unrelated gateway changes when task is strictly TTS service exposure.
- For external sends, use channel tools and explicit target ids.