Local OpenAI-Compatible TTS for OpenClaw

Configure OpenClaw to send TTS requests to a local OpenAI-compatible endpoint, then verify end-to-end delivery.

Quick workflow

Set OpenAI base URL to local endpoint.
Configure OpenClaw messages TTS provider/model/voice.
Verify TTS config with openclaw config get.
Generate a direct API sample clip to confirm voice mapping.
Send sample via channel plugin (Telegram/WhatsApp/etc.) if requested.
If remote access is requested, expose the TTS service port (not necessarily the OpenClaw gateway).

1) Configure OpenClaw to use local TTS backend

Use CLI config commands only.

openclaw config set env.vars.OPENAI_BASE_URL http://127.0.0.1:19000/v1
openclaw config set messages.tts.provider openai
openclaw config set messages.tts.openai.model tts-1-hd
openclaw config set messages.tts.openai.voice me

Verify:

openclaw config get env.vars.OPENAI_BASE_URL
openclaw config get messages.tts

2) Verify cloned voice exists on backend

If using openedai-speech + XTTS voice mapping, cloned voices are commonly available only on tts-1-hd.

Check voice map inside container:

sudo docker exec openedai-speech sh -lc 'sed -n "1,220p" /app/config/voice_to_speaker.yaml'

If voice: me fails with KeyError, check whether:

wrong model is used (tts-1 instead of tts-1-hd), or
voice key missing from voice_to_speaker.yaml.

3) Generate a deterministic test clip (direct API)

Use direct POST to validate backend behavior independent of chat surface rendering.

curl -sS -X POST http://127.0.0.1:8880/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model":"tts-1-hd",
    "voice":"me",
    "input":"Quick cloned voice check.",
    "speed":1.25,
    "response_format":"mp3"
  }' \
  --output /tmp/clone-test.mp3

file /tmp/clone-test.mp3

Expected: MP3 audio file (not JSON error text).

4) Important limitation: speed pinning in OpenClaw config

messages.tts.openai.speed may be rejected by current OpenClaw schema. If so:

keep model/voice in OpenClaw config,
set speed per request when generating clips directly,
or enforce speed with a local proxy layer in front of backend.

Do not claim speed is globally pinned unless schema accepts it.

5) Expose service correctly (LAN/Tailscale)

Distinguish between:

OpenClaw gateway exposure (gateway.bind, gateway.tailscale.*), and
TTS backend exposure (container/service port such as 19000 or 8880).

If user asks to expose local TTS only, do not change gateway bind/mode unless explicitly requested.

For TTS backend reachability:

Confirm listener/bind:
```
ss -ltnp | grep ':19000\|:8880'
```
If bound to 127.0.0.1, rebind service/container to 0.0.0.0 or tailnet interface.
Restrict access via firewall/Tailscale ACLs.

6) Channel delivery troubleshooting

If webchat does not play voice attachments:

send as regular file attachment to supported channel (e.g., Telegram),
verify target id explicitly,
confirm local file still exists before sending.

If file missing, regenerate clip and resend.

Command safety

Use openclaw config set/get (never edit openclaw.json directly).
Avoid unrelated gateway changes when task is strictly TTS service exposure.
For external sends, use channel tools and explicit target ids.

local-openedai-tts

Safety Notice

Copy this and send it to your AI assistant to learn

Local OpenAI-Compatible TTS for OpenClaw

Quick workflow

1) Configure OpenClaw to use local TTS backend

2) Verify cloned voice exists on backend

3) Generate a deterministic test clip (direct API)

4) Important limitation: speed pinning in OpenClaw config

5) Expose service correctly (LAN/Tailscale)

6) Channel delivery troubleshooting

Command safety

Source Transparency

Related Skills

Sherpa ONNX TTS

TTS Media Route Fix

Autonoannounce

kkclaw