WebChat Voice Proxy
Set up a reboot-safe voice stack for OpenClaw WebChat (including the current polished mic/stop/hourglass UI states):
- HTTPS Control UI on port 8443
/transcribeproxy to local faster-whisper service- WebSocket passthrough to gateway (
ws://127.0.0.1:18789) - Voice button script injection into Control UI
- Real-time VU meter: button shadow/scale reacts to voice level
- Push-to-Talk: hold mic button to record, release to send (default mode)
- Toggle mode: click to start, click to stop (switch via double-click on mic button)
- Keyboard shortcuts:
Ctrl+SpacePush-to-Talk,Ctrl+Shift+Mstart/stop continuous recording - Localized UI: auto-detects browser language (English, German, Chinese built-in), customizable
Prerequisites (required)
This skill requires a local faster-whisper HTTP service. Expected default:
- URL:
http://127.0.0.1:18790/transcribe - systemd user service:
openclaw-transcribe.service
Verify before deployment:
systemctl --user is-active openclaw-transcribe.service
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:18790/transcribe -X POST -H 'Content-Type: application/octet-stream' --data-binary 'x'
If this dependency is missing, set up faster-whisper first (model load + HTTP endpoint), then run this skill.
Related skills:
faster-whisper-local-service(backend prerequisite)webchat-voice-full-stack(meta-installer that deploys both backend + proxy)
Workflow
- Ensure transcription service exists and is running (
openclaw-transcribe.service). - Deploy
voice-input.jsto Control UI assets and inject script tag intoindex.html. - Configure gateway allowed origin for external HTTPS UI.
- Run HTTPS+WSS proxy as persistent user systemd service (
openclaw-voice-https.service). - Verify pairing/token/origin errors and resolve in order.
Security Notes
- Localhost by default: The HTTPS proxy binds to
127.0.0.1only. It is not reachable from other devices on your network unless you explicitly setVOICE_HOSTto a LAN IP. - LAN exposure: Setting
VOICE_HOST=<LAN-IP>exposes the proxy (and by extension the gateway WebSocket and transcription endpoint) to all devices on that network. Only do this on trusted networks. - Persistence: This skill installs a user systemd service (
openclaw-voice-https.service) that starts automatically on boot, and a gateway hook that re-injects the UI script after updates. Useuninstall.shto fully revert. - Self-signed TLS: The auto-generated certificate is not trusted by browsers. You will see a certificate warning on first access.
Deploy
Run (localhost only — default, most secure):
bash scripts/deploy.sh
Or expose on LAN (required to access from other devices):
VOICE_HOST=10.0.0.42 VOICE_HTTPS_PORT=8443 VOICE_LANG=de bash scripts/deploy.sh
When run interactively without VOICE_LANG, the script will ask you to choose a UI language (auto, en, de, zh). Set VOICE_LANG=auto to skip the prompt.
This script is idempotent.
Quick verify
Run:
bash scripts/status.sh
Expected:
- both services active
- injection present
https:200
Common fixes
404 /chat?...→ SPA fallback missing in HTTPS proxy.origin not allowed→ ensure deploy used correctVOICE_HOSTand added matching HTTPS origin togateway.controlUi.allowedOrigins.token missing→ open URL with?token=...once.pairing required→ approve pending device viaopenclaw devices approve <requestId> --token <gateway-token>.- Mic breaks after reboot → cert paths must be persistent (not
/tmp). - No transcription result → check local faster-whisper endpoint first.
See references/troubleshooting.md for exact commands.
What this skill modifies
Before installing, be aware of all system changes deploy.sh makes:
| What | Path | Action |
|---|---|---|
| Control UI HTML | <npm-global>/openclaw/dist/control-ui/index.html | Adds <script> tag for voice-input.js |
| Control UI asset | <npm-global>/openclaw/dist/control-ui/assets/voice-input.js | Copies mic button JS |
| Gateway config | ~/.openclaw/openclaw.json | Adds HTTPS origin to gateway.controlUi.allowedOrigins |
| Systemd service | ~/.config/systemd/user/openclaw-voice-https.service | Creates + enables persistent HTTPS proxy |
| Gateway hook | ~/.openclaw/hooks/voice-input-inject/ | Installs startup hook that re-injects JS after updates |
| Workspace files | ~/.openclaw/workspace/voice-input/ | Copies voice-input.js, https-server.py |
| TLS certs | ~/.openclaw/workspace/voice-input/certs/ | Auto-generated self-signed cert on first run |
The injected JS (voice-input.js) runs inside the Control UI and interacts with the chat input. Review the source before deploying.
Mic Button Controls
| Action | Effect |
|---|---|
| Hold (PTT mode) | Record while held, transcribe on release |
| Click (Toggle mode) | Start recording / stop and transcribe |
| Double-click | Switch between PTT and Toggle mode |
| Right-click | Toggle beep sound on/off |
| Ctrl+Space (hold) | Push-to-Talk via keyboard (works even with text field focused) |
| Ctrl+Shift+M | Start/stop recording (transcribes on stop) |
| Ctrl+Shift+B | Start/stop live transcription [beta] — text appears in real-time, auto-sends after 2s review, stops on 5s silence or "Stop Hugo" keyword |
The current mode and available actions are shown in the button tooltip on hover.
Language / i18n
The UI automatically detects the browser language and shows tooltips, toasts, and placeholder text in the matching language.
Built-in languages: English (en), German (de), Chinese (zh)
Override language
Set a language override in the browser console:
localStorage.setItem('oc-voice-lang', 'de'); // force German
localStorage.setItem('oc-voice-lang', 'zh'); // force Chinese
localStorage.removeItem('oc-voice-lang'); // back to auto-detect
Then reload the page.
Add a custom language
Edit voice-input.js and add a new entry to the I18N object. Use assets/i18n.json as a template — it contains all translation keys with the built-in translations.
Example for adding French:
const I18N = {
// ... existing entries ...
fr: {
tooltip_ptt: "Maintenir pour parler",
tooltip_toggle: "Cliquer pour démarrer/arrêter",
tooltip_next_toggle: "Mode clic",
tooltip_next_ptt: "Push-to-Talk",
tooltip_beep_off: "Désactiver le bip",
tooltip_beep_on: "Activer le bip",
tooltip_dblclick: "Double-clic",
tooltip_rightclick: "Clic droit",
toast_ptt: "Push-to-Talk",
toast_toggle: "Mode clic",
toast_beep_on: "Bip activé",
toast_beep_off: "Bip désactivé",
placeholder_suffix: " — Voix : (Ctrl+Espace Push-To-Talk, Ctrl+Shift+M enregistrement continu)"
}
};
After editing, redeploy with bash scripts/deploy.sh to copy the updated JS to the Control UI.
CORS Policy
The /transcribe proxy endpoint uses a configurable Access-Control-Allow-Origin header.
Set VOICE_ALLOWED_ORIGIN env var to restrict. Default: https://<VOICE_HOST>:<VOICE_HTTPS_PORT>.
Uninstall
To fully revert all changes:
bash scripts/uninstall.sh
This will:
- Stop and remove
openclaw-voice-https.service - Remove the gateway startup hook
- Remove
voice-input.jsfrom Control UI and undo the index.html injection - Remove the HTTPS origin from gateway config
- Restart the gateway
- Remove TLS certificates
- Remove workspace runtime files (
voice-input.js,https-server.py,i18n.json)
The faster-whisper backend is not touched by uninstall — remove it separately via faster-whisper-local-service if needed.