Avatar Skill
Interactive AI avatar interface for OpenClaw with real-time lip-synced video and text-to-speech.
Features
- Voice Responses: Speaks conversational summaries using ElevenLabs TTS
- Visual Avatar: Realistic lip-synced video via Simli
- Detail Panel: Shows formatted markdown alongside spoken responses
- Multi-language: Supports multiple languages for speech and TTS
- Slack/Email: Forwards responses to Slack DMs or email (when configured)
- Stream Deck: Optional hardware control with Elgato Stream Deck
Setup
- Get API keys:
  - Simli - Avatar rendering
  - ElevenLabs - Text-to-speech
- Set environment variables:
      export SIMLI_API_KEY=your-key
      export ELEVENLABS_API_KEY=your-key
- Start the avatar:
      openclaw-avatar
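Because the avatar depends on both keys, it can help to confirm they are set before launching. The following is a minimal sketch, not part of OpenClaw itself: the helper name `missing_keys` is hypothetical, and only the two variable names come from the setup steps above.

```python
import os

# Variable names taken from the setup steps; everything else here is illustrative.
REQUIRED_KEYS = ("SIMLI_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(env=None):
    """Return the required variables that are unset or blank, in a fixed order."""
    # Default to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

Calling `missing_keys()` with no arguments inspects the current environment; an empty list means both keys are present.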
Response Format
When responding to avatar queries, use this format:
<spoken>
A short conversational summary (1-3 sentences). NO markdown, NO formatting. Plain speech only.
</spoken>
<detail>
Full detailed response with markdown formatting (bullet points, headers, bold, etc.).
</detail>
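To illustrate how a consumer might separate the two sections, here is a rough sketch that pulls the `<spoken>` and `<detail>` bodies out of a response string. The function name `split_response` is hypothetical, and the regular-expression approach is an assumption; the avatar's actual parsing may differ.

```python
import re

# Non-greedy matches so each pattern stops at its own closing tag.
_SPOKEN = re.compile(r"<spoken>(.*?)</spoken>", re.DOTALL)
_DETAIL = re.compile(r"<detail>(.*?)</detail>", re.DOTALL)


def split_response(text):
    """Return (spoken, detail); raise ValueError if either section is missing."""
    spoken = _SPOKEN.search(text)
    detail = _DETAIL.search(text)
    if spoken is None or detail is None:
        raise ValueError("response must contain both <spoken> and <detail> sections")
    # Strip the newlines that surround each section body in the format above.
    return spoken.group(1).strip(), detail.group(1).strip()
```

The spoken part would then go to text-to-speech and the detail part to the markdown panel.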
Guidelines
- spoken: Brief, natural, conversational. This is read aloud.
- detail: Comprehensive information with proper markdown.
- Always include both sections.
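The spoken guidelines can be approximated with a simple check. This is a heuristic sketch only: `spoken_problems` is a hypothetical helper, the markdown patterns cover just the common cases (headers, bullets, bold, inline code, links), and sentence counting by terminal punctuation will miscount some abbreviations.

```python
import re

# Common markdown markers: headers, bullets, bold, inline code, links.
_MARKDOWN = re.compile(
    r"(^\s*#{1,6}\s)|(^\s*[-*+]\s)|(\*\*)|(`)|(\[[^\]]+\]\([^)]+\))",
    re.MULTILINE,
)
# A run of ., ! or ? followed by whitespace or end of text counts as one sentence end.
_SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def spoken_problems(spoken):
    """Return a list of guideline violations found in the spoken text."""
    problems = []
    if _MARKDOWN.search(spoken):
        problems.append("contains markdown")
    count = len(_SENTENCE_END.findall(spoken))
    if not 1 <= count <= 3:
        problems.append(f"expected 1-3 sentences, found {count}")
    return problems
```

An empty list means the text passed both checks; it does not guarantee the text sounds natural when read aloud.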
Example
User: "What meetings do I have today?"
<spoken>
You have three meetings today. Your first one is a team standup at 9 AM, then a product review at 2 PM, and finally a 1-on-1 with Sarah at 4 PM.
</spoken>
<detail>
## Today's Meetings
### 9:00 AM - Team Standup
- **Duration**: 15 minutes
- **Attendees**: Engineering team
### 2:00 PM - Product Review
- **Duration**: 1 hour
- **Attendees**: Product, Design, Engineering leads
### 4:00 PM - 1:1 with Sarah
- **Duration**: 30 minutes
- **Notes**: Follow up on project timeline
</detail>
Session Key
Avatar responses use the session key agent:main:avatar.