Voice Email Skill

Send emails using natural voice commands. Perfect for accessibility use cases.

What It Does

When you receive a voice message, parse and send an email:

Input format:

new email to [recipient], subject [subject], body [body], send

Examples:

"new email to john@example.com, subject Hello, body How are you doing, send"
"send email to mom@gmail.com, subject Dinner, body See you at 7pm, send"

What This Skill CANNOT Do

❌ Execute arbitrary code
❌ Access files outside of logging/debugging
❌ Modify system files
❌ Access other accounts without explicit OAuth
❌ Send emails to unknown recipients without user confirmation

Prerequisites

This skill requires:

gogcli - Google CLI for Gmail (must be installed separately)
Deepgram - For voice transcription (API key required)
Telegram bot - For receiving voice messages (already configured in OpenClaw)
ElevenLabs - Optional, for voice responses (not required)

Install gogcli (once, manually)

Option A - via npm (recommended):

npm install -g gogcli

Option B - via binary (verify source): Download from https://gogcli.ai and verify the binary

Then authenticate:

gog auth add your-email@gmail.com

Configure Deepgram (REQUIRED)

Add to openclaw.json:

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [{"provider": "deepgram", "model": "nova-3"}]
      }
    }
  },
  "env": {
    "DEEPGRAM_API_KEY": "your-deepgram-key"
  }
}

Configure ElevenLabs (OPTIONAL)

For voice responses, add to openclaw.json:

{
  "messages": {
    "tts": {
      "auto": "always",
      "provider": "elevenlabs",
      "elevenlabs": {
        "apiKey": "YOUR_ELEVENLABS_KEY",
        "voiceId": "YOUR_VOICE_ID"
      }
    }
  }
}

Without ElevenLabs, text responses still work.

Usage

Simply send a voice message with the command. The agent will:

Transcribe it (via Deepgram)
Parse the fields
Send the email (via gogcli)
Confirm via text (or voice if ElevenLabs configured)

Command Parser

The agent extracts:

to: Email address (after "to", "email to", "send to")
subject: Text after "subject"
body: Text after "body" (before "send")

Environment Variables

Variable	Required	Description
DEEPGRAM_API_KEY	Yes	For voice transcription
ELEVENLABS_API_KEY	No	For voice responses
ELEVENLABS_VOICE_ID	No	Voice to use

Security Notes

Network: Requires access to Telegram API, Deepgram API, Gmail API
Credentials:
- gogcli stores OAuth tokens in system keyring
- Deepgram key in openclaw.json (or environment)
- ElevenLabs key in openclaw.json (optional)
Data: Voice recordings processed by Deepgram, emails sent via user's Gmail
Privilege: Modifies openclaw.json to enable media/audio
Does NOT: Execute arbitrary code, access unrelated files, or modify system

Best Practices for Production

Use test accounts: Create dedicated Gmail account for testing
Limit Gmail OAuth: Use app-specific passwords if needed
Scope Deepgram: Use minimal quota for testing
Review logs: Check /tmp/openclaw-*.log for unexpected activity
Backup config: cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak

Uninstall

clawhub uninstall voice-email

Then remove API keys from openclaw.json if desired.

Validation / Testing

To verify the skill is working:

Test Deepgram directly:

curl -X POST "https://api.deepgram.com/v1/listen" \
  -H "Authorization: Token $DEEPGRAM_API_KEY" \
  -H "Content-Type: audio/ogg" \
  --data-binary @sample.ogg

Test gogcli:

gog auth status
gog gmail send --to "your-email@gmail.com" --subject "Test" --body "Working!"

Send a voice message on Telegram: "new email to your-email@gmail.com, subject test, body hello, send"

voice-email

Safety Notice

Copy this and send it to your AI assistant to learn