add-voice-transcription

Add Voice Transcription

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "add-voice-transcription" with this command: npx skills add qwibitai/nanoclaw/qwibitai-nanoclaw-add-voice-transcription

This skill adds automatic voice message transcription to NanoClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as [Voice: <transcript>].

Phase 1: Pre-flight

Check if already applied

Check whether src/transcription.ts exists. If it does, the code changes are already in place: skip to Phase 3 (Configure).
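The check above can be scripted. This is a minimal sketch that assumes it is run from the NanoClaw repository root; the function name voice_skill_applied is hypothetical and not part of NanoClaw.

```shell
# Hypothetical helper: succeeds if the transcription module is already present.
# The optional argument is the repository root (defaults to the current directory).
voice_skill_applied() {
  [ -f "${1:-.}/src/transcription.ts" ]
}
```

For example, voice_skill_applied && echo "already applied, skip to Phase 3" prints the message only when the file exists.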

Ask the user

Use AskUserQuestion to collect information:

AskUserQuestion: Do you have an OpenAI API key for Whisper transcription?

If yes, collect it now. If no, direct them to create one at https://platform.openai.com/api-keys.

Phase 2: Apply Code Changes

Prerequisite: WhatsApp must be installed first (skill/whatsapp merged). This skill modifies WhatsApp channel files.

Ensure WhatsApp fork remote

git remote -v

If whatsapp is missing, add it:

git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
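The two steps above (inspect the remotes, add the missing one) can be folded into one idempotent helper. This is a sketch only; ensure_remote is a hypothetical name, and it relies on git remote get-url, which exits non-zero when the remote does not exist.

```shell
# Hypothetical helper: add a git remote only when it is not configured yet.
# Usage: ensure_remote <name> <url>
ensure_remote() {
  if git remote get-url "$1" >/dev/null 2>&1; then
    echo "remote '$1' already configured"
  else
    git remote add "$1" "$2"
  fi
}
```

Called as ensure_remote whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git, it leaves an existing whatsapp remote untouched, so it is safe to run repeatedly.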

Merge the skill branch

git fetch whatsapp skill/voice-transcription
git merge whatsapp/skill/voice-transcription || {
  git checkout --theirs package-lock.json
  git add package-lock.json
  git merge --continue
}

This merges in:

  • src/transcription.ts (voice transcription module using OpenAI Whisper)

  • Voice handling in src/channels/whatsapp.ts (isVoiceMessage check, transcribeAudioMessage call)

  • Transcription tests in src/channels/whatsapp.test.ts

  • openai npm dependency in package.json

  • OPENAI_API_KEY in .env.example

If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
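Before reading the conflicted files, it helps to list them. The sketch below wraps a standard git invocation; the wrapper name conflicted_files is hypothetical.

```shell
# List files that still have unresolved merge conflicts.
# --diff-filter=U restricts the output to unmerged paths.
conflicted_files() {
  git diff --name-only --diff-filter=U
}
```

An empty result means no unmerged paths remain, at which point git merge --continue can finish the merge.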

Validate code changes

npm install --legacy-peer-deps
npm run build
npx vitest run src/channels/whatsapp.test.ts

All tests must pass and the build must be clean before proceeding.

Phase 3: Configure

Get OpenAI API key (if needed)

If the user doesn't have an API key:

I need you to create an OpenAI API key at https://platform.openai.com/api-keys.

Cost: $0.006 per minute of audio ($0.003 per typical 30-second voice note)

Wait for the user to provide the key.
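The per-note figure in the message above follows directly from the per-minute rate: 30 seconds is half a minute, and 0.5 × $0.006 = $0.003. A small sketch of that arithmetic, where whisper_cost is a hypothetical helper and the rate is the one quoted above (actual pricing may change):

```shell
# Estimate transcription cost in USD at $0.006 per minute of audio
# (the rate quoted above). Argument: audio duration in seconds.
whisper_cost() {
  awk -v secs="$1" 'BEGIN { printf "%.4f\n", secs / 60 * 0.006 }'
}
```

For instance, whisper_cost 30 prints 0.0030 and whisper_cost 60 prints 0.0060.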

Add to environment

Add to .env:

OPENAI_API_KEY=<their-key>

Sync to container environment:

mkdir -p data/env && cp .env data/env/env

The container reads its environment from data/env/env, not from .env directly.
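Because the container only sees the copy, a stale data/env/env is an easy mistake to make. The sketch below performs the same mkdir and cp as above and then confirms the two files are byte-identical; sync_env is a hypothetical name, and the optional argument (the project root) exists only to make the helper easy to try in a scratch directory.

```shell
# Hypothetical helper: copy .env into data/env/env and confirm the copy is
# byte-identical. The optional argument is the project root (default: ".").
sync_env() {
  root="${1:-.}"
  mkdir -p "$root/data/env" &&
    cp "$root/.env" "$root/data/env/env" &&
    cmp -s "$root/.env" "$root/data/env/env"
}
```

Running sync_env || echo "sync failed" after every edit to .env keeps the container copy current.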

Build and restart

npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS

Linux: systemctl --user restart nanoclaw
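The restart command differs by platform. As a sketch, the choice can be keyed on uname -s; restart_command is a hypothetical helper that only prints the command from this guide for the detected (or given) OS and does not restart anything itself.

```shell
# Hypothetical helper: print the restart command for the given OS name
# (defaults to the output of `uname -s`). It only prints the command.
restart_command() {
  os="${1:-$(uname -s)}"
  case "$os" in
    Darwin) echo "launchctl kickstart -k gui/$(id -u)/com.nanoclaw" ;;
    Linux)  echo "systemctl --user restart nanoclaw" ;;
    *)      echo "unsupported platform: $os" >&2; return 1 ;;
  esac
}
```

Printing instead of executing lets the user review the command first; the printed line can then be pasted into the terminal once it looks right.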

Phase 4: Verify

Test with a voice note

Tell the user:

Send a voice note in any registered WhatsApp chat. The agent should receive it as [Voice: <transcript>] and respond to its content.

Check logs if needed

tail -f logs/nanoclaw.log | grep -i voice

Look for:

  • Transcribed voice message — successful transcription with character count

  • OPENAI_API_KEY not set — key missing from .env

  • OpenAI transcription failed — API error (check key validity, billing)

  • Failed to download audio message — media download issue
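The four messages above map naturally onto a small lookup. The following sketch classifies a single log line by substring match; classify_voice_log and the labels it prints are hypothetical, and it assumes the log lines contain the message text exactly as listed.

```shell
# Hypothetical helper: map a log line to a short diagnosis, based on the
# four messages listed above. The labels on the right are illustrative.
classify_voice_log() {
  case "$1" in
    *"Transcribed voice message"*)        echo "ok" ;;
    *"OPENAI_API_KEY not set"*)           echo "missing-key" ;;
    *"OpenAI transcription failed"*)      echo "api-error" ;;
    *"Failed to download audio message"*) echo "download-error" ;;
    *)                                    echo "unrelated" ;;
  esac
}
```

To summarize recent activity, feed it the tail of the log: tail -n 200 logs/nanoclaw.log | while IFS= read -r line; do classify_voice_log "$line"; done | sort | uniq -c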

Troubleshooting

Voice notes show "[Voice Message - transcription unavailable]"

  • Check OPENAI_API_KEY is set in .env AND synced to data/env/env

  • Verify key works: curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200

  • Check OpenAI billing — Whisper requires a funded account
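The first check in this list can be automated. The sketch below reports, for each of the two files, whether it defines a non-empty OPENAI_API_KEY; both function names are hypothetical, and the check assumes the key is written as a plain OPENAI_API_KEY=value line with no leading export or whitespace.

```shell
# Hypothetical helper: succeed only if the file defines a non-empty OPENAI_API_KEY.
has_openai_key() {
  grep -Eq '^OPENAI_API_KEY=.+' "$1" 2>/dev/null
}

# Report the state of both files; returns non-zero if either lacks the key.
# The optional argument is the project root (default: ".").
check_key_files() {
  root="${1:-.}"
  rc=0
  for f in "$root/.env" "$root/data/env/env"; do
    if has_openai_key "$f"; then
      echo "ok: $f"
    else
      echo "missing: $f"
      rc=1
    fi
  done
  return "$rc"
}
```

A "missing" line for data/env/env while .env is "ok" points to a skipped sync step rather than a bad key.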

Voice notes show "[Voice Message - transcription failed]"

Check the logs for the specific error; the log messages listed under "Check logs if needed" identify the usual causes.

Agent doesn't respond to voice notes

Verify the chat is registered and the agent is running. Voice transcription only runs for registered groups.
