whisper-voice

Native macOS menu bar app for live voice-to-text with auto-type using WhisperKit on Apple Silicon

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "whisper-voice" with this command: npx skills add aiagentwithdhruv/skills/aiagentwithdhruv-skills-whisper-voice

Whisper Voice — Live Speech-to-Text Mac App

Goal

Build and run a native macOS menu bar app that captures live microphone audio, transcribes it offline using WhisperKit (on Apple Silicon), and auto-types the text wherever the cursor is.

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| model_size | string | No | Whisper model: tiny, base (default), small |
| language | string | No | "en" (default) or "hi" for Hindi mode |
| chunk_duration | float | No | Seconds per audio chunk (default: 3.0) |
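
To make chunk_duration concrete: Whisper models take 16 kHz mono audio, so a 3.0-second chunk holds 48,000 samples. The following is a minimal sketch of fixed-duration chunking under that assumption. The names `splitIntoChunks` and `whisperSampleRate` are hypothetical and are not taken from the app's source, which may buffer audio differently.

```swift
// Whisper models expect 16 kHz mono input.
let whisperSampleRate = 16_000.0

/// Splits `samples` into consecutive chunks of `chunkDuration` seconds.
/// The final chunk may be shorter than the others.
func splitIntoChunks(_ samples: [Float],
                     chunkDuration: Double = 3.0,
                     sampleRate: Double = whisperSampleRate) -> [[Float]] {
    let chunkSize = Int(chunkDuration * sampleRate)
    guard chunkSize > 0 else { return [] }
    return stride(from: 0, to: samples.count, by: chunkSize).map { start in
        Array(samples[start..<min(start + chunkSize, samples.count)])
    }
}

// 7 seconds of audio (112,000 samples) becomes chunks of 3 s, 3 s, and 1 s.
let chunks = splitIntoChunks([Float](repeating: 0, count: 112_000))
print(chunks.map(\.count))  // prints "[48000, 48000, 16000]"
```

A shorter chunk_duration lowers the delay before text appears but gives the model less context per chunk, which is the trade-off this parameter controls.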

Process

1. Build the app

cd AiwithDhruv_Voice/WhisperAiwithDhruv
swift build

2. Run the app

swift run WhisperAiwithDhruv
# Or open the package in Xcode with: open Package.swift, then press Cmd+R

3. First launch setup

  1. Grant microphone permission when prompted
  2. Grant Accessibility access in System Settings → Privacy & Security → Accessibility
  3. Wait for model download (~140MB for base model)

4. Usage

  • Cmd+Shift+Space — Toggle recording on/off
  • Click mic icon in menu bar for controls
  • Speak — text auto-types at cursor position
  • Toggle Hindi mode for Hindi/Hinglish input

Outputs

| Name | Type | Description |
| --- | --- | --- |
| transcribed_text | string | Live transcribed text typed at cursor |
| history | array | Last 50 transcription entries in menu bar |
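
The 50-entry history can be modeled as a bounded buffer that drops its oldest entries. This is a sketch only: `TranscriptionHistory` is a hypothetical type, and the app's actual storage is not shown in this listing.

```swift
/// Keeps only the most recent `capacity` transcriptions, newest last.
struct TranscriptionHistory {
    let capacity: Int
    private(set) var entries: [String] = []

    init(capacity: Int = 50) {
        self.capacity = capacity
    }

    mutating func append(_ text: String) {
        entries.append(text)
        // Drop the oldest entries once the cap is exceeded.
        if entries.count > capacity {
            entries.removeFirst(entries.count - capacity)
        }
    }
}

var history = TranscriptionHistory()
for i in 1...60 { history.append("entry \(i)") }
print(history.entries.count, history.entries.first ?? "")  // prints "50 entry 11"
```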

Edge Cases

  • No mic: Shows error in menu bar dropdown
  • Accessibility denied: Auto-type disabled, manual copy from history
  • Silence: VAD skips silent chunks (energy-based threshold)
  • Hallucinations: Filters common Whisper artifacts ("Thank you.", "...")
  • Model not downloaded: Shows download progress bar
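
The silence and hallucination cases above can be sketched as two small filters. This assumes the energy measure is root-mean-square amplitude and that the artifact filter is an exact match on trimmed, lowercased text. Both are assumptions, as are the function names and the 0.01 default threshold. Only the two artifact strings come from the list above.

```swift
import Foundation

/// Root-mean-square energy of a chunk; 0 for an empty chunk.
func rmsEnergy(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

/// Energy-based VAD: a chunk counts as speech if its RMS exceeds the threshold.
/// The 0.01 default is an assumed value inside the documented 0.002 to 0.05 range.
func isSpeech(_ samples: [Float], silenceThreshold: Float = 0.01) -> Bool {
    rmsEnergy(samples) > silenceThreshold
}

// Hypothetical artifact list; a real filter would likely hold more entries.
let knownArtifacts: Set<String> = ["thank you.", "..."]

/// Returns nil when the transcription is empty or a known artifact,
/// otherwise the trimmed text.
func filterHallucination(_ text: String) -> String? {
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
    if trimmed.isEmpty || knownArtifacts.contains(trimmed.lowercased()) {
        return nil
    }
    return trimmed
}
```

Running the VAD check before transcription avoids spending model time on silent chunks, and silent input is also where artifacts such as "Thank you." tend to appear.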

Environment

  • macOS 14+ (Sonoma)
  • Apple Silicon (M1/M2/M3/M4)
  • Xcode 15+ (for building)
  • No API keys needed (fully offline)

Schema

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| model_size | string | No | tiny / base / small |
| language | string | No | en / hi |
| chunk_duration | float | No | 2.0 to 8.0 seconds |
| silence_threshold | float | No | 0.002 to 0.05 |
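
One way to enforce these ranges is to parse the inputs into a typed configuration. The sketch below falls back to the defaults for unknown values and clamps out-of-range numbers. `VoiceConfig` and its behavior are hypothetical, and the default silence threshold of 0.01 is an assumed value, since the listing does not state one.

```swift
enum ModelSize: String { case tiny, base, small }
enum Language: String { case en, hi }

struct VoiceConfig {
    var modelSize: ModelSize = .base
    var language: Language = .en
    var chunkDuration: Double = 3.0
    var silenceThreshold: Double = 0.01  // assumed default, not documented

    init(modelSize: String? = nil, language: String? = nil,
         chunkDuration: Double? = nil, silenceThreshold: Double? = nil) {
        // Unknown strings keep the default instead of failing.
        if let raw = modelSize, let parsed = ModelSize(rawValue: raw) {
            self.modelSize = parsed
        }
        if let raw = language, let parsed = Language(rawValue: raw) {
            self.language = parsed
        }
        // Numeric inputs are clamped to the documented ranges.
        if let duration = chunkDuration {
            self.chunkDuration = min(max(duration, 2.0), 8.0)
        }
        if let threshold = silenceThreshold {
            self.silenceThreshold = min(max(threshold, 0.002), 0.05)
        }
    }
}

let config = VoiceConfig(modelSize: "large", language: "hi",
                         chunkDuration: 12.0, silenceThreshold: 0.0001)
print(config.modelSize, config.language, config.chunkDuration, config.silenceThreshold)
// prints "base hi 8.0 0.002"
```

Clamping is one design choice among several; rejecting invalid input with an error would be equally reasonable if silent correction is undesirable.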

Outputs

| Name | Type | Description |
| --- | --- | --- |
| transcription | string | Live text output |
| auto_typed | boolean | Whether text was injected at cursor |

Credentials

| Name | Source |
| --- | --- |
| None | Fully offline, no API keys |

Composable With

video-edit (add transcription captions), send-telegram (send transcriptions to phone)

Cost

Free — runs entirely on-device. Model download is one-time (~140MB for base).

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
