speechall-cli
CLI for speech-to-text transcription via the Speechall API. Supports multiple providers (OpenAI, Deepgram, AssemblyAI, Google, Gemini, Groq, ElevenLabs, Cloudflare, and more).
Installation
Homebrew (macOS and Linux)
brew install Speechall/tap/speechall
Without Homebrew: Download the binary for your platform from https://github.com/Speechall/speechall-cli/releases and place it on your PATH.
Verify
speechall --version
Authentication
An API key is required. Provide it via environment variable (preferred) or flag:
export SPEECHALL_API_KEY="your-key-here"
# or
speechall --api-key "your-key-here" audio.wav
The user can create an API key on https://speechall.com/console/api-keys
Commands
transcribe (default)
Transcribe an audio or video file. This is the default subcommand — speechall audio.wav is equivalent to speechall transcribe audio.wav.
speechall <file> [options]
Options:
| Flag | Description | Default |
|---|---|---|
--model <provider.model> | STT model identifier | openai.gpt-4o-mini-transcribe |
--language <code> | Language code (e.g. en, tr, de) | API default (auto-detect) |
--output-format <format> | Output format (text, json, verbose_json, srt, vtt) | API default |
--diarization | Enable speaker diarization | off |
--speakers-expected <n> | Expected number of speakers (use with --diarization) | — |
--no-punctuation | Disable automatic punctuation | — |
--temperature <0.0-1.0> | Model temperature | — |
--initial-prompt <text> | Text prompt to guide model style | — |
--custom-vocabulary <term> | Terms to boost recognition (repeatable) | — |
--ruleset-id <uuid> | Replacement ruleset UUID | — |
--api-key <key> | API key (overrides SPEECHALL_API_KEY env var) | — |
Examples:
# Basic transcription
speechall interview.mp3
# Specific model and language
speechall call.wav --model deepgram.nova-2 --language en
# Speaker diarization with SRT output
speechall meeting.wav --diarization --speakers-expected 3 --output-format srt
# Custom vocabulary for domain-specific terms
speechall medical.wav --custom-vocabulary "myocardial" --custom-vocabulary "infarction"
# Transcribe a video file (macOS extracts audio automatically)
speechall presentation.mp4
models
List available speech-to-text models. Outputs JSON to stdout. Filters combine with AND logic.
speechall models [options]
Filter flags:
| Flag | Description |
|---|---|
--provider <name> | Filter by provider (e.g. openai, deepgram) |
--language <code> | Filter by supported language (tr matches tr, tr-TR, tr-CY) |
--diarization | Only models supporting speaker diarization |
--srt | Only models supporting SRT output |
--vtt | Only models supporting VTT output |
--punctuation | Only models supporting automatic punctuation |
--streamable | Only models supporting real-time streaming |
--vocabulary | Only models supporting custom vocabulary |
Examples:
# List all available models
speechall models
# Models from a specific provider
speechall models --provider deepgram
# Models that support Turkish and diarization
speechall models --language tr --diarization
# Pipe to jq for specific fields
speechall models --provider openai | jq '.[].identifier'
Tips
- On macOS, video files (
.mp4,.mov, etc.) are automatically converted to audio before upload. - On Linux, pass audio files directly (
.wav,.mp3,.m4a,.flac, etc.). - Output goes to stdout. Redirect to save:
speechall audio.wav > transcript.txt - Errors go to stderr, so piping stdout is safe.
- Run
speechall --help,speechall transcribe --help, orspeechall models --helpto see all valid enum values for model identifiers, language codes, and output formats.