soniox

Integrate Soniox speech-to-text API for real-time and async transcription. Use when working with Soniox WebSocket streaming, Soniox REST transcription, Soniox SDKs (Python, Node, Web), or implementing voice/speech features using Soniox. Triggers on mentions of Soniox API, soniox package, @soniox/node, @soniox/speech-to-text-web, real-time transcription via Soniox, speech-to-text integration, or audio transcription with Soniox.

Install skill "soniox" with this command: npx skills add bbssppllvv/essential-skills/bbssppllvv-essential-skills-soniox

Soniox Speech-to-Text

Cloud speech-to-text API with real-time WebSocket streaming and async file transcription. Supports 60+ languages, speaker diarization, live translation, and custom vocabulary context.

API Structure

Two main APIs:

| API | Transport | Model | Use Case |
|---|---|---|---|
| Real-Time | WebSocket wss://stt-rt.soniox.com/transcribe-websocket | stt-rt-v4 | Live audio streaming, token-by-token results |
| Async | REST https://api.soniox.com/v1/ | stt-async-v4 | Pre-recorded files, batch processing |

Authentication: Authorization: Bearer <API_KEY> header (or api_key query param for WebSocket).
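
The two authentication styles can be sketched as follows. This is a minimal illustration, not SDK code: the helper names `bearer_headers` and `websocket_url` are hypothetical, and only the header format, the `api_key` query parameter, and the WebSocket URL come from the text above.

```python
from urllib.parse import urlencode

REALTIME_URL = "wss://stt-rt.soniox.com/transcribe-websocket"


def bearer_headers(api_key: str) -> dict:
    """Header form of authentication, as used by the REST examples."""
    return {"Authorization": f"Bearer {api_key}"}


def websocket_url(api_key: str) -> str:
    """Query-parameter form of authentication for the WebSocket endpoint."""
    # urlencode escapes any characters in the key that are unsafe in a URL.
    return f"{REALTIME_URL}?{urlencode({'api_key': api_key})}"


print(bearer_headers("YOUR_KEY"))
print(websocket_url("YOUR_KEY"))
```

Keeping the key out of source code (for example, reading it from an environment variable) is a common precaution, since the query-parameter form places the key directly in the URL.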

Quick Start — Real-Time WebSocket

  1. Connect to wss://stt-rt.soniox.com/transcribe-websocket?api_key=YOUR_KEY
  2. Send JSON config: {"model": "stt-rt-v4", "audio_format": "pcm_s16le", "sample_rate": 16000}
  3. Stream raw audio bytes
  4. Receive JSON tokens: {"tokens": [{"text": "hello", "is_final": true, "start_ms": 100, "end_ms": 500}]}
  5. Close connection when done

Key token fields: text, is_final (false=provisional, true=confirmed), start_ms, end_ms, confidence, speaker (if diarization enabled), language (if language ID enabled).
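
Handling the provisional/confirmed split might look like the sketch below. It assumes every WebSocket message is a JSON object with a `tokens` array shaped like the example in step 4; the function name `split_tokens` is hypothetical and not part of any Soniox SDK.

```python
import json


def split_tokens(message: str) -> tuple:
    """Return (final_text, provisional_text) for one JSON message.

    Assumes the shape {"tokens": [{"text": ..., "is_final": ...}, ...]}.
    """
    payload = json.loads(message)
    final_parts = []
    provisional_parts = []
    for token in payload.get("tokens", []):
        # A token without an explicit is_final flag is treated as provisional.
        if token.get("is_final", False):
            final_parts.append(token["text"])
        else:
            provisional_parts.append(token["text"])
    return "".join(final_parts), "".join(provisional_parts)


sample = json.dumps({
    "tokens": [
        {"text": "hello", "is_final": True, "start_ms": 100, "end_ms": 500},
        {"text": " wor", "is_final": False, "start_ms": 500, "end_ms": 700},
    ]
})
final_text, provisional_text = split_tokens(sample)
print(repr(final_text), repr(provisional_text))
```

A caller would typically append the final text to the running transcript and redraw the provisional text on each message, since provisional tokens are not yet confirmed.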

Quick Start — Async REST

# Upload and transcribe
curl -X POST https://api.soniox.com/v1/transcriptions \
  -H "Authorization: Bearer $API_KEY" \
  -F model=stt-async-v4 \
  -F audio_file=@recording.mp3

# Poll for the result; replace {id} with the transcription id from the upload response
curl https://api.soniox.com/v1/transcriptions/{id} \
  -H "Authorization: Bearer $API_KEY"
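
The polling step can be wrapped in a small loop, sketched here under stated assumptions: the response is assumed to carry a `status` field with terminal values `completed` and `error` (check references/async-api.md for the actual names), and `fetch_status` is a hypothetical callable standing in for the GET request above, so the example runs without network access.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until the job reaches a terminal state.

    ASSUMPTION: the job dict has a "status" key and the terminal values
    are "completed" and "error".
    """
    for _ in range(max_attempts):
        job = fetch_status()
        if job.get("status") in ("completed", "error"):
            return job
        sleep(interval_s)
    raise TimeoutError(f"job still pending after {max_attempts} attempts")


# Canned responses replace the real HTTP call for this offline demonstration.
_responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed"},
])
result = poll_until_done(lambda: next(_responses), sleep=lambda _: None)
print(result["status"])
```

Injecting `sleep` and `fetch_status` keeps the loop testable; in real use `fetch_status` would issue the authenticated GET and return the parsed JSON body.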

Configuration Options (Both APIs)

Common parameters sent in start config (real-time) or request body (async):

| Parameter | Type | Description |
|---|---|---|
| model | string | stt-rt-v4 or stt-async-v4 |
| language_hints | string[] | ISO 639-1 codes to improve accuracy |
| language_hints_strict | bool | Restrict recognition to hinted languages |
| enable_language_identification | bool | Detect language per token |
| enable_speaker_diarization | bool | Label speakers (up to 15) |
| translation | object | Translation config: {"type": "one_way", "target_language": "fr"} or {"type": "two_way", "language_a": "en", "language_b": "fr"} |
| context | object | Domain context (see below) |
| max_endpoint_delay_ms | int | 500-3000 ms, semantic endpoint detection (real-time only) |
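
As an illustration of how these parameters combine, the sketch below assembles a real-time start config and checks the documented 500-3000 ms range before sending. The function `build_start_config` is hypothetical; the field names and the base config come from the table and the quick start above.

```python
import json
from typing import Optional


def build_start_config(
    language_hints: Optional[list] = None,
    enable_speaker_diarization: bool = False,
    max_endpoint_delay_ms: Optional[int] = None,
) -> str:
    """Return the JSON start config for a real-time session."""
    config = {
        "model": "stt-rt-v4",
        "audio_format": "pcm_s16le",
        "sample_rate": 16000,
    }
    if language_hints:
        config["language_hints"] = list(language_hints)
    if enable_speaker_diarization:
        config["enable_speaker_diarization"] = True
    if max_endpoint_delay_ms is not None:
        # The table above documents a 500-3000 ms range for this field.
        if not 500 <= max_endpoint_delay_ms <= 3000:
            raise ValueError("max_endpoint_delay_ms must be within 500-3000")
        config["max_endpoint_delay_ms"] = max_endpoint_delay_ms
    return json.dumps(config)


print(build_start_config(["en", "fr"], enable_speaker_diarization=True,
                         max_endpoint_delay_ms=1000))
```

Optional fields are left out unless set, so the server-side defaults apply to anything the caller does not specify.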

Context Object Format

{
  "context": {
    "general": [
      {"key": "domain", "value": "Healthcare"},
      {"key": "topic", "value": "Medical Consultation"}
    ],
    "text": "Background: Patient discussing cardiac symptoms...",
    "terms": ["myocardial infarction", "stent", "angioplasty"],
    "translation_terms": [
      {"source": "stent", "target": "стент"}
    ]
  }
}

The context is limited to a maximum of 8000 tokens.
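
A context object of this shape could be assembled programmatically, as in the following sketch. `build_context` is a hypothetical helper; the key names mirror the JSON example above, and no token counting is attempted because the tokenizer is not described here.

```python
import json
from typing import Optional


def build_context(
    domain: str,
    topic: str,
    text: str = "",
    terms: Optional[list] = None,
    translation_terms: Optional[dict] = None,
) -> dict:
    """Build a {"context": {...}} object matching the format shown above."""
    context = {
        "general": [
            {"key": "domain", "value": domain},
            {"key": "topic", "value": topic},
        ]
    }
    if text:
        context["text"] = text
    if terms:
        context["terms"] = list(terms)
    if translation_terms:
        # Each source/target pair becomes one entry in translation_terms.
        context["translation_terms"] = [
            {"source": source, "target": target}
            for source, target in translation_terms.items()
        ]
    return {"context": context}


example = build_context(
    "Healthcare",
    "Medical Consultation",
    terms=["myocardial infarction", "stent", "angioplasty"],
    translation_terms={"stent": "стент"},
)
print(json.dumps(example, ensure_ascii=False, indent=2))
```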

Reference Files

Read these based on the specific task:

| File | When to Read |
|---|---|
| references/realtime.md | WebSocket protocol details, token streaming, finalization, keepalive, error codes |
| references/async-api.md | REST endpoints, file upload, job polling, webhooks, file management |
| references/features.md | Languages list, diarization details, context format, models, timestamps |
| references/sdks.md | Python/Node/Web SDK usage, code patterns, client initialization |
| references/integrations.md | Direct/Proxy stream patterns, Vercel AI, TanStack, Twilio, n8n, data residency, security |

Native Swift/macOS Integration

Soniox has no native Swift SDK. For macOS/iOS apps, connect via raw WebSocket:

// URLSessionWebSocketTask approach
let url = URL(string: "wss://stt-rt.soniox.com/transcribe-websocket?api_key=\(apiKey)")!
let task = URLSession.shared.webSocketTask(with: url)
task.resume()

// Send start config
let config = """
{"model":"stt-rt-v4","audio_format":"pcm_s16le","sample_rate":16000}
"""
task.send(.string(config)) { error in /* handle */ }

// Stream audio bytes from microphone
task.send(.data(audioBuffer)) { error in /* handle */ }

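// Receive tokens
func receiveNext() {
    task.receive { result in
        switch result {
        case .success(.string(let json)):
            // Parse tokens from the JSON text frame
            _ = json
            receiveNext() // Keep listening for the next message
        case .success:
            // Ignore non-text frames and keep listening
            receiveNext()
        case .failure(let error):
            // Stop here: calling receive again on a failed socket would loop forever
            print("WebSocket receive failed: \(error)")
        }
    }
}
receiveNext() // Start the receive loop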

Audio format: Send raw PCM signed 16-bit little-endian at 16kHz mono for best results. The API also auto-detects encoded formats (mp3, ogg, flac, wav, etc.).
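
Producing the raw PCM format described above from floating-point samples can be sketched as follows. The helper name `floats_to_pcm_s16le` is hypothetical, and the input is assumed to be mono samples already resampled to 16 kHz and normalized to the range [-1.0, 1.0].

```python
import struct


def floats_to_pcm_s16le(samples: list) -> bytes:
    """Convert mono float samples in [-1.0, 1.0] to signed 16-bit little-endian PCM."""
    # Clip first so out-of-range input cannot overflow the 16-bit range.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    # "<" selects little-endian byte order, "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


# 16000 samples/s * 2 bytes/sample = 32000 bytes per second of mono audio.
BYTES_PER_100_MS = 16000 * 2 // 10

chunk = floats_to_pcm_s16le([0.0, 1.0, -1.0])
print(chunk.hex(), BYTES_PER_100_MS)
```

At this format, 100 ms of audio occupies 3200 bytes, which is a convenient way to size the chunks sent over the WebSocket.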

Rate Limits

| Limit | Real-Time | Async |
|---|---|---|
| Requests/min | 100 | 100 |
| Concurrent | 10 connections | 100 pending jobs |
| Max duration | 300 min/session | |
| Storage | | 10GB, 1000 files |
| Total transcriptions | | 2000 |
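
A client can stay under the requests-per-minute limit with a simple sliding-window guard, sketched below. This is an illustrative client-side technique, not something the API provides; the class name `SlidingWindowLimiter` is hypothetical, and the default of 100 calls per 60 seconds mirrors the table above.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_s-second window."""

    def __init__(
        self,
        max_calls: int = 100,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_calls = max_calls
        self._window_s = window_s
        self._clock = clock
        self._stamps = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True, or return False if the window is full."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._window_s:
            self._stamps.popleft()
        if len(self._stamps) < self._max_calls:
            self._stamps.append(now)
            return True
        return False


limiter = SlidingWindowLimiter(max_calls=2, window_s=60.0)
print([limiter.try_acquire() for _ in range(3)])
```

When `try_acquire` returns False, the caller can wait and retry rather than sending a request that is likely to be rejected.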

Data Residency

Regional endpoints available:

| Region | Real-Time Endpoint | Async Endpoint |
|---|---|---|
| US (default) | stt-rt.soniox.com | api.soniox.com |
| EU | stt-rt.eu.soniox.com | api.eu.soniox.com |
| Japan | stt-rt-jp.soniox.com | api.jp.soniox.com |
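
Selecting endpoints by region could be done with a small lookup, as in this sketch. The hostnames are copied verbatim from the table above; the region keys (`us`, `eu`, `jp`), the function name `endpoints_for`, and the assumption that regional hosts use the same paths as the US endpoints are all illustrative and should be verified against references/integrations.md.

```python
# Hostnames copied verbatim from the regional endpoint table.
ENDPOINT_HOSTS = {
    "us": ("stt-rt.soniox.com", "api.soniox.com"),
    "eu": ("stt-rt.eu.soniox.com", "api.eu.soniox.com"),
    "jp": ("stt-rt-jp.soniox.com", "api.jp.soniox.com"),
}


def endpoints_for(region: str = "us") -> dict:
    """Return the real-time and async base URLs for a region key."""
    try:
        realtime_host, async_host = ENDPOINT_HOSTS[region.lower()]
    except KeyError:
        raise ValueError(f"unknown region: {region!r}") from None
    return {
        # ASSUMPTION: regional hosts share the paths of the US endpoints.
        "realtime": f"wss://{realtime_host}/transcribe-websocket",
        "async": f"https://{async_host}/v1/",
    }


print(endpoints_for("eu"))
```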
