# @azure/ai-voicelive (JavaScript/TypeScript)
Real-time voice AI SDK for building bidirectional voice assistants with Azure AI in Node.js and browser environments.
## Installation

```bash
npm install @azure/ai-voicelive @azure/identity
```

TypeScript users should also install the Node.js type definitions:

```bash
npm install @types/node
```

**Current version:** `1.0.0-beta.3`
**Supported environments:**

- Node.js LTS versions (20+)
- Modern browsers (Chrome, Firefox, Safari, Edge)
## Environment Variables

```bash
# Required: service endpoint
AZURE_VOICELIVE_ENDPOINT=https://<resource>.cognitiveservices.azure.com

# Optional: API key if not using Entra ID
AZURE_VOICELIVE_API_KEY=<your-api-key>

# Optional: logging
AZURE_LOG_LEVEL=info
```
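A minimal sketch, assuming the variables above are set, that prefers Entra ID and falls back to the API key only when one is provided:

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { AzureKeyCredential } from "@azure/core-auth";
import { VoiceLiveClient } from "@azure/ai-voicelive";

// Prefer Entra ID; use the key-based credential only if a key is configured.
const endpoint = process.env.AZURE_VOICELIVE_ENDPOINT!;
const credential = process.env.AZURE_VOICELIVE_API_KEY
  ? new AzureKeyCredential(process.env.AZURE_VOICELIVE_API_KEY)
  : new DefaultAzureCredential();

const client = new VoiceLiveClient(endpoint, credential);
```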
## Authentication

### Microsoft Entra ID (Recommended)

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";

const client = new VoiceLiveClient(endpoint, credential);
```
### API Key

```typescript
import { AzureKeyCredential } from "@azure/core-auth";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const endpoint = "https://your-resource.cognitiveservices.azure.com";
const credential = new AzureKeyCredential("your-api-key");

const client = new VoiceLiveClient(endpoint, credential);
```
## Client Hierarchy

```
VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── updateSession()        → Configure session options
    ├── subscribe()            → Event handlers (Azure SDK pattern)
    ├── sendAudio()            → Stream audio input
    ├── addConversationItem()  → Add messages/function outputs
    └── sendEvent()            → Send raw protocol events
```
## Quick Start

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = process.env.AZURE_VOICELIVE_ENDPOINT!;

// Create client and start session
const client = new VoiceLiveClient(endpoint, credential);
const session = await client.startSession("gpt-4o-mini-realtime-preview");

// Configure session
await session.updateSession({
  modalities: ["text", "audio"],
  instructions: "You are a helpful AI assistant. Respond naturally.",
  voice: {
    type: "azure-standard",
    name: "en-US-AvaNeural",
  },
  turnDetection: {
    type: "server_vad",
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
  },
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
});

// Subscribe to events
const subscription = session.subscribe({
  onResponseAudioDelta: async (event, context) => {
    // Handle streaming audio output
    const audioData = event.delta;
    playAudioChunk(audioData);
  },
  onResponseTextDelta: async (event, context) => {
    // Handle streaming text
    process.stdout.write(event.delta);
  },
  onInputAudioTranscriptionCompleted: async (event, context) => {
    console.log("User said:", event.transcript);
  },
});

// Send audio from the microphone
function sendAudioChunk(audioBuffer: ArrayBuffer) {
  session.sendAudio(audioBuffer);
}
```
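`playAudioChunk` above is application code, not part of the SDK. A minimal Node.js sketch, assuming `event.delta` arrives as raw PCM bytes and using the third-party `speaker` package as a playback sink (an assumption, not an SDK dependency):

```typescript
import Speaker from "speaker"; // hypothetical playback sink (npm "speaker" package)

// 24 kHz, 16-bit, mono matches the "pcm16" output format configured above.
const speaker = new Speaker({ sampleRate: 24000, bitDepth: 16, channels: 1 });

function playAudioChunk(audioData: Uint8Array) {
  speaker.write(Buffer.from(audioData)); // stream each delta straight to the device
}
```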
## Session Configuration

```typescript
await session.updateSession({
  // Modalities
  modalities: ["audio", "text"],

  // System instructions
  instructions: "You are a customer service representative.",

  // Voice selection
  voice: {
    type: "azure-standard", // or "azure-custom", "openai"
    name: "en-US-AvaNeural",
  },

  // Turn detection (VAD)
  turnDetection: {
    type: "server_vad", // or "azure_semantic_vad"
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
  },

  // Audio formats
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",

  // Tools (function calling)
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get current weather",
      parameters: {
        type: "object",
        properties: { location: { type: "string" } },
        required: ["location"],
      },
    },
  ],
  toolChoice: "auto",
});
```
## Event Handling (Azure SDK Pattern)
The SDK uses a subscription-based event handling pattern:
```typescript
const subscription = session.subscribe({
  // Connection lifecycle
  onConnected: async (args, context) => {
    console.log("Connected:", args.connectionId);
  },
  onDisconnected: async (args, context) => {
    console.log("Disconnected:", args.code, args.reason);
  },
  onError: async (args, context) => {
    console.error("Error:", args.error.message);
  },

  // Session events
  onSessionCreated: async (event, context) => {
    console.log("Session created:", context.sessionId);
  },
  onSessionUpdated: async (event, context) => {
    console.log("Session updated");
  },

  // Audio input events (VAD)
  onInputAudioBufferSpeechStarted: async (event, context) => {
    console.log("Speech started at:", event.audioStartMs);
  },
  onInputAudioBufferSpeechStopped: async (event, context) => {
    console.log("Speech stopped at:", event.audioEndMs);
  },

  // Transcription events
  onConversationItemInputAudioTranscriptionCompleted: async (event, context) => {
    console.log("User said:", event.transcript);
  },
  onConversationItemInputAudioTranscriptionDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },

  // Response events
  onResponseCreated: async (event, context) => {
    console.log("Response started");
  },
  onResponseDone: async (event, context) => {
    console.log("Response complete");
  },

  // Streaming text
  onResponseTextDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },
  onResponseTextDone: async (event, context) => {
    console.log("\n--- Text complete ---");
  },

  // Streaming audio
  onResponseAudioDelta: async (event, context) => {
    const audioData = event.delta;
    playAudioChunk(audioData);
  },
  onResponseAudioDone: async (event, context) => {
    console.log("Audio complete");
  },

  // Audio transcript (what the assistant said)
  onResponseAudioTranscriptDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },

  // Function calling
  onResponseFunctionCallArgumentsDone: async (event, context) => {
    if (event.name === "get_weather") {
      const args = JSON.parse(event.arguments);
      const result = await getWeather(args.location);

      await session.addConversationItem({
        type: "function_call_output",
        callId: event.callId,
        output: JSON.stringify(result),
      });
      await session.sendEvent({ type: "response.create" });
    }
  },

  // Catch-all for debugging
  onServerEvent: async (event, context) => {
    console.log("Event:", event.type);
  },
});

// Clean up when done
await subscription.close();
```
## Function Calling

```typescript
// Define tools in the session config
await session.updateSession({
  modalities: ["audio", "text"],
  instructions: "Help users with weather information.",
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City and state or country",
          },
        },
        required: ["location"],
      },
    },
  ],
  toolChoice: "auto",
});

// Handle function calls
const subscription = session.subscribe({
  onResponseFunctionCallArgumentsDone: async (event, context) => {
    if (event.name === "get_weather") {
      const args = JSON.parse(event.arguments);
      const weatherData = await fetchWeather(args.location);

      // Send the function result
      await session.addConversationItem({
        type: "function_call_output",
        callId: event.callId,
        output: JSON.stringify(weatherData),
      });

      // Trigger response generation
      await session.sendEvent({ type: "response.create" });
    }
  },
});
```
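`fetchWeather` above is application code. A minimal sketch against a hypothetical JSON weather endpoint (the URL and response shape are illustrative only):

```typescript
// Hypothetical helper: replace the URL with your actual weather backend.
async function fetchWeather(
  location: string
): Promise<{ temperature: number; conditions: string }> {
  const response = await fetch(
    `https://example.com/weather?location=${encodeURIComponent(location)}`
  );
  if (!response.ok) {
    throw new Error(`Weather lookup failed: ${response.status}`);
  }
  return response.json();
}
```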
## Voice Options

| Voice Type | Config | Example |
|---|---|---|
| Azure Standard | `{ type: "azure-standard", name: "..." }` | `"en-US-AvaNeural"` |
| Azure Custom | `{ type: "azure-custom", name: "...", endpointId: "..." }` | Custom voice endpoint |
| Azure Personal | `{ type: "azure-personal", speakerProfileId: "..." }` | Personal voice clone |
| OpenAI | `{ type: "openai", name: "..." }` | `"alloy"`, `"echo"`, `"shimmer"` |
## Supported Models

| Model | Description | Use Case |
|---|---|---|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio | High-quality conversational AI |
| `gpt-4o-mini-realtime-preview` | Lightweight GPT-4o | Fast, efficient interactions |
| `phi4-mm-realtime` | Phi multimodal | Cost-effective applications |
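The model is selected when the session starts, via the same `startSession()` call shown in the Quick Start:

```typescript
// The mini model favors speed; the full model favors quality.
const session = await client.startSession("gpt-4o-realtime-preview");
```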
## Turn Detection Options

```typescript
// Server VAD (default)
turnDetection: {
  type: "server_vad",
  threshold: 0.5,
  prefixPaddingMs: 300,
  silenceDurationMs: 500,
}

// Azure Semantic VAD (smarter detection)
turnDetection: {
  type: "azure_semantic_vad",
}

// Azure Semantic VAD (English optimized)
turnDetection: {
  type: "azure_semantic_vad_en",
}

// Azure Semantic VAD (multilingual)
turnDetection: {
  type: "azure_semantic_vad_multilingual",
}
```
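These snippets are fragments of the session options; apply them through `updateSession()`. A sketch, assuming the service accepts turn-detection changes on a live session:

```typescript
// Switch a running session from server VAD to semantic VAD.
await session.updateSession({
  turnDetection: { type: "azure_semantic_vad" },
});
```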
## Audio Formats

| Format | Sample Rate | Use Case |
|---|---|---|
| `pcm16` | 24 kHz | Default, high quality |
| `pcm16-8000hz` | 8 kHz | Telephony |
| `pcm16-16000hz` | 16 kHz | Voice assistants |
| `g711_ulaw` | 8 kHz | Telephony (US) |
| `g711_alaw` | 8 kHz | Telephony (EU) |
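Input and output formats are independent session options, so a telephony bridge can run G.711 in both directions; a sketch using the documented `updateSession()` call:

```typescript
// US telephony: accept and produce 8 kHz G.711 µ-law audio.
await session.updateSession({
  inputAudioFormat: "g711_ulaw",
  outputAudioFormat: "g711_ulaw",
});
```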
## Key Types Reference

| Type | Purpose |
|---|---|
| `VoiceLiveClient` | Main client for creating sessions |
| `VoiceLiveSession` | Active WebSocket session |
| `VoiceLiveSessionHandlers` | Event handler interface |
| `VoiceLiveSubscription` | Active event subscription |
| `ConnectionContext` | Context for connection events |
| `SessionContext` | Context for session events |
| `ServerEventUnion` | Union of all server events |
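Declaring handlers against `VoiceLiveSessionHandlers` keeps them type-checked and reusable before they reach `subscribe()`; a sketch, assuming the types are exported as listed above:

```typescript
import type { VoiceLiveSessionHandlers } from "@azure/ai-voicelive";

// A reusable, type-checked handler object.
const handlers: VoiceLiveSessionHandlers = {
  onResponseTextDelta: async (event) => {
    process.stdout.write(event.delta);
  },
};

const subscription = session.subscribe(handlers);
```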
## Error Handling

```typescript
import {
  VoiceLiveError,
  VoiceLiveConnectionError,
  VoiceLiveAuthenticationError,
  VoiceLiveProtocolError,
} from "@azure/ai-voicelive";

const subscription = session.subscribe({
  onError: async (args, context) => {
    const { error } = args;

    if (error instanceof VoiceLiveConnectionError) {
      console.error("Connection error:", error.message);
    } else if (error instanceof VoiceLiveAuthenticationError) {
      console.error("Auth error:", error.message);
    } else if (error instanceof VoiceLiveProtocolError) {
      console.error("Protocol error:", error.message);
    }
  },

  onServerError: async (event, context) => {
    console.error("Server error:", event.error?.message);
  },
});
```
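Connection failures are often transient. A retry sketch, assuming `startSession()` rejects with `VoiceLiveConnectionError` on connection failure (an assumption, not documented above):

```typescript
// Hypothetical retry helper with linear backoff; tune attempts and delay to taste.
async function startSessionWithRetry(maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await client.startSession("gpt-4o-mini-realtime-preview");
    } catch (error) {
      const retryable = error instanceof VoiceLiveConnectionError; // assumed error type
      if (!retryable || attempt === maxAttempts) throw error;
      await new Promise((resolve) => setTimeout(resolve, 1000 * attempt));
    }
  }
  throw new Error("unreachable");
}
```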
## Logging

```typescript
import { setLogLevel } from "@azure/logger";

// Enable verbose logging
setLogLevel("info");

// Or via environment variable:
// AZURE_LOG_LEVEL=info
```
## Browser Usage

```typescript
// Browser usage requires a bundler (Vite, webpack, etc.)
import { VoiceLiveClient } from "@azure/ai-voicelive";
import { InteractiveBrowserCredential } from "@azure/identity";

// Use a browser-compatible credential
const credential = new InteractiveBrowserCredential({
  clientId: "your-client-id",
  tenantId: "your-tenant-id",
});

const client = new VoiceLiveClient(endpoint, credential);

// Request microphone access
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 24000 });

// Process audio and send to session
// ... (see samples for full implementation)
```
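The samples linked below cover the full audio pipeline. As a rough sketch of the elided step, a `ScriptProcessorNode` (deprecated but self-contained; production code would use an `AudioWorklet`) can convert microphone `Float32` samples to the 16-bit PCM that `pcm16` expects:

```typescript
// Sketch only: converts Float32 [-1, 1] samples to 16-bit PCM and streams them.
const source = audioContext.createMediaStreamSource(stream);
const processor = audioContext.createScriptProcessor(4096, 1, 1);

processor.onaudioprocess = (event) => {
  const float32 = event.inputBuffer.getChannelData(0);
  const pcm16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  session.sendAudio(pcm16.buffer); // session from client.startSession(...)
};

source.connect(processor);
processor.connect(audioContext.destination);
```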
## Best Practices

- **Always use `DefaultAzureCredential`**: never hardcode API keys.
- **Set both modalities**: include `["text", "audio"]` for voice assistants.
- **Use Azure Semantic VAD**: it detects turns better than basic server VAD.
- **Handle all error types**: connection, auth, and protocol errors.
- **Clean up subscriptions**: call `subscription.close()` when done; see the teardown sketch below.
- **Use an appropriate audio format**: PCM16 at 24 kHz for best quality.
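A minimal teardown sketch for the cleanup item, keeping the documented `subscription.close()` call on every exit path (`runConversation` is a hypothetical application loop):

```typescript
const subscription = session.subscribe({
  onResponseTextDelta: async (event) => {
    process.stdout.write(event.delta);
  },
});

try {
  await runConversation(); // hypothetical application loop
} finally {
  await subscription.close(); // always release the subscription
}
```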
## Reference Links

| Resource | URL |
|---|---|
| npm Package | https://www.npmjs.com/package/@azure/ai-voicelive |
| GitHub Source | https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-voicelive |
| Samples | https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-voicelive/samples |
| API Reference | https://learn.microsoft.com/javascript/api/@azure/ai-voicelive |