Web-based code samples for the Azure Speech Voice Live API featuring a shared React frontend and language-specific backends.
┌─────────────────────┐ WebSocket ┌──────────────────┐ Voice Live SDK ┌──────────────┐
│ React + Vite │◄──────────────►│ Backend Server │◄─────────────────►│ Azure Voice │
│ (shared frontend) │ JSON + PCM16 │ (Python/JS/…) │ PCM16 + events │ Live Service │
│ │ or JSON text │ │ │ │
│ Voice mode: mic │ │ │ │ │
│ Text mode: chat │ │ │ │ │
└─────────────────────┘ └──────────────────┘ └──────────────┘
The frontend builds to static files served by the backend — no separate frontend server needed in production.
- Node.js 20+ and npm (for building the frontend; also for the JavaScript backend)
- Python 3.9+ (for the Python backend)
- Java 17+ and Maven 3.8+ (for the Java backend)
- .NET 8.0 SDK (for the C# backend)
- An Azure AI Services resource with Voice Live API access
Recommended (RBAC): Use DefaultAzureCredential — no API keys required.
az login # Local development — authenticates via Azure CLIFor deployed environments, the azd infrastructure provisions a system-assigned managed identity with the Cognitive Services User role, enabling token-based auth without any keys.
Fallback (API key): Set AZURE_VOICELIVE_API_KEY in .env only if token-based auth is unavailable for your resource.
cd frontend
npm install
npm run buildThis creates frontend/dist/ with the static files that the backend will serve.
cd python
# Create and activate a virtual environment
python -m venv .venv
# Activate the venv
# Windows (PowerShell):
.venv\Scripts\Activate.ps1
# Windows (cmd):
.venv\Scripts\activate.bat
# macOS / Linux:
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txtcp .env.sample .envEdit .env with your credentials:
# Required
AZURE_VOICELIVE_ENDPOINT=https://your-resource.cognitiveservices.azure.com/
# Authentication: DefaultAzureCredential is used by default (az login).
# Set API key below only as a fallback if token auth is unavailable.
AZURE_VOICELIVE_API_KEY=
# Connection mode: "model" (default) or "agent" (Foundry Agent Service)
VOICELIVE_MODE=model
# Model mode settings (default — works with just a Foundry resource)
VOICELIVE_MODEL=gpt-realtime
VOICELIVE_VOICE=en-US-Ava:DragonHDLatestNeural
VOICELIVE_TRANSCRIBE_MODEL=gpt-4o-transcribe
# Agent mode settings (when VOICELIVE_MODE=agent)
AZURE_VOICELIVE_AGENT_NAME=your-agent-name
AZURE_VOICELIVE_PROJECT=your-project-namepython app.pyOpen http://localhost:8000 in your browser. Click Start session and allow microphone access when prompted.
cd frontend
npm install
npm run buildcd java
cp .env.sample .env
# Edit .env with your Azure Voice Live endpointmvn clean package -DskipTests
java -jar target/voice-live-universal-assistant-1.0.0.jarOr with Maven directly:
mvn spring-boot:runOpen http://localhost:8000 in your browser.
Note: See the Java backend README for environment and ecosystem notes.
cd frontend
npm install
npm run buildcd javascript
npm install
cp .env.sample .env
# Edit .env with your Azure Voice Live endpointnpm startOpen http://localhost:8000 in your browser.
cd frontend
npm install
npm run buildcd csharp
cp .env.sample .env
# Edit .env with your Azure Voice Live endpointdotnet runOpen http://localhost:8000 in your browser.
| Mode | Use case | How it works |
|---|---|---|
model |
Direct model access / BYOM (default) | Caller configures model, voice, system prompt. Works with just an endpoint — no agent setup required. Set VOICELIVE_MODEL and VOICELIVE_VOICE. |
agent |
Foundry Agent Service integration | Agent defines instructions, tools, and voice. Set AZURE_VOICELIVE_AGENT_NAME and AZURE_VOICELIVE_PROJECT. Auto-set when deploying with CREATE_AGENT=true. |
Switch modes by setting VOICELIVE_MODE in .env or via the Settings panel in the UI.
The frontend accepts URL query parameters to customize behavior without changing server configuration. These parameters control UI appearance, input mode, and pre-fill settings.
| Parameter | Values | Default | Description |
|---|---|---|---|
?mode=voice|text |
voice, text |
voice |
Lock the input mode. voice = microphone-based interaction with real-time audio. text = chat-style text input. When set, the mode toggle in the menu is hidden. |
?lock=true |
true |
false |
Kiosk/embed mode — hides the settings gear and mode toggle entirely. The app uses only server-provided configuration. |
?agent=NAME |
any string | — | Pre-fills the agent name in settings and automatically sets connection mode to agent (when used together with ?project=). |
?project=NAME |
any string | — | Pre-fills the project name in settings and automatically sets connection mode to agent (when used together with ?agent=). |
?theme=light|dark |
light, dark |
system | Override the UI theme. Without this, the app follows the system preference. |
?greeting=false |
false |
true |
Disable the proactive greeting that the agent sends when a session starts. |
Examples:
# Voice mode with agent pre-configured (typical embed scenario)
https://your-app.azurecontainerapps.io/?agent=my-agent&project=my-project&lock=true
# Text mode for chat-style interaction
https://your-app.azurecontainerapps.io/?mode=text
# Light theme, no greeting
https://your-app.azurecontainerapps.io/?theme=light&greeting=false
The text mode (?mode=text) provides a chat-style text input interface as an alternative to voice interaction.
Important: Text mode is powered by the Voice Live API's text modality, not the Foundry Agent Service REST API. The same WebSocket connection and Voice Live session are used — the only difference is that user input is sent as text messages (
send_text) instead of audio chunks. The agent's response audio can be optionally played back or muted.
Text mode features:
- Chat-style message bubbles with auto-scroll
- Audio playback toggle (speaker icon) — defaults to off; when enabled, the agent's spoken response audio plays through the browser
- Barge-in — sending a new message automatically interrupts any playing agent audio
- Text input bar with send button, positioned within the main content area
- Dismiss (✕) button to end the session
Text vs Voice comparison:
| Aspect | Voice Mode | Text Mode |
|---|---|---|
| User input | Microphone audio (PCM16) | Typed text messages |
| Agent response | Audio playback (always on) | Text bubbles (audio optional, default off) |
| Barge-in | Speaking interrupts agent | Sending text interrupts agent |
| Transport | Voice Live WebSocket | Voice Live WebSocket (same) |
| API | Voice Live text+audio modality | Voice Live text modality |
| Backend | Same session handler | Same session handler |
voice-live-universal-assistant/
├── frontend/ # Shared React + Vite + TypeScript frontend
│ ├── public/
│ │ ├── audio-capture-worklet.js # Mic capture AudioWorklet (24kHz PCM16)
│ │ └── audio-playback-worklet.js # Audio playback AudioWorklet
│ └── src/
│ ├── components/ # UI components (VoiceOrb, StartScreen, etc.)
│ ├── hooks/ # React hooks (useAudioCapture, useAudioPlayback, useVoiceSession)
│ ├── types.ts # Shared TypeScript types
│ ├── App.tsx # Root component
│ └── main.tsx # Entry point
├── python/ # Python backend (FastAPI + Voice Live SDK)
│ ├── app.py # FastAPI server with WebSocket endpoint
│ ├── voice_handler.py # VoiceLiveHandler — SDK bridge
│ ├── tests/ # 91 automated tests (settings + agent mode)
│ ├── requirements.txt # Python dependencies
│ ├── .env.sample # Environment variable template
│ └── README.md # Python-specific docs
├── java/ # Java backend (Spring Boot + Voice Live SDK)
│ ├── src/ # Spring Boot application source
│ ├── pom.xml # Maven config (azure-ai-voicelive 1.0.0-beta.5)
│ ├── .env.sample # Environment variable template
│ └── README.md # Java-specific docs
├── javascript/ # JavaScript/Node.js backend (Express + Voice Live SDK)
│ ├── app.js # Express server with WebSocket endpoint
│ ├── voiceHandler.js # VoiceHandler — SDK bridge
│ ├── package.json # npm config (@azure/ai-voicelive 1.0.0-beta.3)
│ ├── .env.sample # Environment variable template
│ └── README.md # JavaScript-specific docs
├── csharp/ # C# ASP.NET Core backend (Voice Live SDK)
│ ├── Program.cs # ASP.NET Core minimal API + WebSocket middleware
│ ├── VoiceLiveHandler.cs # VoiceLiveHandler — SDK bridge
│ ├── SessionConfig.cs # Session configuration POCO
│ ├── VoiceLiveWebApp.csproj # .NET project (Azure.AI.VoiceLive 1.1.0-beta.3)
│ ├── .env.sample # Environment variable template
│ └── README.md # C#-specific docs
├── infra/ # Azure Bicep IaC
│ ├── main.bicep # Entry point (Container Apps + optional Foundry + Agent)
│ ├── main-app.bicep # Container App with Voice Live env vars
│ ├── main-infrastructure.bicep # Log Analytics, ACR, Container Apps Env
│ ├── modules/
│ │ ├── foundry.bicep # AI Foundry account + project (optional)
│ │ └── foundry-rbac.bicep # Azure AI User role for tracing
│ └── core/host/ # Reusable modules (container-app, container-registry)
├── deployment/
│ ├── hooks/
│ │ ├── postprovision.ps1 # RBAC assignment (+ Foundry RBAC when enabled)
│ │ ├── predeploy.ps1 # ACR cloud build + Container App update
│ │ └── postdeploy.ps1 # Foundry Agent creation (when createAgent=true)
│ └── scripts/
│ └── create_agent.py # Agent creation with Voice Live metadata
├── tests/ # E2E test suite (WebSocket + Playwright)
└── README.md # This file
Deploys the web app connecting to your existing Azure AI Services resource in model mode (no agent required):
azd auth login
azd init
# Required: set your Voice Live endpoint
azd env set AZURE_VOICELIVE_ENDPOINT "https://your-resource.cognitiveservices.azure.com/"
# Optional: choose backend language (default: python)
azd env set BACKEND_LANGUAGE java # python | java | javascript | csharp
# Optional: API key (only if token auth is unavailable for your resource)
azd env set AZURE_VOICELIVE_API_KEY "your-api-key"
azd upWant agent mode instead? See Option 3 for a fully automated setup, or configure manually:
azd env set VOICELIVE_MODE agent azd env set AZURE_VOICELIVE_AGENT_NAME "your-agent-name" azd env set AZURE_VOICELIVE_PROJECT "your-project-name"
This provisions:
- Container Apps Environment with Log Analytics
- Container Registry (ACR cloud build — no local Docker required)
- Container App with system-assigned managed identity
- RBAC — Cognitive Services User for token-based auth
Provisions a new AI Foundry resource with gpt-realtime model deployment and configures the app for model mode — no additional configuration required:
azd auth login
azd init
azd env set CREATE_FOUNDRY true
# Optional: choose backend language (default: python)
azd env set BACKEND_LANGUAGE java
azd upThis adds (fully automatic — no manual endpoint/model config needed):
- AI Services Account (kind: AIServices) with system-assigned identity
- AI Foundry Project under the account
- gpt-4o-realtime-preview model deployment (as
gpt-realtime) - Azure AI User + Azure AI Developer roles
- Container App configured with provisioned endpoint + model mode
Full end-to-end: provisions Foundry, deploys GPT-4.1-mini, and creates an agent with Voice Live configuration — no additional configuration required:
azd auth login
azd init
azd env set CREATE_AGENT true
# Optional: customize agent name (default: voicelive-assistant)
azd env set AGENT_NAME "my-voice-assistant"
azd upNote:
CREATE_AGENTautomatically enablesCREATE_FOUNDRY— you don't need to set both.
This adds (fully automatic):
- Everything from Option 2
- GPT-4.1-mini model deployment (for the agent)
- Foundry Agent created via Python SDK with Voice Live session config (Azure voice, semantic VAD, noise suppression, echo cancellation)
- Container App configured with agent name, project, and agent mode
| Parameter | Default | Description |
|---|---|---|
BACKEND_LANGUAGE |
python |
Backend language: python, java, javascript, csharp |
AZURE_VOICELIVE_ENDPOINT |
— | Voice Live endpoint (required for basic, auto-set with Foundry) |
VOICELIVE_MODE |
model |
Connection mode (model by default; auto-set to agent when CREATE_AGENT=true) |
AZURE_VOICELIVE_AGENT_NAME |
— | Agent name (auto-set when CREATE_AGENT=true) |
AZURE_VOICELIVE_PROJECT |
— | Foundry project (auto-set when Foundry provisioned) |
CREATE_FOUNDRY |
false |
Create AI Foundry account + project + model |
CREATE_AGENT |
false |
Create Foundry Agent (implies CREATE_FOUNDRY; sets mode to agent) |
FOUNDRY_ACCOUNT_NAME |
auto-generated | Custom name for the AI Services account |
FOUNDRY_PROJECT_NAME |
voicelive-project |
Name for the Foundry project |
AGENT_MODEL_DEPLOYMENT_NAME |
gpt-4.1-mini |
Model deployment name for agent |
AGENT_NAME |
voicelive-assistant |
Name for the created agent |
For local development with hot-reload on both frontend and backend:
Terminal 1 — Frontend dev server (with proxy to backend):
cd frontend
npm run devTerminal 2 — Python backend:
cd python
.venv\Scripts\Activate.ps1 # or source .venv/bin/activate
python app.pyThe Vite dev server at http://localhost:5173 proxies WebSocket and API calls to http://localhost:8000.
The frontend and backend communicate over WebSocket at /ws/{clientId}.
| Direction | Message | Description |
|---|---|---|
| Client → Server | start_session |
Begin voice session with config |
| Client → Server | audio_chunk |
Base64 PCM16 mic audio (24kHz, mono) |
| Client → Server | interrupt |
Cancel current agent response |
| Client → Server | send_text |
Send text message (text mode) |
| Client → Server | stop_session |
End the session |
| Server → Client | session_started |
Session ready, includes config and session_id |
| Server → Client | audio_data |
Base64 PCM16 agent audio response |
| Server → Client | transcript |
User or assistant transcript text |
| Server → Client | status |
State change (listening/thinking/speaking) |
| Server → Client | stop_playback |
Stop audio playback (barge-in) |
| Server → Client | session_stopped |
Session ended |
| Server → Client | error |
Error message |
All backends pin the API version to 2026-01-01-preview (the SDK defaults to GA which lacks agent mode and interim response support).
| Backend | SDK | Version | Language-Specific Notes |
|---|---|---|---|
| Python | azure-ai-voicelive |
1.0.0b1 | No known limitations |
| Java | azure-ai-voicelive |
1.0.0-beta.5 | .env loaded via custom parser; Netty version mismatch warning (no runtime impact) |
| JavaScript | @azure/ai-voicelive |
1.0.0-beta.3 | Node.js 20+ required. No known limitations |
| C# | Azure.AI.VoiceLive |
1.1.0-beta.3 | No known limitations |
- Interim Response is disabled (greyed out) when a realtime model is selected in model mode — it only works with agent mode or text models using cascaded pipelines (Azure Speech transcription).
- Start Session button is disabled when in agent mode and Agent Name or Project are empty, with a helper message directing the user to Settings.
- Transcription model is auto-corrected to
azure-speechwhen a text model is selected (cascaded pipelines only supportazure-speech). - Agent mode toggle is disabled in Settings when no agent is configured on the server (agent name and project both empty). This prevents users from switching to agent mode when the server has no agent setup.
- Audio playback in text mode defaults to off. When muted, audio data from the service is completely skipped (not decoded) to avoid unnecessary resource usage.
Shows where each validation is enforced — frontend-only guards rely on the UI to prevent invalid input, while backend guards provide server-side enforcement.
| Guard | Frontend | Python | Java | JavaScript | C# |
|---|---|---|---|---|---|
| Agent mode requires name + project | ✅ Disables Start btn | — SDK validates | ✅ Falls back to model | — SDK validates | ✅ Falls back to model |
| Transcribe model auto-correction | ✅ On model change | — | — | ✅ For agent/cascaded | ✅ For cascaded models |
| Interim response disabled for realtime | ✅ Greys out toggle | — | — | — | ❌ SDK gap (ignored) |
| Session cleanup on start failure | — | ✅ | ✅ | ✅ | ✅ |
| Auth identity requires resource override | — | ✅ | ✅ | ✅ | ✅ |
Legend: ✅ = enforced, — = not needed / passed through to SDK, ❌ = not supported
| Feature | Python | Java | JavaScript | C# |
|---|---|---|---|---|
| Model mode (realtime) | ✅ | ✅ | ✅ | ✅ |
| Model mode (text/cascaded) | ✅ | ✅ | ✅ | ✅ |
| Agent mode | ✅ | ✅ | ✅ | ✅ |
| Interim response | ✅ | ✅ | ✅ | ❌ SDK gap |
| Echo cancellation | ✅ | ✅ | ✅ | ✅ |
| Noise reduction | ✅ | ✅ | ✅ | ✅ |
- Agent mode fail-fast: When
mode=agentbutagentNameorprojectNameare missing, the C# and Java backends silently fall back to model mode. The frontend already prevents this (Start button is disabled until both fields are set), but the backends should return an explicit error instead of downgrading. Python and JavaScript pass the config through and let the SDK validate. - Align backend validation guards: As shown in the Validation Guard Matrix, transcribe model auto-correction and interim response guards are inconsistent across backends. Python and Java rely entirely on the frontend for these validations. All backends should enforce the same server-side guards to prevent invalid configurations when the frontend is bypassed (e.g., direct WebSocket clients).
An E2E test suite is available in tests/e2e_all_backends.py covering all four backends with two test types:
- WebSocket tests — connect directly to the backend WebSocket endpoint, send a
start_sessionmessage, stream real WAV audio as PCM16 chunks, and verify that audio and transcript responses are received. - Playwright browser tests — open the frontend UI in a headless Chromium browser with a mocked microphone (oscillator tone), click Start, and verify the page loads, the voice orb renders, and a session becomes active.
pip install websockets playwright
python -m playwright install chromiumWebSocket tests use WAV audio files when available, or fall back to a synthetic 440Hz sine wave. Set E2E_AUDIO_DIR to a folder containing .wav files for real speech testing. Backend URLs can be overridden via environment variables (E2E_PYTHON_URL, E2E_CSHARP_URL, E2E_JAVASCRIPT_URL, E2E_JAVA_URL).
cd voice-live-universal-assistant
# All backends, both test types (model mode)
python tests/e2e_all_backends.py
# WebSocket tests only
python tests/e2e_all_backends.py --ws-only
# Playwright browser tests only
python tests/e2e_all_backends.py --browser-only
# Agent mode (default is model)
python tests/e2e_all_backends.py --mode agent
# Single backend URL
python tests/e2e_all_backends.py --url https://your-backend.azurecontainerapps.ioBackend URLs default to the development deployment and can be overridden via environment variables.
This project is licensed under the MIT License - see the LICENSE file for details.
Copyright (c) Microsoft Corporation. All rights reserved.