Real-time multilingual interpretation for audio conversations using 100% on-device AI
Most Valuable Feedback Winner - Google Chrome Built-in AI Challenge 2025
- Award
- A Hitchhiker's Tale (The Babel Fish)
- Key Features
- Tech Stack
- Getting Started
- How It Works
- Limitations
- Testing
- Documentation
- API Documentation
- Project Structure
- License
BabelGopher breaks down language barriers in real-time video conferencing by combining:
- LiveKit WebRTC for peer-to-peer audio communication
- Web Speech API for speech-to-text transcription
- Chrome Translation API for on-device translation
- Web Speech Synthesis for natural voice output
All AI processing happens on-device in your browser - no cloud APIs, no external servers for AI.
"...If you stick a Babel fish in your ear... you can instantly understand anything said to you in any form of language."
— The Hitchhiker's Guide to the Galaxy
BabelGopher is our privacy-first, terrestrial take on the Babel fish. It helps you understand anyone in real time, while keeping the conversation content on your device. The audio may zip through a LiveKit SFU for speed, but the thinking (STT, translation, and TTS) happens locally. No galactic cloud AI peeking at your thoughts.
- Result: Seamless conversation.
- Twist: Your translation data stays yours.
- Bonus: No towel required (though a good mic helps avoid sounding like a malfunctioning robot).
- Real-time Speech-to-Text - Continuous transcription using the Web Speech API
- Instant Translation - 10+ languages with Chrome's built-in AI
- Natural Voice Output - Text-to-speech in your preferred language
- Live Subtitles - Side-by-side original + translated text
- User Controls - Language selection, TTS toggle, subtitle toggle
- Multi-Participant - Support for multiple participants via LiveKit
- Privacy-First - All AI processing on-device, no cloud uploads (because your words are yours, not some AI overlord's)
- Low Latency - Typical pipeline: <500ms from speech to translated audio
- Phase 1: Monorepo setup, LiveKit audio rooms, participant management
- Phase 2: Chrome AI STT & translation, multi-lang support, TTS with echo cancellation
- Phase 3: Subtitle UI, language controls, accessibility (WCAG AA), responsive design
- LiveKit - WebRTC for real-time audio communication (the audio highway)
- Web Speech API - Browser-native speech recognition (STT), because robots need ears too
- Chrome Translation API - On-device translation (with Prompt API fallback), privacy's best friend
- Web Speech Synthesis - Browser-native text-to-speech (TTS), making AI sound less robotic
- Next.js 15 - React framework with App Router
- React 19 - UI library
- TypeScript - Type safety
- Tailwind CSS - Styling
- LiveKit Client SDK - WebRTC client
- Next.js API Routes - JWT token generation for LiveKit
- livekit-server-sdk - Server-side token signing
- React Context API - Global state
- Custom hooks - Modular logic (useSTT, useTranslation, useTTS, etc.)
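The reducer pattern used for complex state updates (described under How It Works) can be sketched as a plain reducer for the transcription list. The action and state shapes here are illustrative, not the project's actual types:

```typescript
// Minimal sketch of a transcription-list reducer, usable with React's
// useReducer. Types are assumptions, not the project's real definitions.
interface Entry {
  speaker: string;
  text: string;
}

type State = { entries: Entry[] };

type Action =
  | { type: "add"; entry: Entry } // a final STT result arrived
  | { type: "clear" };            // the "Clear Transcriptions" button

function transcriptionReducer(state: State, action: Action): State {
  switch (action.type) {
    case "add":
      return { entries: [...state.entries, action.entry] };
    case "clear":
      return { entries: [] };
  }
}
```

In a component this would be wired up as `useReducer(transcriptionReducer, { entries: [] })`, with each custom hook dispatching actions rather than mutating shared state directly.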
- Chrome Canary or Edge Dev (required for Chrome Built-in AI)
- Node.js 18+ and pnpm 9.x
- LiveKit Account - Get free tier at cloud.livekit.io
BabelGopher requires Chrome Canary with experimental AI features enabled.
Quick Setup:
- Install Chrome Canary
- Open `chrome://flags`
- Enable these flags:
  - `#translation-api` → Enabled
  - `#prompt-api-for-gemini-nano` → Enabled
  - `#optimization-guide-on-device-model` → Enabled BypassPerfRequirement
- Restart browser
- Verify: open the DevTools console and type `'ai' in window && 'translation' in window`; it should return `true`
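The same verification expression can be wrapped in a small helper. This is a hedged sketch: the global object is injected so the check can run outside the browser, and `ai`/`translation` are the experimental globals that Chrome Canary exposes with the flags above enabled:

```typescript
// Feature-detect the experimental Chrome Built-in AI globals. The scope
// parameter stands in for `window`; pass `window` in the browser.
type MaybeAIGlobals = { ai?: unknown; translation?: unknown };

function hasBuiltInAI(scope: MaybeAIGlobals): boolean {
  return "ai" in scope && "translation" in scope;
}

// In the browser: hasBuiltInAI(window as MaybeAIGlobals)
```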
Detailed setup guide: See docs/CHROME_AI_SETUP.md
```bash
# Install pnpm if you don't have it
npm install -g pnpm

# Install dependencies (from project root)
pnpm install
```

```bash
cd apps/web
cp .env.example .env.local
```

Edit `.env.local` with your LiveKit credentials:

```bash
# Get these from https://cloud.livekit.io/
LIVEKIT_API_KEY=your_api_key_here
LIVEKIT_API_SECRET=your_api_secret_here
LIVEKIT_URL=wss://your-project.livekit.cloud
```

```bash
cd apps/web
pnpm dev
```

- Open http://localhost:3000 in Chrome Canary
- Enter your name and a room code
- Select your preferred output language
- Click "Join Conference"
- Grant microphone permission when prompted
- Start speaking - watch real-time transcription and translation!
Open a second tab or window and join the same room with a different name.
Note: Only your own speech is transcribed (Web Speech API limitation). See Limitations below.
```
Your Microphone
       │
       ▼
┌──────────────┐
│   LiveKit    │  Publishes audio to room
│   (WebRTC)   │
└──────┬───────┘
       │
       ▼
┌────────────────────┐
│ Speech Recognition │  Continuous transcription
│  (Web Speech API)  │
└─────────┬──────────┘
          │
          ▼  "Hello world"
┌────────────────────┐
│    Translation     │  Chrome Translation API
│    (Chrome AI)     │
└─────────┬──────────┘
          │
          ▼  "안녕하세요"
┌────────────────────┐
│    Subtitles +     │  Display + speak
│    TTS Output      │
└────────────────────┘
```
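The pipeline above can be sketched as plain async composition. The stage implementations are injected here because the real ones wrap browser APIs; the function names are illustrative, not the project's actual hook internals:

```typescript
// Minimal sketch of the STT -> translation -> TTS pipeline as injected
// async stages. In the app, `translate` would wrap the Chrome Translation
// API and `speak` would wrap Web Speech Synthesis.
type Stage = (text: string) => Promise<string>;

async function runPipeline(
  transcript: string,                      // a final STT result
  translate: Stage,                        // e.g. "Hello world" -> "안녕하세요"
  speak: (text: string) => Promise<void>   // TTS + subtitle display
): Promise<string> {
  const translated = await translate(transcript);
  await speak(translated);
  return translated;
}
```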
1. LiveKit Connection (`useLiveKit.ts`)
   - Connects to the LiveKit room with a JWT token
   - Publishes the local audio track
   - Receives remote participants' audio
2. Speech-to-Text (`useSTT.ts`)
   - Captures microphone input via the Web Speech API
   - Continuous recognition with auto-restart
   - Emits final transcription results
3. Translation (`useTranslation.ts`)
   - Receives transcriptions from STT
   - Translates to the user's target language
   - Uses the Chrome Translation API (or Prompt API fallback)
4. Text-to-Speech (`useTTS.ts`)
   - Speaks translated text
   - Uses Web Speech Synthesis
   - Auto-selects a voice for the target language
5. Orchestration (`useConferenceOrchestrator.ts`)
   - Coordinates the entire pipeline
   - Manages state synchronization
   - Handles errors and capability checking
- Context API for global state
- Custom hooks for each feature
- React Reducer pattern for complex state updates
- localStorage for persisted preferences
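Persisting the language preference to localStorage can be sketched with an injected Storage-like interface, which keeps the logic testable outside the browser. The storage key name is an assumption:

```typescript
// Save/load the target language preference. KVStore mirrors the subset of
// the Web Storage API we need; pass `localStorage` in the browser.
interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const LANG_KEY = "babelgopher:targetLang"; // assumed key name

function saveTargetLang(store: KVStore, lang: string): void {
  store.setItem(LANG_KEY, lang);
}

function loadTargetLang(store: KVStore, fallback = "en"): string {
  return store.getItem(LANG_KEY) ?? fallback;
}

// In the browser: saveTargetLang(localStorage, "ko")
```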
- Architecture - Technical architecture and design
- Chrome AI Setup - Detailed Chrome setup guide
- Troubleshooting - Common issues and solutions
1. Local Participant Only
   - Only your own speech is transcribed
   - Web Speech API limitation (requires a getUserMedia microphone)
   - Workaround: each participant transcribes their own speech (future: sync via LiveKit data channel)
2. Browser Requirements
   - Requires Chrome Canary or Edge Dev
   - Chrome Built-in AI features are experimental
   - Not available in Firefox or Safari
3. Language Detection
   - STT language currently hardcoded to English
   - Planned: auto-detect the speaking language
4. Performance
   - CPU-intensive on-device processing
   - Recommended: 8GB+ RAM
   - Best with 2-4 participants
5. Translation Quality
   - Depends on Chrome's on-device models
   - Some language pairs may be less accurate
   - Long sentences may fail
- LiveKit Data Channel - Sync transcriptions between participants
- Language Auto-Detection - Detect speaking language automatically
- Server-Side STT - Optional cloud transcription for remote participants
- Video Support - Currently audio-only
- Mobile App - React Native implementation
- Export History - Save conversation transcripts
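The planned transcript sync over the LiveKit data channel would need each transcript message serialized to bytes before publishing. A minimal sketch, with an illustrative message shape (the project's eventual wire format may differ):

```typescript
// Encode/decode a transcript message for the LiveKit data channel, which
// carries raw bytes. The TranscriptMsg shape is an assumption.
interface TranscriptMsg {
  speaker: string;
  text: string;
  lang: string;
}

function encodeTranscript(msg: TranscriptMsg): Uint8Array {
  return new TextEncoder().encode(JSON.stringify(msg));
}

function decodeTranscript(bytes: Uint8Array): TranscriptMsg {
  return JSON.parse(new TextDecoder().decode(bytes)) as TranscriptMsg;
}

// Sending side with livekit-client (sketch):
//   room.localParticipant.publishData(encodeTranscript(msg), { reliable: true })
```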
`POST /auth-livekit-token`

Generate a LiveKit access token for joining a room.

Request Body:

```json
{
  "user_identity": "unique-user-id",
  "room_name": "room-name"
}
```

Success Response (200 OK):

```json
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}
```

Error Responses:

- `400 Bad Request`: Missing required fields
- `500 Internal Server Error`: Token generation failed

Example:

```bash
curl -X POST http://localhost:8080/auth-livekit-token \
  -H "Content-Type: application/json" \
  -d '{"user_identity": "user123", "room_name": "my-room"}'
```

CORS: Configured to allow all origins for development
`GET /health`

Check server health status.

Success Response (200 OK):

```json
{
  "status": "healthy"
}
```

Required for backend operation:
| Variable | Description | Example |
|---|---|---|
| `LIVEKIT_API_KEY` | LiveKit Cloud API key | `APIxxxxxxx` |
| `LIVEKIT_API_SECRET` | LiveKit Cloud API secret | `xxxxxxxxxxxxx` |
| `LIVEKIT_URL` | LiveKit WebSocket URL | `wss://project.livekit.cloud` |
| `PORT` | Server port (optional) | `8080` |
Get your LiveKit credentials from: https://cloud.livekit.io/
Quick Checks:
```bash
# Type check
cd apps/web
pnpm tsc

# Build check
pnpm build
```

Backend Tests:

```bash
cd apps/server
go test -v ./...
```

Frontend Manual Testing:
- Start both backend and frontend servers (see Development section)
- Open http://localhost:3000
- Enter name: "Alice", room: "test-room"
- Click "Join Room"
- Open second browser tab
- Enter name: "Bob", room: "test-room"
- Verify both participants see each other in the participant list
Testing Checklist:
- Health endpoint: `curl http://localhost:8080/health`
- Token endpoint: `curl -X POST http://localhost:8080/auth-livekit-token -H "Content-Type: application/json" -d '{"user_identity":"test","room_name":"test"}'`
- Frontend lobby form validation
- LiveKit connection success/failure handling
- Multi-participant room join/leave
STT Testing (Chrome Canary Required):
- Follow Chrome AI Setup instructions above
- Join a room with 2+ participants
- Verify "Chrome AI Available" status shows in room
- Speak into your microphone
- Verify real-time transcriptions appear in the transcription panel
- Test with multiple participants speaking
- Verify participant names and timestamps display correctly
- Test "Clear Transcriptions" button
Translation Testing (Chrome Canary Required):
- Follow Chrome AI Setup instructions above
- Join a room and start speaking (or have another participant speak)
- Verify "Translation Available" status shows in room
- Select target language from dropdown (e.g., Korean, Japanese, Spanish)
- Observe automatic translation of transcriptions (with 300ms debounce)
- Verify both original and translated text display correctly
- Test with multiple language pairs (English→Korean, Korean→English, etc.)
- Verify translation latency is acceptable (<500ms)
- Test "Clear All" button to clear both transcriptions and translations
- Verify language preference persists after page reload
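The 300 ms debounce mentioned above can be sketched as a trailing-edge debounce, so rapid interim transcription updates collapse into a single translation call once the speaker pauses. The wiring is illustrative:

```typescript
// Trailing-edge debounce: only the last call within the delay window fires.
function debounce<T>(fn: (arg: T) => void, delayMs: number): (arg: T) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (arg: T) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(arg), delayMs);
  };
}

// Usage sketch: const queueTranslation = debounce(translateAndRender, 300)
```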
PoC Test Suite:
- See `apps/web/src/poc/README.md` for detailed PoC testing instructions
- Run the automated test suite to validate Chrome AI integration
- Generate test reports for debugging
```
babelgopher/
├── apps/
│   ├── web/       # Next.js frontend
│   └── server/    # Go backend
├── docs/          # Documentation
└── packages/      # Shared packages
```
MIT License - see LICENSE