
BabelGopher 🌐

Real-time multilingual interpretation for audio conversations using 100% on-device AI


Award

Most Valuable Feedback Winner - Google Chrome Built-in AI Challenge 2025

Watch Demo


Overview

BabelGopher breaks down language barriers in real-time audio conferencing by combining:

  • LiveKit WebRTC for peer-to-peer audio communication
  • Web Speech API for speech-to-text transcription
  • Chrome Translation API for on-device translation
  • Web Speech Synthesis for natural voice output

All AI processing happens on-device in your browser - no cloud APIs, no external servers for AI.

🐟 A Hitchhiker’s Tale (The Babel Fish)

"...If you stick a Babel fish in your ear... you can instantly understand anything said to you in any form of language."

— The Hitchhiker's Guide to the Galaxy

BabelGopher is our privacy-first, terrestrial take on the Babel fish. It helps you understand anyone in real time while keeping the conversation content on your device. The audio may zip through a LiveKit SFU for speed, but the thinking (STT, translation, and TTS) happens locally. No galactic cloud AI peeking at your thoughts.

  • Result: Seamless conversation.
  • Twist: Your translation data stays yours.
  • Bonus: No towel required (though a good mic helps avoid sounding like a malfunctioning robot).

✨ Key Features

  • 🎤 Real-time Speech-to-Text - Continuous transcription using Web Speech API
  • 🌍 Instant Translation - 10+ languages with Chrome's built-in AI
  • 🔊 Natural Voice Output - Text-to-speech in your preferred language
  • 💬 Live Subtitles - Side-by-side original + translated text
  • 🎛️ User Controls - Language selection, TTS toggle, subtitle toggle
  • 👥 Multi-Participant - Support for multiple participants via LiveKit
  • 🔒 Privacy-First - All AI processing on-device, no cloud uploads (because your words are yours, not some AI overlord's)
  • ⚡ Low Latency - Typical pipeline: <500ms from speech to translated audio

What's Been Built So Far

  • Phase 1: Monorepo setup, LiveKit audio rooms, participant management
  • Phase 2: Chrome AI STT & translation, multi-lang support, TTS with echo cancellation
  • Phase 3: Subtitle UI, language controls, accessibility (WCAG AA), responsive design

πŸ› οΈ Tech Stack

Core Technologies

  • LiveKit - WebRTC for real-time audio communication (the audio highway)
  • Web Speech API - Browser-native speech recognition (STT) – because robots need ears too
  • Chrome Translation API - On-device translation (with Prompt API fallback) – privacy's best friend
  • Web Speech Synthesis - Browser-native text-to-speech (TTS) – making AI sound less robotic

Frontend

  • Next.js 15 - React framework with App Router
  • React 19 - UI library
  • TypeScript - Type safety
  • Tailwind CSS - Styling
  • LiveKit Client SDK - WebRTC client

Backend (Minimal)

  • Next.js API Routes - JWT token generation for LiveKit (see the sketch after this list)
  • livekit-server-sdk - Server-side token signing
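
For reference, minting that token with livekit-server-sdk in an App Router route handler looks roughly like the sketch below. The route path and handler body are illustrative, not the repo's actual code:

// app/api/token/route.ts - illustrative sketch of LiveKit token minting
import { AccessToken } from 'livekit-server-sdk';

export async function POST(request: Request) {
  const { user_identity, room_name } = await request.json();
  if (!user_identity || !room_name) {
    return Response.json({ error: 'Missing required fields' }, { status: 400 });
  }

  // Sign a short-lived JWT granting access to the requested room.
  const at = new AccessToken(
    process.env.LIVEKIT_API_KEY!,
    process.env.LIVEKIT_API_SECRET!,
    { identity: user_identity },
  );
  at.addGrant({ roomJoin: true, room: room_name });

  return Response.json({ token: await at.toJwt() }); // toJwt() is async in SDK v2
}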

State Management

  • React Context API - Global state
  • Custom hooks - Modular logic (useSTT, useTranslation, useTTS, etc.)

🚀 Getting Started

Prerequisites

  • Chrome Canary or Edge Dev (required for Chrome Built-in AI)
  • Node.js 18+ and pnpm 9.x
  • LiveKit Account - Get free tier at cloud.livekit.io

1. Chrome AI Setup

BabelGopher requires Chrome Canary with experimental AI features enabled.

Quick Setup:

  1. Install Chrome Canary
  2. Open chrome://flags
  3. Enable these flags:
    • #translation-api → Enabled
    • #prompt-api-for-gemini-nano → Enabled
    • #optimization-guide-on-device-model → Enabled BypassPerfRequirement
  4. Restart browser
  5. Verify: open the DevTools console and evaluate 'ai' in window && 'translation' in window; it should return true (a fuller check is sketched below)
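
For a slightly fuller check, each capability can be probed separately in the console. This is plain feature detection; the global names are experimental and have shifted across Canary builds, so treat a missing entry as "flag off or model not yet downloaded":

// Paste into the DevTools console in Chrome Canary.
console.log('translation API:', 'translation' in window);
console.log('prompt API (Gemini Nano):', 'ai' in window);
console.log('speech recognition:', 'webkitSpeechRecognition' in window);
console.log('speech synthesis:', 'speechSynthesis' in window);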

Detailed setup guide: See docs/CHROME_AI_SETUP.md

2. Install Dependencies

# Install pnpm if you don't have it
npm install -g pnpm

# Install dependencies (from project root)
pnpm install

3. Configure Environment Variables

cd apps/web
cp .env.example .env.local

Edit .env.local with your LiveKit credentials:

# Get these from https://cloud.livekit.io/
LIVEKIT_API_KEY=your_api_key_here
LIVEKIT_API_SECRET=your_api_secret_here
LIVEKIT_URL=wss://your-project.livekit.cloud

4. Start Development Server

cd apps/web
pnpm dev

5. Open in Browser

  1. Open http://localhost:3000 in Chrome Canary
  2. Enter your name and a room code
  3. Select your preferred output language
  4. Click "Join Conference"
  5. Grant microphone permission when prompted
  6. Start speaking - watch real-time transcription and translation!

6. Test with Multiple Participants

Open a second tab or window and join the same room with a different name.

Note: Only your own speech is transcribed (Web Speech API limitation). See Limitations below.

📖 How It Works

Pipeline Architecture

Your Microphone
      │
      ▼
┌─────────────┐
│  LiveKit    │  Publishes audio to room
│  (WebRTC)   │
└──────┬──────┘
       │
       ▼
┌───────────────────┐
│ Speech Recognition│  Continuous transcription
│ (Web Speech API)  │
└─────────┬─────────┘
          │
          ▼  "Hello world"
┌──────────────────┐
│   Translation    │  Chrome Translation API
│   (Chrome AI)    │
└────────┬─────────┘
         │
         ▼  "안녕하세요"
┌──────────────────┐
│   Subtitles +    │  Display + speak
│   TTS Output     │
└──────────────────┘
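
The trickiest stage in practice is the second one: the Web Speech API stops recognizing after a pause, so "continuous transcription" really means restarting it whenever it ends. A minimal sketch of that pattern, simplified from what a hook like useSTT.ts has to do (handleFinalTranscript is a placeholder for the hand-off to translation):

// Continuous speech recognition with auto-restart (illustrative sketch).
// Chrome still ships this API with the webkit prefix.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionImpl();
recognition.continuous = true;
recognition.interimResults = true;
recognition.lang = 'en-US'; // STT language is currently hardcoded (see Limitations)

let keepAlive = true; // flip to false when the user leaves the room

function handleFinalTranscript(text: string) {
  console.log('final transcript:', text); // hand off to the translation stage here
}

recognition.onresult = (event: any) => {
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) handleFinalTranscript(result[0].transcript);
  }
};

// The engine stops itself after silence; restart to keep transcription continuous.
recognition.onend = () => {
  if (keepAlive) recognition.start();
};

recognition.start();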

Component Flow

  1. LiveKit Connection (useLiveKit.ts)

    • Connects to LiveKit room with JWT token
    • Publishes local audio track
    • Receives remote participants' audio
  2. Speech-to-Text (useSTT.ts)

    • Captures microphone input via Web Speech API
    • Continuous recognition with auto-restart
    • Emits final transcription results
  3. Translation (useTranslation.ts)

    • Receives transcription from STT
    • Translates to user's target language
    • Uses Chrome Translation API (or Prompt API fallback)
  4. Text-to-Speech (useTTS.ts)

    • Speaks translated text
    • Uses Web Speech Synthesis
    • Auto-selects voice for target language
  5. Orchestration (useConferenceOrchestrator.ts)

    • Coordinates the entire pipeline (see the sketch after this list)
    • Manages state synchronization
    • Handles errors and capability checking
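
Stages 3 and 4 chain together as a translate-then-speak hand-off. A condensed sketch, assuming the experimental window.translation global from the flag check in Getting Started (Chrome has renamed parts of this surface across Canary builds, so the exact shape is an assumption):

// Translate a final transcript and speak it (illustrative sketch).
async function translateAndSpeak(text: string, targetLanguage: string) {
  // Assumes the origin-trial window.translation shape; newer builds differ.
  const translator = await (window as any).translation.createTranslator({
    sourceLanguage: 'en', // STT output is English for now (see Limitations)
    targetLanguage,
  });
  const translated: string = await translator.translate(text);

  // Speak via Web Speech Synthesis, picking a voice that matches the target.
  const utterance = new SpeechSynthesisUtterance(translated);
  const voice = window.speechSynthesis
    .getVoices()
    .find((v) => v.lang.startsWith(targetLanguage));
  if (voice) utterance.voice = voice;
  window.speechSynthesis.speak(utterance);

  return translated; // also rendered as a subtitle next to the original text
}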

State Management

  • Context API for global state
  • Custom hooks for each feature
  • React Reducer pattern for complex state updates
  • localStorage for persisted preferences (see the sketch below)
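
The localStorage persistence is the standard read-on-mount, write-on-change pattern. A minimal sketch (the hook name and storage key are illustrative, not the repo's actual identifiers):

import { useEffect, useState } from 'react';

// Illustrative hook: persist the preferred output language across reloads.
function usePersistedLanguage(defaultLang = 'en') {
  const [lang, setLang] = useState(() => {
    if (typeof window === 'undefined') return defaultLang; // SSR guard for Next.js
    return localStorage.getItem('babelgopher.targetLanguage') ?? defaultLang;
  });

  useEffect(() => {
    localStorage.setItem('babelgopher.targetLanguage', lang);
  }, [lang]);

  return [lang, setLang] as const;
}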

📚 Documentation

⚠️ Limitations

Current Limitations

  1. Local Participant Only

    • Only your own speech is transcribed
    • Web Speech API limitation (it can only transcribe the local getUserMedia microphone, not remote audio tracks)
    • Workaround: Each participant transcribes their own speech (future: sync via LiveKit data channel)
  2. Browser Requirements

    • Requires Chrome Canary or Edge Dev
    • Chrome Built-in AI features are experimental
    • Not available in Firefox or Safari
  3. Language Detection

    • STT language currently hardcoded to English
    • Planned: Auto-detect speaking language
  4. Performance

    • CPU-intensive on-device processing
    • Recommended: 8GB+ RAM
    • Best with 2-4 participants
  5. Translation Quality

    • Depends on Chrome's on-device models
    • Some language pairs may be less accurate
    • Long sentences may fail

Planned Enhancements

  • LiveKit Data Channel - Sync transcriptions between participants
  • Language Auto-Detection - Detect speaking language automatically
  • Server-Side STT - Optional cloud transcription for remote participants
  • Video Support - Currently audio-only
  • Mobile App - React Native implementation
  • Export History - Save conversation transcripts

API Documentation

Authentication Endpoint

POST /auth-livekit-token

Generate a LiveKit access token for joining a room.

Request Body:

{
  "user_identity": "unique-user-id",
  "room_name": "room-name"
}

Success Response (200 OK):

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}

Error Responses:

  • 400 Bad Request: Missing required fields
  • 500 Internal Server Error: Token generation failed

Example:

curl -X POST http://localhost:8080/auth-livekit-token \
  -H "Content-Type: application/json" \
  -d '{"user_identity": "user123", "room_name": "my-room"}'

CORS: Configured to allow all origins for development

Health Check Endpoint

GET /health

Check server health status.

Success Response (200 OK):

{
  "status": "healthy"
}

Environment Variables

Required for backend operation:

Variable             Description                 Example
LIVEKIT_API_KEY      LiveKit Cloud API key       APIxxxxxxx
LIVEKIT_API_SECRET   LiveKit Cloud API secret    xxxxxxxxxxxxx
LIVEKIT_URL          LiveKit WebSocket URL       wss://project.livekit.cloud
PORT                 Server port (optional)      8080

Get your LiveKit credentials from: https://cloud.livekit.io/

🧪 Testing

Quick Checks:

# Type check
cd apps/web
pnpm tsc

# Build check
pnpm build

Backend Tests:

cd apps/server
go test -v ./...

Frontend Manual Testing:

  1. Start both backend and frontend servers (see Getting Started above)
  2. Open http://localhost:3000
  3. Enter name: "Alice", room: "test-room"
  4. Click "Join Room"
  5. Open second browser tab
  6. Enter name: "Bob", room: "test-room"
  7. Verify both participants see each other in the participant list

Testing Checklist:

  • Health endpoint: curl http://localhost:8080/health
  • Token endpoint: curl -X POST http://localhost:8080/auth-livekit-token -H "Content-Type: application/json" -d '{"user_identity":"test","room_name":"test"}'
  • Frontend lobby form validation
  • LiveKit connection success/failure handling
  • Multi-participant room join/leave

STT Testing (Chrome Canary Required):

  1. Follow Chrome AI Setup instructions above
  2. Join a room with 2+ participants
  3. Verify "Chrome AI Available" status shows in room
  4. Speak into your microphone
  5. Verify real-time transcriptions appear in the transcription panel
  6. Test with multiple participants speaking
  7. Verify participant names and timestamps display correctly
  8. Test "Clear Transcriptions" button

Translation Testing (Chrome Canary Required):

  1. Follow Chrome AI Setup instructions above
  2. Join a room and start speaking (or have another participant speak)
  3. Verify "Translation Available" status shows in room
  4. Select target language from dropdown (e.g., Korean, Japanese, Spanish)
  5. Observe automatic translation of transcriptions (with 300ms debounce)
  6. Verify both original and translated text display correctly
  7. Test with multiple language pairs (English→Korean, Korean→English, etc.)
  8. Verify translation latency is acceptable (<500ms)
  9. Test "Clear All" button to clear both transcriptions and translations
  10. Verify language preference persists after page reload

PoC Test Suite:

  • See apps/web/src/poc/README.md for detailed PoC testing instructions
  • Run automated test suite to validate Chrome AI integration
  • Generate test reports for debugging

Project Structure

babelgopher/
├── apps/
│   ├── web/      # Next.js frontend
│   └── server/   # Go backend
├── docs/         # Documentation
└── packages/     # Shared packages

License

MIT License - see LICENSE

Hackathon Submission

Most Valuable Feedback Winner at Google Chrome Built-in AI Challenge 2025
