inspiration ~

We noticed that a lot of our international friends frequently use LLMs like Google Gemini or ChatGPT to translate PDFs, or can't understand their instructors clearly due to language barriers. This led us to build gatorlator, a Chrome extension that enables ESL students to listen to lectures in their native language, using DeepL for translation and ElevenLabs' voice models for TTS. It works straight from the browser, with no separate web or desktop app, making it seamless and convenient for students!

what it does ~

The extension lets users turn on real-time audio translation, outputting both subtitles and audio in their desired language.

how we built it ~

Building the Live Translator Extension required careful coordination across frontend, API integration, and Chrome extension architecture. We navigated a steep learning curve, vetting various technologies before solidifying a core stack that balanced performance with real-time reliability.

phase 1: team discussion & planning

We first mapped out our vision: real-time browser audio translation with both audio output and subtitles. After brainstorming, we settled on a three-step AI pipeline:
I. Speech-to-Text (Deepgram)
II. Translation (DeepL)
III. Text-to-Speech (ElevenLabs)
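The three-step pipeline boils down to a chain of async calls. Here is a minimal sketch; the `transcribe`, `translate`, and `synthesize` functions are hypothetical stand-ins for the real Deepgram, DeepL, and ElevenLabs clients, not our actual code:

```javascript
// Sketch of the STT → translation → TTS pipeline. The three service
// functions are illustrative stand-ins for the real API clients.
async function translateAudioChunk(audioChunk, targetLang, services) {
  const { transcribe, translate, synthesize } = services;
  const text = await transcribe(audioChunk);            // Deepgram: speech-to-text
  const translated = await translate(text, targetLang); // DeepL: translation
  const speech = await synthesize(translated);          // ElevenLabs: text-to-speech
  return { subtitle: translated, audio: speech };
}
```

Keeping each stage behind a plain async function made it easy to swap providers or stub a stage out during testing.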

phase 2: initial setup

~ Set up project structure with folders for services, scripts, popup, and offscreen components
~ Initialized Chrome extension with a basic manifest.json declaring necessary permissions (tabCapture, offscreen, storage, activeTab)
~ Created .env file structure for API key management
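A trimmed-down version of the manifest we started from looks roughly like this (field values here are illustrative, and the real file also needs host permissions for the three APIs):

```json
{
  "manifest_version": 3,
  "name": "gatorlator",
  "version": "0.1.0",
  "permissions": ["tabCapture", "offscreen", "storage", "activeTab"],
  "background": { "service_worker": "background.js" },
  "action": { "default_popup": "popup/popup.html" }
}
```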

phase 3: building the API service layer (hours 4-8)

Deepgram API Integration: Our first challenge was capturing and transcribing browser audio. We built:
I. Audio capture functionality using the MediaRecorder API
II. Integration with Deepgram for speech-to-text conversion
III. Support for multiple audio formats (WebM, Opus)

DeepL API Integration:
I. Designed requests to ensure clean, translation-only responses
II. Added multi-language support with dynamic target language selection
III. Implemented fallback error handling

ElevenLabs Voice Synthesis:
I. Integrated the ElevenLabs API for hyper-realistic voice generation
II. Configured voice settings (stability, similarity boost) for natural speech
III. Converted audio responses to base64 for Chrome extension compatibility
IV. Tested various voices to find the most natural-sounding option for each language
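The base64 conversion in step III exists because Chrome extension message passing only carries JSON-serializable data, so binary audio has to be re-encoded before it can travel between contexts. A sketch (the function name is our own, not a library API):

```javascript
// Sketch: convert a binary audio response (ArrayBuffer) to base64 so it
// can be sent through chrome.runtime message passing, which only
// accepts JSON-serializable values. Function name is illustrative.
function arrayBufferToBase64(buffer) {
  const bytes = new Uint8Array(buffer);
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b); // one character per byte
  }
  return btoa(binary); // base64-encode the byte string
}
```

The offscreen document can then decode the string back into audio for playback.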

phase 4: chrome extension core architecture

Service Worker (background.js): the brain of our extension, orchestrating all components:
I. Implemented tab audio capture using chrome.tabCapture.capture()
II. Built continuous audio recording in 5-second chunks for real-time processing
III. Created the translation pipeline coordinator that chains Deepgram → DeepL → ElevenLabs
IV. Developed a message-passing system to communicate between the background, popup, content scripts, and offscreen document
V. Added state management to track translation status and user preferences

Offscreen Document (offscreen/offscreen.html & offscreen.js):
I. Created a hidden HTML page that exists purely for audio playback
II. Built a message listener to receive base64 audio data from the background script
III. Used the Web Audio API to decode and play the translated speech
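The message-passing system amounts to routing typed messages to the right context. A stripped-down sketch of the dispatch logic (the message type names and handler shape are illustrative, not necessarily the ones in our code):

```javascript
// Sketch of the background-script message router. Message type names
// ("PLAY_AUDIO", "SHOW_SUBTITLE", "SET_LANGUAGE") are illustrative.
function routeMessage(message, handlers) {
  switch (message.type) {
    case "PLAY_AUDIO":    // base64 audio → offscreen document
      return handlers.offscreen(message.audioBase64);
    case "SHOW_SUBTITLE": // translated text → content-script overlay
      return handlers.content(message.text);
    case "SET_LANGUAGE":  // user preference → stored state
      return handlers.store(message.lang);
    default:
      throw new Error(`unknown message type: ${message.type}`);
  }
}
```

Centralizing routing in one function kept the four contexts (popup, background, offscreen, content) from needing to know about each other directly.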

phase 5: user interface development

Extension Popup (popup/popup.html, popup.js, popup.css): designed an intuitive control panel:
I. Start/Stop translation toggle button
II. Language selector dropdown with 20+ supported languages
III. Real-time status indicator (active/inactive with visual feedback)
IV. Settings panel for customizing voice and subtitle preferences
V. Clean, modern UI using CSS flexbox and animations

Subtitle System (scripts/content.js, scripts/subtitles.js, subtitles.css):
I. Content script injection into active web pages
II. Dynamic subtitle container positioned at the bottom-center of the viewport
III. Text shadow and background for readability over any video content
IV. Responsive design that works across different screen sizes
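The subtitle container styling comes down to a handful of rules; a rough sketch of what subtitles.css needs (class name and values are illustrative):

```css
/* Illustrative sketch of the subtitle overlay styling */
.gatorlator-subtitles {
  position: fixed;
  bottom: 8%;
  left: 50%;
  transform: translateX(-50%);    /* bottom-center of the viewport */
  max-width: 80vw;                /* responsive across screen sizes */
  padding: 0.4em 0.8em;
  background: rgba(0, 0, 0, 0.6); /* readable over any video content */
  color: #fff;
  text-shadow: 0 1px 2px #000;
  text-align: center;
  z-index: 2147483647;            /* keep overlay above page content */
}
```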

phase 6: integration & testing

Component Integration:
I. Ensured proper message flow: popup → background → offscreen → content
II. Synchronized audio playback with subtitle display timing
III. Tested the complete pipeline with various video sources (YouTube, Netflix, streaming sites)
IV. Debugged race conditions where subtitles appeared before or after the audio
V. Tested with multiple languages (Spanish, French, Mandarin, Japanese)
VI. Identified and fixed audio quality issues
VII. Optimized chunk size (settled on 5 seconds as the sweet spot)

phase 7: polish & debugging

Bug Fixes:
I. Fixed an issue where the extension would break on page navigation
II. Resolved audio desynchronization problems
III. Corrected subtitle positioning on full-screen videos
IV. Handled API key validation with user-friendly error messages

UI/UX Refinements:
I. Added loading states and progress indicators
II. Improved button feedback and hover states
III. Created a custom icon and branding

Documentation:
I. Created a README with setup instructions
II. Documented API key requirements
III. Added inline code comments for future maintenance

challenges we ran into ~

  1. Brainstorming - Settling on the idea for the extension and the services it required took us 2-3 hours
  2. Chrome Extension Audio Limitations - Service workers can't play audio directly, requiring the offscreen document workaround
  3. Real-Time Processing - Balancing chunk size for responsiveness vs. accuracy
  4. API Rate Limits - Managing three different API services with varying rate limits
  5. Subtitle Synchronization - Timing subtitles perfectly with translated audio output
  6. Cross-Browser Audio Capture - Different websites handle audio differently, requiring robust error handling

accomplishments that we're proud of ~

  1. Successfully Chaining Three AI Models in Real-Time: Building a seamless pipeline that coordinates Deepgram, DeepL, and ElevenLabs, three separate AI services, to work together in near real-time. Getting these APIs to communicate efficiently while maintaining low latency (under 8 seconds from audio capture to translated output) was a major technical achievement.
  2. Solving Chrome Extension Audio Challenges: Chrome's service worker architecture presented unique constraints that most developers never encounter. We implemented an offscreen document workaround to enable audio playback and continuous recording to tackle this problem.
  3. Creating an Accessible Tool: We built something that genuinely helps people break down language barriers. It's a tool that could help international students, language learners, and multicultural families.
  4. Subtitle Synchronization Perfection: Getting subtitles to appear synchronized with the translated audio output required solving complex timing challenges across multiple asynchronous processes.
  5. Open Source Contribution: We're proud to make this project open source, providing detailed documentation that will help other developers learn from our work. Our codebase serves as a comprehensive example of Chrome extension development, AI API integration, and real-time audio processing.

what we learned ~

This project taught us the complexity of real-time audio processing, the intricacies of Chrome extension architecture, and the power of combining multiple AI services. We learned to handle asynchronous operations at scale, debug across multiple execution contexts, and create seamless user experiences despite technical constraints. Most importantly, we learned that building accessible technology for international communication is both challenging and incredibly rewarding.

so what's next?

More lectures and more languages :)
