Inspiration
In an emergency, every second counts. Imagine trying to get to safety, only to realize that you can't communicate with the only people who can help you. Millions of refugees and immigrants face this every year, and a language barrier alone can cause serious problems. The solution? A fast, seamless way to translate conversations instantly, so both sides can understand each other clearly and effortlessly.
What it does
LinguaAid is a real-time, bidirectional translation app designed to remove communication friction between first responders and refugees. The app fits into a conversation through a simple push-to-talk interface or, alternatively, by detecting speakers automatically via speaker diarization. It first transcribes each speaker and translates between the two parties with low latency. Critically, it then uses a Large Language Model (LLM) to synthesize and highlight key takeaways for the first responder, such as medical conditions, safety status, or intended destination, so that no vital details are lost.
How we built it
We implemented a low-latency stack integrating two APIs, ElevenLabs and Gemini. The application runs on a minimal, fast frontend designed for mobile use by first responders. We used the ElevenLabs API for both real-time Speech-to-Text (STT) transcription and natural-sounding Text-to-Speech (TTS) response generation, which keeps the conversation rapid and human-like. The translation layer is powered by the Gemini 2.5 Flash API. We chose this model for its balance of speed and complex reasoning, available through a single API.
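The flow described above (speech in, transcription, translation, speech out) can be sketched as a small pipeline. This is a minimal illustration and not our production code: the three stage functions are passed in as plain callables, and the `fake_*` functions below are hypothetical stand-ins for the ElevenLabs STT/TTS calls and the Gemini translation call, none of which are shown here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    """One translated utterance in the conversation."""
    source_text: str
    translated_text: str
    audio_out: bytes


def run_turn(
    audio_in: bytes,
    source_lang: str,
    target_lang: str,
    transcribe: Callable[[bytes, str], str],
    translate: Callable[[str, str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> Turn:
    """Run one utterance through STT, translation, and TTS in order."""
    text = transcribe(audio_in, source_lang)
    translated = translate(text, source_lang, target_lang)
    audio_out = synthesize(translated, target_lang)
    return Turn(text, translated, audio_out)


# Hypothetical stand-ins so the sketch runs without any network access.
def fake_transcribe(audio: bytes, lang: str) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, source_lang: str, target_lang: str) -> str:
    return f"[{source_lang}->{target_lang}] {text}"


def fake_synthesize(text: str, lang: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    turn = run_turn(
        b"I need a doctor", "en", "ar",
        fake_transcribe, fake_translate, fake_synthesize,
    )
    print(turn.translated_text)
```

Passing the stages in as arguments keeps the pipeline independent of any one vendor SDK, so the real API calls can be swapped in or replaced with stubs during testing.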
Challenges we ran into
Working with ElevenLabs was challenging as none of us had prior experience with it. Integrating its output with the contextual synthesis feature of Gemini required efficient processing to ensure the end-to-end delay remained low enough for a natural conversation.
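One general way to keep end-to-end delay down is to run independent steps concurrently, for example producing the spoken translation while the key takeaways are being extracted. The sketch below illustrates that idea with `asyncio`; it is an assumption about how such overlap could be arranged, not a description of our exact implementation. `speak` and `extract_takeaways` are hypothetical placeholders that simulate network latency with `asyncio.sleep`.

```python
import asyncio
import time


async def speak(translated: str) -> bytes:
    """Hypothetical placeholder for a TTS request."""
    await asyncio.sleep(0.05)  # simulated network latency
    return translated.encode("utf-8")


async def extract_takeaways(transcript: str) -> list[str]:
    """Hypothetical placeholder for an LLM summarization request."""
    await asyncio.sleep(0.05)  # simulated network latency
    return [line for line in transcript.splitlines() if line]


async def finish_turn(translated: str, transcript: str) -> tuple[bytes, list[str]]:
    """Run TTS and takeaway extraction at the same time."""
    audio, takeaways = await asyncio.gather(
        speak(translated),
        extract_takeaways(transcript),
    )
    return audio, takeaways


def timed_finish_turn(translated: str, transcript: str) -> tuple[bytes, list[str], float]:
    """Return both results plus the wall-clock time the turn took."""
    start = time.perf_counter()
    audio, takeaways = asyncio.run(finish_turn(translated, transcript))
    elapsed = time.perf_counter() - start
    return audio, takeaways, elapsed


if __name__ == "__main__":
    audio, takeaways, elapsed = timed_finish_turn(
        "hello", "chest pain\nneeds shelter"
    )
    print(takeaways, f"{elapsed:.3f}s")
```

Because the two simulated requests overlap, the turn takes roughly as long as the slower request rather than the sum of both.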
Accomplishments that we're proud of
We engineered a multi-API solution that goes beyond the plain parallel translation offered by simpler tools like Google Translate. By combining the speech-to-text and text-to-speech capabilities of ElevenLabs with Gemini's translation ability, we built a working solution to our original problem.
What we learned
We gained significant hands-on experience building LLM-powered agents for real-time, latency-sensitive applications, and in particular learned best practices for minimizing API round-trip times. We learned this by integrating and optimizing our use of the Gemini and ElevenLabs APIs.
What's next for LinguaAid
We plan to implement a Case Notes Summary Dashboard that automatically saves and organizes critical data from the translated conversation. This feature creates a low-friction digital handoff, ensuring continuity of care when the case is transferred to long-term personnel, such as a specialized doctor or case manager, so that vital information is neither lost nor needlessly repeated.
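As a rough idea of what the dashboard could store, here is a minimal sketch of a case record. Since this feature is not built yet, everything in it is an assumption: the `CaseNotes` class, its field names, and the summary layout are hypothetical, with the fields chosen to mirror the key takeaways mentioned earlier (medical conditions, safety status, intended destination).

```python
from dataclasses import dataclass, field


@dataclass
class CaseNotes:
    """Hypothetical record of one translated conversation."""
    case_id: str
    medical_conditions: list[str] = field(default_factory=list)
    safety_status: str = "unknown"
    intended_destination: str = "unknown"
    # Each entry is (speaker, translated text), in conversation order.
    transcript: list[tuple[str, str]] = field(default_factory=list)

    def add_turn(self, speaker: str, translated_text: str) -> None:
        """Append one translated utterance to the record."""
        self.transcript.append((speaker, translated_text))

    def handoff_summary(self) -> str:
        """Build a short plain-text summary for the next caregiver."""
        conditions = ", ".join(self.medical_conditions) or "none recorded"
        lines = [
            f"Case {self.case_id}",
            f"Medical conditions: {conditions}",
            f"Safety status: {self.safety_status}",
            f"Intended destination: {self.intended_destination}",
            f"Turns recorded: {len(self.transcript)}",
        ]
        return "\n".join(lines)


if __name__ == "__main__":
    notes = CaseNotes(case_id="A-001", medical_conditions=["asthma"])
    notes.add_turn("refugee", "I need my inhaler.")
    notes.safety_status = "safe"
    print(notes.handoff_summary())
```

Keeping the summary as a method on the record means a doctor or case manager sees the same fields the first responder saw, without anyone re-asking the questions.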