Inspiration

We built Rosetta because the classroom should reward intelligence, not bilingual stamina. Imagine sitting in a lecture where the words are familiar but the meaning isn’t. That’s the everyday reality for nearly 6 million international and ESL students, and it used to be the reality for many of our parents. Students in that position end up mentally translating instead of learning, missing chances to participate, and burning energy reconstructing lectures afterward. Rosetta is our attempt to change that. It brings live, natural-sounding translation, citations tied to lecture materials, and automatically generated study-ready notes into one polished web experience, so students can focus on ideas instead of translation.

What it does

Rosetta runs inside the browser during a live lecture and produces three things at once: translated audio in the student’s native language, a live transcript enriched with inline citations to uploaded course materials, and structured post-lecture notes. The translated audio preserves the lecturer’s pacing and tone, so students can listen in real time without juggling two languages. As the lecture progresses, Rosetta surfaces relevant pages from PDFs and other course files inline with the transcript, making it easy to follow up on a concept without interrupting the flow. When class ends, a single click generates clean, topic-organized notes, fully referenced and exportable for offline study.
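This writeup does not describe Rosetta’s internal data model. The following hypothetical Python sketch shows one way citation-enriched transcript sentences could be grouped into topic-organized, referenced notes. `Citation`, `Segment`, `render_notes`, and the per-segment `topic` label are all assumptions. How topics are actually assigned is not specified.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Citation:
    file_name: str  # an uploaded course file (hypothetical field)
    page: int       # 1-based page within that file

@dataclass
class Segment:
    text: str   # one finalized transcript sentence
    topic: str  # hypothetical topic label assigned after the lecture
    citations: list[Citation] = field(default_factory=list)

def render_notes(segments: list[Segment]) -> str:
    """Group sentences by topic (in first-seen order) and number citations once."""
    by_topic: dict[str, list[Segment]] = defaultdict(list)
    for seg in segments:
        by_topic[seg.topic].append(seg)

    refs: list[Citation] = []  # de-duplicated citations, in order of first use
    lines: list[str] = []
    for topic, segs in by_topic.items():
        lines.append(f"## {topic}")
        for seg in segs:
            marks = []
            for c in seg.citations:
                if c not in refs:
                    refs.append(c)
                marks.append(f"[{refs.index(c) + 1}]")
            lines.append(f"- {seg.text} {''.join(marks)}".rstrip())
        lines.append("")
    lines.append("## References")
    for i, c in enumerate(refs, start=1):
        lines.append(f"[{i}] {c.file_name}, p. {c.page}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_notes([
        Segment("Gradient descent updates weights.", "Optimization", [Citation("week3.pdf", 4)]),
        Segment("Overfitting hurts generalization.", "Regularization"),
    ]))
```

Numbering each citation once and reusing its index keeps the exported notes compact when the same slide is cited repeatedly.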

Why Rosetta Stands Out

This is not just a demo with mocked data. Rosetta is a usable product: folders and sessions, real file uploads, live citations, question translation (type in your language and play the English aloud), full website translation, and polished note exports. It directly addresses equity in education, which makes it a strong candidate for Best Minority Hack as well as Best Accessibility Hack. Because ElevenLabs powers the natural, low-latency voice output that makes following along comfortable, Rosetta is also an ideal fit for Best Use of ElevenLabs.

Technical highlights

We optimized for latency and reliability, not just raw accuracy. The browser captures audio and produces a transcript, and the text is routed into a low-latency translation and text-to-speech flow so a student hears translated speech with barely perceptible delay. In parallel, the transcript triggers semantic retrieval over the uploaded course materials, so citations surface with minimal delay. We engineered that retrieval path to be fast and defensible: local keyword enrichment, compact embeddings, and a lightweight re-ranker let us push citation latency below the perceptual threshold where it feels “instant.”
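The point of running the two flows in parallel is that citation latency should not stack on top of speech synthesis. The minimal asyncio sketch below illustrates that fan-out. It is not Rosetta’s code. `translate_and_speak` and `retrieve_citations` are hypothetical stand-ins that only sleep to simulate latency, and no real translation or ElevenLabs API is called.

```python
import asyncio

# Hypothetical stand-ins: the real translation/TTS and retrieval services are not
# shown, so each coroutine sleeps to simulate its latency.
async def translate_and_speak(sentence: str, lang: str) -> str:
    await asyncio.sleep(0.2)  # placeholder for translation + text-to-speech
    return f"<audio:{lang}:{sentence}>"

async def retrieve_citations(sentence: str) -> list[str]:
    await asyncio.sleep(0.1)  # placeholder for local retrieval over course files
    return [f"citation-for:{sentence}"]

async def handle_sentence(sentence: str, lang: str, events: list[tuple[str, object]]) -> None:
    """Start both branches at once; each records its result the moment it finishes."""
    async def speak() -> None:
        events.append(("audio", await translate_and_speak(sentence, lang)))

    async def cite() -> None:
        events.append(("citations", await retrieve_citations(sentence)))

    # Wall time is roughly the slower branch, not the sum, and citations are
    # published without waiting for speech synthesis to finish.
    await asyncio.gather(speak(), cite())

async def run(transcript: list[str], lang: str) -> list[tuple[str, object]]:
    events: list[tuple[str, object]] = []
    for sentence in transcript:  # finalized sentences, in arrival order
        await handle_sentence(sentence, lang, events)
    return events

if __name__ == "__main__":
    for kind, payload in asyncio.run(run(["Today we cover entropy."], "es")):
        print(kind, payload)
```

In this sketch the citation event is recorded before the audio event because the simulated retrieval is faster. A production pipeline would probably also overlap work across consecutive sentences instead of awaiting each one in turn.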

Challenges

The hardest technical challenge was making retrieval-augmented generation fast enough to feel invisible during a live lecture. Our first RAG pipeline was architecturally sound but practically too slow. We used a conservative 2–3 sentence sliding window for each transcript chunk, assuming more context would improve retrieval quality. In practice, this introduced unnecessary noise and compounded latency: queries became bloated, embeddings were slower to compute, and irrelevant citations occasionally surfaced seconds late, breaking the illusion of real-time assistance.
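The sketch below makes the trade-off concrete by building sliding-window queries from finalized transcript sentences. `windowed_queries` is a hypothetical helper, not Rosetta’s implementation. `window=3` approximates the original 2 to 3 sentence design, and `window=1` approximates the single-sentence design.

```python
from collections import deque
from typing import Iterable, Iterator

def windowed_queries(sentences: Iterable[str], window: int) -> Iterator[str]:
    """Yield one retrieval query per new sentence, joined with up to
    `window - 1` preceding sentences (window=1 means the sentence alone)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    buf: deque[str] = deque(maxlen=window)  # oldest sentence drops off automatically
    for s in sentences:
        buf.append(s)
        yield " ".join(buf)

if __name__ == "__main__":
    lecture = [
        "Entropy measures uncertainty.",
        "It is maximized by a uniform distribution.",
        "Now consider a coin flip.",
    ]
    for w in (3, 1):
        print(w, list(windowed_queries(lecture, w))[-1])
```

With `window=3`, the third query still carries the opening sentence about entropy even though the speaker has moved on to coin flips. That stale context is the kind of noise that bloated queries and slowed embedding.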

We pivoted to a tighter, single-sentence window triggered at natural sentence boundaries. This change forced us to be more disciplined about retrieval quality, but it paid off: shorter queries were easier to enrich locally, faster to embed, and more semantically precise. We combined the new window with aggressive local optimization: replacing LLM-based query enrichment with KeyBERT, switching to lightweight local embeddings, adding early-exit thresholds, and downsizing the re-ranker. Together, these changes cut end-to-end citation latency from nearly a second to well under 150 milliseconds.
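The following minimal sketch pairs a sentence-boundary trigger with an early-exit retrieval step. It is not Rosetta’s code, and the following details are assumptions:

* The regex splitter is naive.
* The vectors are assumed to come from some local embedding model, which is not shown.
* The `floor` and `confident` thresholds and `k` are illustrative, not Rosetta’s tuned values.
* `rerank` is a stand-in callable where a small re-ranking model might go.

KeyBERT enrichment is also not shown.

```python
import math
import re
from typing import Callable, Sequence

class SentenceTrigger:
    """Buffers streaming ASR text and releases complete sentences at . ? ! boundaries.
    Naive: abbreviations such as "Dr." also split, and chunks are assumed to carry
    their own whitespace."""
    _BOUNDARY = re.compile(r"(?<=[.?!])\s+")

    def __init__(self) -> None:
        self._buf = ""

    def feed(self, text: str) -> list[str]:
        self._buf += text
        parts = self._BOUNDARY.split(self._buf)
        self._buf = parts.pop()  # trailing fragment: an unfinished sentence, or ""
        return [p.strip() for p in parts if p.strip()]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(
    query_vec: Sequence[float],
    chunks: list[tuple[str, list[float]]],
    rerank: Callable[[list[str]], list[str]],
    floor: float = 0.35,      # illustrative: below this, show no citation at all
    confident: float = 0.80,  # illustrative: at or above this, skip the re-ranker
    k: int = 3,
) -> list[str]:
    scored = sorted(((cosine(query_vec, vec), cid) for cid, vec in chunks), reverse=True)
    if not scored or scored[0][0] < floor:
        return []  # early exit: nothing relevant enough to cite
    if scored[0][0] >= confident:
        return [scored[0][1]]  # early exit: clear winner, no re-ranking needed
    candidates = [cid for score, cid in scored[:k] if score >= floor]
    return rerank(candidates)  # only ambiguous cases pay for the re-ranker
```

The design idea is that most sentences either clearly match one chunk or match nothing. Only the ambiguous middle band is sent to the slower re-ranker, which keeps the typical path cheap.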

What’s next

As we refined Rosetta with mentor feedback, our next steps became clear, all tightly scoped around institutional readiness.

First is data security and trust. Instructors often share materials that cannot enter consumer AI systems, so Rosetta supports two deployment paths:

* enterprise-grade providers with contractual data isolation
* a fully open-source, in-tenant setup where translation, retrieval, and citation all run on institutional infrastructure

The choice between them is a policy decision, not a technical one.

Second is classroom audio robustness. Live environments are noisy, and we’ve identified edge cases like background chatter and speaker feedback loops. The next iteration would strengthen echo cancellation, voice activity detection, and noise suppression, with system-level guards that ignore speaker output while preserving quiet professor speech.

Finally, we’re expanding beyond text-only PDFs. Real courses use slides, diagrams, scans, and media. We’re adding OCR, slide-aware parsing, and MP3/MP4 support so that citation retrieval works across all course materials.
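To illustrate the voice-activity piece, here is a minimal energy-based VAD with an adaptive noise floor. It is one common heuristic, offered as a sketch rather than Rosetta’s planned design. The class name, the `margin_db` and `rise` parameters, and the float-sample frame format are assumptions. It does not handle speaker feedback, which would need echo cancellation using the playback signal as a reference (not shown).

```python
import math
from typing import Optional, Sequence

class EnergyVAD:
    """Energy-based voice activity detection against an adaptive noise floor.

    A frame counts as speech when its level is `margin_db` above the tracked floor,
    so quiet speech in a quiet room still registers. The floor drops immediately to
    quieter frames and creeps up slowly otherwise, so it can follow rising chatter."""

    def __init__(self, margin_db: float = 6.0, rise: float = 0.01) -> None:
        self.margin_db = margin_db
        self.rise = rise
        self.floor_db: Optional[float] = None

    @staticmethod
    def level_db(frame: Sequence[float]) -> float:
        """RMS level of a frame of float samples in [-1, 1], in dBFS."""
        rms = math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0
        return 20 * math.log10(max(rms, 1e-10))  # clamp to avoid log10(0)

    def is_speech(self, frame: Sequence[float]) -> bool:
        level = self.level_db(frame)
        if self.floor_db is None:
            self.floor_db = level  # calibrate on the first frame, assumed non-speech
            return False
        speech = level > self.floor_db + self.margin_db
        if level < self.floor_db:
            self.floor_db = level  # fall immediately to a quieter background
        else:
            self.floor_db += self.rise * (level - self.floor_db)  # rise slowly
        return speech
```

A relative margin rather than a fixed absolute threshold is what lets a quiet lecturer register in a quiet room. A small `rise` keeps sustained speech from quickly pulling the floor up to its own level.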
