Hindsight

Inspiration

We've all been there: you're in an online lecture, your attention drifts for just 30 seconds, and suddenly the professor is explaining something completely different. You're lost, too embarrassed to ask "what did I miss?", and now you're behind for the rest of the class.

73% of students report zoning out during online lectures. That's not a focus problem; it's a design problem. Education tools weren't built for how human attention actually works.

We asked: What if AI could watch your attention and catch you up the moment you drift?

What it does

Hindsight is an AI teaching assistant that:

Tracks your attention via webcam (detects when you look away)
Captures everything said while you're distracted using real-time transcription
Lets you recover instantly with a voice AI tutor that explains exactly what you missed

No more rewinding. No more asking classmates. Just click "Ask Hindsight" and have a conversation about the content you missed.

How we built it

┌────────────────────┬──────────────────────────────────────────┐
│ Layer              │ Technology                               │
├────────────────────┼──────────────────────────────────────────┤
│ Real-time Voice AI │ LiveKit Agents + LiveKit Cloud           │
├────────────────────┼──────────────────────────────────────────┤
│ Speech-to-Text     │ Deepgram STT + Web Speech API            │
├────────────────────┼──────────────────────────────────────────┤
│ Text-to-Speech     │ ElevenLabs                               │
├────────────────────┼──────────────────────────────────────────┤
│ LLM                │ Google Gemini 2.0 Flash (via OpenRouter) │
├────────────────────┼──────────────────────────────────────────┤
│ Database           │ MongoDB Atlas                            │
├────────────────────┼──────────────────────────────────────────┤
│ Backend            │ FastAPI (Python)                         │
├────────────────────┼──────────────────────────────────────────┤
│ Frontend           │ Next.js + React + Tailwind CSS           │
└────────────────────┴──────────────────────────────────────────┘

Architecture:

[Student Webcam] → Attention Detection → Gap Logged
[Lecture Audio] → Web Speech API → Transcripts → MongoDB
[Click "Ask Hindsight"] → LiveKit Room Created → Agent Fetches Transcript → Voice Conversation

The magic happens when these pieces connect: LiveKit handles the real-time audio streaming for the voice agent, MongoDB stores timestamped transcripts, and when a student wants to recover, we query the exact time range they missed and inject it into the AI's context.
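To make the "query the exact time range" step concrete, here is a minimal sketch of how that lookup and context injection could work. It is not our production code: the Segment schema, the field names (session_id, start, end), and the helper names are assumptions made for illustration. The idea is an interval-overlap test: a transcript segment counts as missed if it starts before the gap ends and ends after the gap starts.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed chunk of lecture audio (hypothetical schema)."""
    start: float  # seconds since the lecture started
    end: float
    text: str


def missed_range_filter(session_id: str, gap_start: float, gap_end: float) -> dict:
    """Build a MongoDB-style filter for segments overlapping the attention gap.

    Field names are assumptions; a real collection may use different ones.
    """
    return {
        "session_id": session_id,
        "start": {"$lt": gap_end},
        "end": {"$gt": gap_start},
    }


def select_missed(segments: list[Segment], gap_start: float, gap_end: float) -> list[Segment]:
    """In-memory equivalent of the filter above, sorted by start time."""
    hits = [s for s in segments if s.start < gap_end and s.end > gap_start]
    return sorted(hits, key=lambda s: s.start)


def build_context(missed: list[Segment]) -> str:
    """Format the missed segments as text to prepend to the tutor's prompt."""
    if not missed:
        return "The student did not miss any transcribed content."
    lines = [f"[{s.start:.0f}s-{s.end:.0f}s] {s.text}" for s in missed]
    return "The student missed this part of the lecture:\n" + "\n".join(lines)


if __name__ == "__main__":
    lecture = [
        Segment(0, 10, "Today we cover recursion."),
        Segment(10, 20, "A base case stops the recursion."),
        Segment(20, 30, "Next, the call stack."),
    ]
    # The student looked away from second 12 to second 25.
    print(build_context(select_missed(lecture, 12, 25)))
```

Using an overlap test rather than "start inside the gap" means a sentence that was already in progress when the student looked away is still included, which is usually what a student needs to follow the explanation.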

Challenges we ran into

Browser audio capture - We first tried to capture YouTube audio directly, but browsers sandbox iframe audio. We pivoted to the Web Speech API, which listens through the microphone.
Pydantic v2 + MongoDB - Spent hours debugging why only the first database insert worked. Turns out model_dump() was including _id: None, causing duplicate key errors on subsequent inserts.
Web Speech API quirks - Chrome's speech recognition randomly stops. Had to implement auto-restart logic with careful state management.
LiveKit Agent routing - Needed to ensure the recovery agent only joins recovery rooms, not the main classroom. Added room name filtering (recovery-* prefix).
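Two of the fixes above are small enough to sketch. The block below is a simplified illustration, not our actual code: FakeCollection is a hypothetical in-memory stand-in for a MongoDB collection with its unique _id index, and to_mongo_doc and is_recovery_room are names made up for this example. It shows why a dumped document carrying _id: None succeeds once and then collides, and how a prefix check keeps the agent out of non-recovery rooms.

```python
import itertools

RECOVERY_PREFIX = "recovery-"  # room name prefix mentioned above


def to_mongo_doc(dumped: dict) -> dict:
    """Drop a placeholder `_id: None` so the database can generate a fresh id."""
    doc = dict(dumped)  # copy, so the caller's dict is left untouched
    if doc.get("_id") is None:
        doc.pop("_id", None)
    return doc


class FakeCollection:
    """Hypothetical in-memory stand-in for a collection with a unique `_id`."""

    def __init__(self) -> None:
        self.docs: dict = {}
        self._ids = itertools.count(1)

    def insert_one(self, doc: dict):
        # Like MongoDB, only generate an id when the document has no `_id` key.
        _id = doc["_id"] if "_id" in doc else next(self._ids)
        if _id in self.docs:
            raise KeyError(f"duplicate key: _id={_id!r}")
        self.docs[_id] = doc
        return _id


def is_recovery_room(room_name: str) -> bool:
    """Return True only for rooms the recovery agent should join."""
    return room_name.startswith(RECOVERY_PREFIX)


if __name__ == "__main__":
    dumped = {"_id": None, "text": "hello"}  # shape of the buggy dump

    buggy = FakeCollection()
    buggy.insert_one(dict(dumped))  # first insert stores _id=None
    try:
        buggy.insert_one(dict(dumped))  # second insert collides on _id=None
    except KeyError as err:
        print("second insert failed:", err)

    fixed = FakeCollection()
    print(fixed.insert_one(to_mongo_doc(dumped)))
    print(fixed.insert_one(to_mongo_doc(dumped)))
    print(is_recovery_room("recovery-42"), is_recovery_room("classroom-main"))
```

With Pydantic v2 itself, a similar effect can be had at dump time, for example with model_dump(by_alias=True, exclude_none=True), with the caveat that exclude_none drops every None field, not just _id.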