Inspiration 🤔
When was the last time you asked your grandmother about her childhood?
Most of us assume we’ll have more time. More dinners. More holidays. More stories. But 70% of family stories disappear within three generations. Not because we don’t care, but because we don’t record them. We take thousands of photos of the people we love. Yet we rarely preserve their voice, their lessons, or the stories that shaped them. When those disappear, we don’t just lose memories. We lose culture, we lose heritage, and ultimately, we lose each other. Memory Mirror was born from a simple question: What if we could preserve the people we love - before it’s too late?
What it does 👀
Memory Mirror is an AI-powered storytelling companion that helps families record, organize, and relive meaningful life stories. Users can:
- Receive AI-generated, context-aware life questions
- Record voice responses in real time
- Automatically transcribe and structure stories
- Organize memories into a searchable digital legacy
Years from now, families won’t just read what was said; they’ll hear how it was said. Memory Mirror transforms conversations into a preserved legacy.
How we built it 🛠️
Backend (“Brain”)
We built the backend using FastAPI (Python) to coordinate multiple AI services.
- Video Understanding: We integrated Twelve Labs (Marengo 3.0) to analyze video content semantically. This allows users to search memories using natural language queries like “that day at the park,” without manually tagging clips.
- Narrative Generation: We used IBM watsonx (mistral-small) to generate first-person narrative summaries based on video content. The model is prompted to interpret footage and produce structured, human-readable memory stories.
- Voice Cloning & Synthesis: ElevenLabs handles voice synthesis. We use timestamp metadata to synchronize spoken audio with on-screen transcript highlighting.
- Conversational Layer: Google Gemini generates follow-up questions and contextual prompts to help users continue exploring their memories.
The backend manages data flow between video analysis, text generation, speech synthesis, and search indexing.
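The orchestration above can be sketched with `asyncio`. This is a minimal illustration, not our production code: the function names and return shapes are hypothetical stand-ins for each provider's SDK call (Twelve Labs, watsonx, ElevenLabs, the search index), but the dependency structure mirrors the pipeline.

```python
import asyncio

# Hypothetical stubs standing in for the real provider SDK calls;
# each `sleep(0)` is a placeholder for a network round-trip.
async def analyze_video(video_id: str) -> dict:
    await asyncio.sleep(0)
    return {"scenes": ["park"], "video_id": video_id}

async def generate_narrative(analysis: dict) -> str:
    await asyncio.sleep(0)
    return f"I remember that day at the {analysis['scenes'][0]}."

async def synthesize_voice(text: str) -> bytes:
    await asyncio.sleep(0)
    return text.encode()  # stands in for audio + timestamp metadata

async def index_for_search(text: str) -> bool:
    await asyncio.sleep(0)
    return True

async def build_memory(video_id: str) -> dict:
    # Narrative generation depends on the video analysis, so those
    # two steps run sequentially; voice synthesis and search indexing
    # are independent once the text exists, so they run concurrently.
    analysis = await analyze_video(video_id)
    narrative = await generate_narrative(analysis)
    audio, indexed = await asyncio.gather(
        synthesize_voice(narrative),
        index_for_search(narrative),
    )
    return {"narrative": narrative, "audio_bytes": len(audio), "indexed": indexed}

memory = asyncio.run(build_memory("vid_123"))
```

Running the independent steps under `asyncio.gather` is what keeps the slowest provider from blocking the rest of the pipeline.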
Frontend (“Mirror”)
The frontend is built with Next.js 16 and Tailwind CSS 4.
Key features include:
- Real-time speech recognition using a custom hook built on the browser’s Web Speech API, enabling low-latency transcript display.
- Audio visualization that reacts dynamically during playback.
- Framer Motion animations for smooth UI transitions.
- Synchronized transcript highlighting, aligning spoken audio with word-level text timing.
The goal was to make the interaction feel responsive and immersive, not like a static media player.
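At its core, the synchronized highlighting reduces to mapping the current playback time to a word index. A simplified, self-contained version of that lookup (the `WordTiming` shape here is an assumption for illustration, not ElevenLabs' exact metadata format):

```typescript
interface WordTiming {
  word: string;
  start: number; // seconds
  end: number;   // seconds
}

// Binary search for the word whose time span contains `t`.
// Returns -1 when no word is active (e.g. a pause between words).
function activeWordIndex(timings: WordTiming[], t: number): number {
  let lo = 0;
  let hi = timings.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (t < timings[mid].start) hi = mid - 1;
    else if (t >= timings[mid].end) lo = mid + 1;
    else return mid;
  }
  return -1;
}
```

Calling this from a `requestAnimationFrame` loop with the audio element's `currentTime` keeps the highlighted word in step with playback without re-scanning the whole transcript each frame.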
Challenges we ran into 😿
Real-Time Responsiveness: Our initial speech-to-text implementation introduced noticeable delays. We switched to rendering interim recognition results from the browser’s Web Speech API so transcription feels immediate.
Coordinating Multiple AI Services: We’re orchestrating four different AI providers. Latency varied between services (video analysis, narrative generation, speech synthesis), so we had to structure the backend asynchronously to avoid blocking the user experience.
Audio–Text–Video Synchronization: Aligning generated speech with transcript highlighting and video playback was technically complex. Buffer timing issues and playback drift required careful debugging and timestamp handling.
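One pattern that helps with drift is to compare the two clocks periodically and only hard-seek when drift exceeds a tolerance, since constant seeking causes audible glitches. A hypothetical helper illustrating the idea (the 0.15 s tolerance is an assumption, not a measured value):

```typescript
// Returns the time to seek the lagging track to, or null if the
// drift is within tolerance. Small drift is deliberately left
// alone because frequent seeks stutter worse than the drift itself.
function resyncTarget(
  audioTime: number,
  videoTime: number,
  tolerance = 0.15,
): number | null {
  return Math.abs(audioTime - videoTime) > tolerance ? videoTime : null;
}
```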
Dependency Management: Using cutting-edge frameworks (Next.js 16, modern tooling) introduced compatibility issues with Node.js versions across environments. Resolving those inconsistencies cost us development time.
Accomplishments that we're proud of 💯
- Building a full end-to-end pipeline: video → semantic understanding → narrative → voice synthesis → synchronized playback
- Enabling natural language search over personal video memories
- Achieving tight synchronization between AI-generated speech and transcript highlighting
- Successfully integrating multiple best-in-class AI tools into a unified system
Most importantly, we created something that feels interactive rather than static, more like engaging with a memory than just replaying media.
What we learned 💭
- Specialized AI models outperform general-purpose models for domain-specific tasks (e.g., video understanding).
- When multiple AI calls are involved, user experience depends heavily on how loading states and async flows are managed.
- Voice adds a strong emotional layer: synthesized narration significantly changes how users experience memories.
- Small UX details (like real-time transcription or synchronized highlighting) dramatically affect perceived quality.
What's next for Memory Mirror 🚀
- Lip-synced generative video to animate the “Mirror” during playback
- Proactive memory resurfacing, where the system suggests relevant memories based on context
- Live memory streaming, enabling real-time capture and indexing from mobile devices
- Collaborative family archives, where multiple family members can annotate and expand shared memories
Our long-term goal is to make personal memory preservation structured, searchable, and interactive, not just stored.
Built With
- chromadb
- elevenlabs
- gemini
- nextjs
- python
- react
- supabase
- three.js
- twelvelabs
- typescript
- watsonx