Inspiration 🤔
When was the last time you asked your grandmother about her childhood?
Most of us assume we’ll have more time. More dinners. More holidays. More stories. But 70% of family stories disappear within three generations. Not because we don’t care, but because we don’t record them. We take thousands of photos of the people we love. Yet we rarely preserve their voice, their lessons, or the stories that shaped them. When those disappear, we don’t just lose memories. We lose culture, we lose heritage, and ultimately, we lose each other. Memory Mirror was born from a simple question: What if we could preserve the people we love - before it’s too late?
What it does 👀
Memory Mirror is an AI-powered storytelling companion that helps families record, organize, and relive meaningful life stories. Users can:
- Receive AI-generated, context-aware life questions
- Record voice responses in real time
- Automatically transcribe and structure stories
- Organize memories into a searchable digital legacy
Years from now, families won’t just read what was said; they’ll hear how it was said. Memory Mirror transforms conversations into a preserved legacy.
How we built it 🛠️
Backend (“Brain”)
We built the backend using FastAPI (Python) to coordinate multiple AI services.
- Video Understanding: We integrated Twelve Labs (Marengo 3.0) to analyze video content semantically. This allows users to search memories using natural language queries like “that day at the park,” without manually tagging clips.
- Narrative Generation: We used IBM watsonx (mistral-small) to generate first-person narrative summaries based on video content. The model is prompted to interpret footage and produce structured, human-readable memory stories.
- Voice Cloning & Synthesis: ElevenLabs handles voice synthesis. We use timestamp metadata to synchronize spoken audio with on-screen transcript highlighting.
- Conversational Layer: Google Gemini generates follow-up questions and contextual prompts to help users continue exploring their memories.
The backend manages data flow between video analysis, text generation, speech synthesis, and search indexing.
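The orchestration above can be sketched with `asyncio`. This is a minimal illustration, not our production code: the function names and return shapes are hypothetical stand-ins for each provider's SDK call (Twelve Labs, watsonx, ElevenLabs, the search index), but the dependency structure mirrors the pipeline.

```python
import asyncio

# Hypothetical stubs standing in for the real provider SDK calls;
# each `sleep(0)` is a placeholder for a network round-trip.
async def analyze_video(video_id: str) -> dict:
    await asyncio.sleep(0)
    return {"scenes": ["park"], "video_id": video_id}

async def generate_narrative(analysis: dict) -> str:
    await asyncio.sleep(0)
    return f"I remember that day at the {analysis['scenes'][0]}."

async def synthesize_voice(text: str) -> bytes:
    await asyncio.sleep(0)
    return text.encode()  # stands in for audio + timestamp metadata

async def index_for_search(text: str) -> bool:
    await asyncio.sleep(0)
    return True

async def build_memory(video_id: str) -> dict:
    # Narrative generation depends on the video analysis, so those
    # two steps run sequentially; voice synthesis and search indexing
    # are independent once the text exists, so they run concurrently.
    analysis = await analyze_video(video_id)
    narrative = await generate_narrative(analysis)
    audio, indexed = await asyncio.gather(
        synthesize_voice(narrative),
        index_for_search(narrative),
    )
    return {"narrative": narrative, "audio_bytes": len(audio), "indexed": indexed}

memory = asyncio.run(build_memory("vid_123"))
```

Running the independent steps under `asyncio.gather` is what keeps the slowest provider from blocking the rest of the pipeline.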
Frontend (“Mirror”)
The frontend is built with Next.js 16 and Tailwind CSS 4.
Key features include:
- Real-time speech recognition using a custom hook built on the browser’s Web Speech API, enabling low-latency transcript display.
- Audio visualization that reacts dynamically during playback.
- Framer Motion animations for smooth UI transitions.
- Synchronized transcript highlighting, aligning spoken audio with word-level text timing.
The goal was to make the interaction feel responsive and immersive, not like a static media player.
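At its core, the synchronized highlighting reduces to mapping the current playback time to a word index. A simplified, self-contained version of that lookup (the `WordTiming` shape here is an assumption for illustration, not ElevenLabs' exact metadata format):

```typescript
interface WordTiming {
  word: string;
  start: number; // seconds
  end: number;   // seconds
}

// Binary search for the word whose time span contains `t`.
// Returns -1 when no word is active (e.g. a pause between words).
function activeWordIndex(timings: WordTiming[], t: number): number {
  let lo = 0;
  let hi = timings.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (t < timings[mid].start) hi = mid - 1;
    else if (t >= timings[mid].end) lo = mid + 1;
    else return mid;
  }
  return -1;
}
```

Calling this from a `requestAnimationFrame` loop with the audio element's `currentTime` keeps the highlighted word in step with playback without re-scanning the whole transcript each frame.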
Challenges we ran into 😿
Real-Time Responsiveness: Our initial speech-to-text implementation introduced noticeable delays. We switched to rendering interim recognition results from the browser’s Web Speech API so transcription feels immediate.
Coordinating Multiple AI Services: We’re orchestrating four different AI providers. Latency varied between services (video analysis, narrative generation, speech synthesis), so we had to structure the backend asynchronously to avoid blocking the user experience.
Audio–Text–Video Synchronization: Aligning generated speech with transcript highlighting and video playback was technically complex. Buffer timing issues and playback drift required careful debugging and timestamp handling.
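One pattern that helps with drift is to compare the two clocks periodically and only hard-seek when drift exceeds a tolerance, since constant seeking causes audible glitches. A hypothetical helper illustrating the idea (the 0.15 s tolerance is an assumption, not a measured value):

```typescript
// Returns the time to seek the lagging track to, or null if the
// drift is within tolerance. Small drift is deliberately left
// alone because frequent seeks stutter worse than the drift itself.
function resyncTarget(
  audioTime: number,
  videoTime: number,
  tolerance = 0.15,
): number | null {
  return Math.abs(audioTime - videoTime) > tolerance ? videoTime : null;
}
```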
Dependency Management: Using cutting-edge frameworks (Next.js 16, modern tooling) introduced compatibility issues with Node.js versions across environments. Resolving those inconsistencies cost us development time.
Accomplishments that we're proud of 💯
- Building a full end-to-end pipeline: video → semantic understanding → narrative → voice synthesis → synchronized playback
- Enabling natural language search over personal video memories
- Achieving tight synchronization between AI-generated speech and transcript highlighting
- Successfully integrating multiple best-in-class AI tools into a unified system
Most importantly, we created something that feels interactive rather than static, more like engaging with a memory than just replaying media.
What we learned 💭
- Specialized AI models outperform general-purpose models for domain-specific tasks (e.g., video understanding).
- When multiple AI calls are involved, user experience depends heavily on how loading states and async flows are managed.
- Voice adds a strong emotional layer: synthesized narration significantly changes how users experience memories.
- Small UX details (like real-time transcription or synchronized highlighting) dramatically affect perceived quality.
What's next for Memory Mirror 🚀
- Lip-synced generative video to animate the “Mirror” during playback
- Proactive memory resurfacing, where the system suggests relevant memories based on context
- Live memory streaming, enabling real-time capture and indexing from mobile devices
- Collaborative family archives, where multiple family members can annotate and expand shared memories
Our long-term goal is to make personal memory preservation structured, searchable, and interactive, not just stored.
Built With
- chromadb
- elevenlabs
- gemini
- nextjs
- python
- react
- supabase
- three.js
- twelvelabs
- typescript
- watsonx