Inspiration
Families record more moments than ever, yet most of those recordings are never revisited. Videos sit in cloud drives, voice notes live in chat apps, and photos are scattered across devices. When someone is no longer around, those fragments become difficult to navigate and impossible to interact with.
We wanted to change how family history is preserved and experienced. Rather than creating another digital archive, we set out to build something people could talk to - while remaining honest about what is real and what is not. Heirloom is driven by the belief that memory should feel human, but it should also be verifiable. Every response is grounded in the original recordings, with exact citations back to the source material.
How we built it
Heirloom is built around a clear separation of concerns: extraction, grounding, and interaction.
Users upload existing family media - video, audio, images, or text - with a strict 100 MB limit to ensure fast, reliable processing. Once a file exists in storage, our backend uses Google Gemini to analyse the content and extract structured “memory units”. These represent real events or moments, each with a title, summary, places, dates, and precise timestamps for audio and video.
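A memory unit can be modelled as a small typed record. The sketch below uses illustrative field names and sample values - not Heirloom's exact schema - to show the shape of what extraction produces:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MemoryUnit:
    """One extracted event or moment. Field names are illustrative."""
    title: str
    summary: str
    source_file: str                      # storage key of the uploaded media
    places: list[str] = field(default_factory=list)
    dates: list[str] = field(default_factory=list)   # e.g. "1972" or "summer 1987"
    start_seconds: Optional[float] = None            # set for audio/video sources
    end_seconds: Optional[float] = None

# Hypothetical example of one extracted unit:
unit = MemoryUnit(
    title="Grandpa's first car",
    summary="He describes buying a used Beetle in 1972.",
    source_file="uploads/interview_01.mp3",
    places=["Bristol"],
    dates=["1972"],
    start_seconds=312.5,
    end_seconds=367.0,
)
```

Keeping timestamps on every audio/video unit is what later makes exact citations possible.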
For interaction, we use a voice-first interface. When a user asks a question, Gemini generates a response strictly from the extracted memories, and ElevenLabs synthesises the answer in the original person’s voice. Crucially, while the answer plays, the interface displays citations showing the exact clip and timestamp where that information came from. Users can open the source and see or hear the evidence themselves.
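The citation display hinges on rendering those stored timestamps precisely. A small helper - with a hypothetical answer payload, not Heirloom's exact response format - illustrates one cited answer:

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, or HH:MM:SS for long recordings."""
    m, s = divmod(int(seconds), 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}" if h else f"{m:02d}:{s:02d}"

# Hypothetical shape of a grounded answer shown alongside the voice playback:
answer = {
    "text": "He bought his first car, a used Beetle, in 1972.",
    "citations": [{
        "source": "uploads/interview_01.mp3",   # illustrative storage key
        "clip": f"{format_timestamp(312.5)}-{format_timestamp(367.0)}",
    }],
}
print(answer["citations"][0]["clip"])   # 05:12-06:07
```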
The system is designed to avoid hallucination. If a detail is not present in the uploaded media, it is not invented. Our frontend is built with Next.js, Tailwind, and Framer Motion to keep the experience calm and intuitive, while the backend uses FastAPI with asynchronous jobs to handle long-running media analysis safely.
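The asynchronous-job pattern behind media analysis - accept the upload, return immediately, let the client poll for status - can be sketched with plain asyncio. The names, the in-memory job table, and the sleep standing in for the Gemini call are all assumptions, not Heirloom's actual implementation:

```python
import asyncio

JOBS: dict[str, str] = {}   # job_id -> status (in-memory stand-in for a real store)

async def analyse_media(job_id: str, file_key: str) -> None:
    """Long-running analysis runs off the request path."""
    JOBS[job_id] = "processing"
    await asyncio.sleep(0.01)   # stand-in for the Gemini extraction call
    JOBS[job_id] = "done"

async def main() -> None:
    # The request handler would schedule this and respond at once;
    # the client then polls JOBS[job_id] until it reads "done".
    task = asyncio.create_task(analyse_media("job-1", "uploads/interview_01.mp3"))
    await task

asyncio.run(main())
```

In FastAPI this slot is typically filled by `BackgroundTasks` or an external task queue; the polling contract is the same either way.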
Challenges
The hardest challenge was balancing emotional impact with technical honesty. It is easy to make an AI sound convincing; it is much harder to make it accountable. Ensuring every answer could be traced back to a real moment in the source material required careful schema design, timestamp handling, and strict prompting.
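Part of that accountability can be enforced mechanically: before an answer plays, every citation the model emits can be checked against the extracted memory units and rejected if it points nowhere real. A minimal sketch with illustrative field names:

```python
def validate_citations(citations: list[dict], memory_units: list[dict]) -> bool:
    """True only if every cited clip matches a real extracted moment."""
    known = {(u["source_file"], u["start_seconds"]) for u in memory_units}
    return all((c["source_file"], c["start_seconds"]) in known for c in citations)

units = [{"source_file": "uploads/interview_01.mp3", "start_seconds": 312.5}]

good = [{"source_file": "uploads/interview_01.mp3", "start_seconds": 312.5}]
bad = [{"source_file": "uploads/interview_01.mp3", "start_seconds": 99.0}]
```

An answer whose citations fail this check would be regenerated or refused rather than shown.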
Processing long audio and video reliably within a hackathon timeframe was also challenging. We had to design an extraction pipeline that could segment content meaningfully, retry safely on failures, and avoid duplicating data when jobs were re-run.
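Two small mechanisms cover the retry and deduplication requirements: deriving a deterministic key for each extracted unit (so a re-run overwrites instead of duplicating) and wrapping flaky calls in exponential backoff. This is a sketch of the pattern, with hypothetical names and an in-memory store:

```python
import hashlib
import time

STORE: dict[str, dict] = {}   # stand-in for the real database

def unit_key(source_file: str, start_seconds: float, title: str) -> str:
    """Deterministic key: re-running a job upserts rather than duplicates."""
    raw = f"{source_file}|{start_seconds}|{title}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

def save_unit(unit: dict) -> None:
    STORE[unit_key(unit["source_file"], unit["start_seconds"], unit["title"])] = unit

def with_retries(fn, attempts: int = 3, base_delay: float = 0.01):
    """Retry fn with exponential backoff; re-raise after the last attempt."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)

unit = {"source_file": "uploads/interview_01.mp3", "start_seconds": 312.5,
        "title": "Grandpa's first car"}
save_unit(unit)
save_unit(unit)   # re-run: same key, no duplicate row
```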
Latency was another concern. For the experience to feel like a conversation, responses needed to arrive quickly. We optimised the flow by pre-processing memories, limiting context size, and streaming voice output as soon as text became available.
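One simple way to stream voice as soon as text is available is to flush complete sentences out of the token stream and hand each to the TTS engine immediately instead of waiting for the full answer. The naive regex splitter below is an assumption for illustration, not ElevenLabs' API:

```python
import re
from typing import Iterable, Iterator

def sentences(text_stream: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as the token stream produces them,
    so speech synthesis can begin before generation finishes."""
    buf = ""
    for chunk in text_stream:
        buf += chunk
        while True:
            m = re.search(r"[.!?]\s+", buf)
            if not m:
                break
            yield buf[:m.end()].strip()
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()   # flush whatever remains at end of stream

chunks = ["He bought ", "the car. ", "It was red."]
out = list(sentences(chunks))
```

Each yielded sentence can be sent to synthesis while later tokens are still being generated, which is what makes the exchange feel conversational.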
What we learned
We learned that trust is the most important part of AI-driven storytelling. Users are far more comfortable engaging emotionally when they can see the proof behind what the system says. Showing citations alongside voice responses changed how people reacted - they listened more closely and questioned less.
We also learned that small, well-defined constraints help creativity. Limiting upload size, keeping event types controlled, and enforcing grounded answers allowed us to move faster and build something more reliable within the hackathon.
Most importantly, we learned that AI can support memory without replacing it. When used carefully, it can help families reconnect with real voices, real moments, and real experiences - without pretending to be something it is not.
Built With
- amazon-web-services
- elevenlabs
- fastapi
- gemini
- nextjs
- postgresql
- python
- react
- s3-compatible-storage
- supabase
- swr
- tailwindcss
- typescript