Inspiration

Memory loss affects millions of people living with Alzheimer's disease and dementia, creating a gap between their daily experiences and what they can recall. We were inspired to build a digital memory assistant that captures live events in real time and helps you remember the ones you love, so you never have to question your identity or your day again. Our goal was to help individuals maintain their sense of identity and independence by giving them a second memory system and another chance at autonomy, connection, and meaning.

What it does

Recall is a memory assistant that passively captures frames of significant events throughout your day and translates them into contextually relevant summaries, painting a timeline of your day.

Recall also helps you never forget the ones you love. Using facial recognition, it detects when a loved one is nearby and turns live, audible transcriptions of the conversation into structured summaries linked to that person's individual profile.

We aimed to integrate as seamlessly and unobtrusively as possible into a user's life. Dementia should not strip anyone of their independence and dignity, so Recall eliminates the burden of manual note-taking and constant reminders, acting as a unique, personalized memory journal that helps you and your loved ones never have to ask "What did I do today?" or "Who are you?" again.

How we built it

Recall was built around two main hardware components: a Raspberry Pi 3 Model B and a Logitech Brio webcam. We flashed a fresh Linux image onto the Raspberry Pi and, because the device has limited compute, offloaded heavy processing to our own custom deployed endpoints (exposed via ngrok) for rigorous facial detection and environment awareness. For speech understanding, we use the Logitech camera's built-in microphone and an algorithm that detects when the user is talking with a significant person. We then use OpenAI's Whisper for transcription and Grok-3 for summarization, and wired earbuds deliver audio messages back to the user. Finally, all timelines and events are stored and displayed in a Next.js app (deployed on Vercel), letting users browse detailed, informative timelines of a day's events and recall important details from their surroundings and interactions.
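The Pi-side capture-and-offload flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the endpoint URL is a placeholder for the ngrok tunnel, and the helper names and payload fields are assumptions.

```python
# Sketch of the Pi client offloading a captured frame to the remote
# processing server (endpoint URL and field names are hypothetical).
import base64
import json
import time
import urllib.request

ENDPOINT = "https://example.ngrok.app/frames"  # placeholder for the real ngrok tunnel

def build_frame_payload(jpeg_bytes: bytes, device_id: str, ts: float) -> dict:
    """Wrap one captured JPEG frame as a JSON-safe payload."""
    return {
        "device": device_id,
        "timestamp": ts,
        "frame_b64": base64.b64encode(jpeg_bytes).decode("ascii"),
    }

def upload_frame(jpeg_bytes: bytes, device_id: str = "pi-3b") -> None:
    """POST a frame to the heavy-processing server; nothing stays on the Pi."""
    body = json.dumps(build_frame_payload(jpeg_bytes, device_id, time.time())).encode()
    req = urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=10)  # server handles detection/summarization
```

Keeping the Pi as a thin client like this is what lets an 8-32 GB SD card and a Pi 3's CPU stay out of the critical path: the board only encodes and ships frames, while facial detection runs on the deployed endpoints.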

Challenges we ran into

Since this was the first time any of us had worked with hardware, setting up the Raspberry Pi took a long time. We had to replace the SD card (from 8 GB to 32 GB) because we ran out of space installing heavier machine-learning Python libraries, then re-flash the OS and enable SSH so the project could run wirelessly. Unreliable Wi-Fi also made the SSH connection unstable, so we relied on a hotspot for all programming on the Raspberry Pi. Additionally, we didn't want to save every image and audio file due to privacy concerns; we mitigated this by generating conversation summaries that require no long-term storage of raw data. Even through all of this, we worked together and built a meaningful project that delivers real value.

Accomplishments that we're proud of

*Recall has the potential to change the lives of millions.*

Fully autonomous operation: The system can run 24/7 without user intervention, intelligently deciding when to record based on visual and audio cues. With no buttons to press and no apps to open, Recall works silently in the background, so users can live their lives without thinking about their memory loss.

Prioritized facial recognition: Our system doesn't just detect faces - it recognizes familiar ones. Recall distinguishes between strangers and loved ones, only initiating conversation recording when someone meaningful enters the frame. We capture interactions with family members and close friends (the people you choose as important to you) while respecting the privacy of passersby and delivery workers. By building individual profiles for each recognized person, we create rich relational context: users can see not just what happened, but who they spent time with, helping them maintain connections even when faces become hard to remember.
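The stranger-vs-loved-one decision above boils down to comparing a fresh face embedding against each stored profile. A minimal sketch, assuming per-person stored embeddings and an illustrative similarity cutoff (the real system uses 128-dimensional embeddings and a tuned threshold; short vectors are shown only for readability):

```python
import math

SIM_THRESHOLD = 0.6  # assumed cutoff; the real value would be tuned empirically

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def match_face(embedding, profiles):
    """Return the best-matching profile name, or None for a stranger.

    `profiles` maps a person's name to their stored embedding; recording
    only starts when this returns a name, never for an unmatched face.
    """
    best_name, best_sim = None, SIM_THRESHOLD
    for name, stored in profiles.items():
        sim = cosine_similarity(embedding, stored)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name
```

Returning `None` below the threshold is what makes the privacy behavior fall out naturally: an unrecognized face simply never triggers a recording.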

Smart audio capture: Our VAD (voice activity detection) implementation with silence detection ensures we capture complete conversations without wasting storage on insignificant audio.
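One simple way to realize the silence-detection half of this is an energy-based detector that ends a recording after several consecutive quiet chunks. The threshold and chunk count below are illustrative assumptions, not the project's actual parameters:

```python
def rms(samples):
    """Root-mean-square energy of one chunk of audio samples."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

class SilenceDetector:
    """End a recording after `max_silent` consecutive quiet chunks.

    `threshold` (energy) and `max_silent` (chunk count) are assumed
    values; in practice they depend on the mic and chunk duration.
    """

    def __init__(self, threshold=500.0, max_silent=3):
        self.threshold = threshold
        self.max_silent = max_silent
        self.silent = 0

    def feed(self, chunk) -> bool:
        """Consume one chunk; return True when the conversation is over."""
        if rms(chunk) < self.threshold:
            self.silent += 1  # another quiet chunk in a row
        else:
            self.silent = 0   # speech resets the countdown
        return self.silent >= self.max_silent
```

Requiring several quiet chunks in a row, rather than stopping at the first pause, is what keeps natural mid-conversation pauses from splitting one conversation into fragments.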

Scalable architecture: Direct MongoDB integration and efficient frame processing make Recall viable for massive numbers of users.
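The MongoDB side centers on two kinds of documents: per-person face profiles (with their 128-dimensional embeddings) and timeline events. The field names below are assumptions used to illustrate the shape, not the project's actual schema:

```python
# Illustrative document shapes and validators for the two main collections
# (field names are hypothetical; only the 128-dim embedding size comes
# from the write-up itself).
REQUIRED_PROFILE_FIELDS = {"name", "embedding"}
REQUIRED_EVENT_FIELDS = {"timestamp", "summary"}

def valid_profile(doc: dict) -> bool:
    """A face profile pairs a person's name with a 128-dim embedding."""
    return REQUIRED_PROFILE_FIELDS <= doc.keys() and len(doc["embedding"]) == 128

def valid_event(doc: dict) -> bool:
    """A timeline event stores a timestamped summary; a linked person
    field would be optional, since events can involve strangers."""
    return REQUIRED_EVENT_FIELDS <= doc.keys()
```

Storing only embeddings and text summaries (never raw frames or audio) is also what keeps per-user storage small enough for this to scale.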

Privacy-conscious design: All video processing happens in temporary batches - none of your livestream data is saved; it is only transcribed and processed. This ensures that your privacy, and that of those around you, is respected at all times.

What we learned

Throughout this project, we built a complete multi-modal AI assistive memory system: Raspberry Pi 3 hardware with a Logitech webcam for real-time facial recognition, Flask REST APIs for backend services, MongoDB for storing 128-dimensional face embeddings and timeline events, and AI services including OpenAI Whisper for transcription, Grok for video/audio summarization, and ElevenLabs for text-to-speech with the Matilda voice. We learned end-to-end system architecture, from edge computing (offloading embedding extraction to the Pi), to API orchestration (coordinating five different services), to implementing cosine similarity search for face matching.

Key technical skills gained include handling multipart file uploads in Flask, designing MongoDB schemas for ML applications, prompt engineering for concise AI outputs, managing API keys securely with .env files, debugging platform-specific issues (Python 3.14 + ARM Mac compatibility), and building modular codebases that separate concerns across hardware integration, database operations, and AI service calls.
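As one concrete piece of the upload path mentioned above: a client can build a multipart/form-data body by hand with only the standard library before posting an audio clip to a Flask route. The route and field names here are assumptions for illustration, not the project's actual API:

```python
# Hand-rolled multipart/form-data encoder (stdlib only), as a Pi client
# might use when posting an audio clip to a Flask upload route
# (field and route names are hypothetical).
import uuid

def encode_multipart(field: str, filename: str, data: bytes,
                     content_type: str = "application/octet-stream"):
    """Return (headers, body) for a single-file multipart/form-data request."""
    boundary = uuid.uuid4().hex  # random boundary, vanishingly unlikely to collide
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return headers, body
```

On the Flask side, a body shaped like this is what surfaces as `request.files[field]`, which is where our backend would hand the clip to Whisper for transcription.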

What's next for Recall

Moving forward, we plan to make Recall more seamless, intelligent, and scalable. Our next steps include integrating lightweight on-device models optimized for the Raspberry Pi to reduce dependence on remote servers and improve offline reliability. Another focus is expanding multimodal understanding: enabling Recall to interpret not just speech but also emotional tone and contextual cues, helping it respond more compassionately and naturally. Finally, with access to better hardware, we hope to shrink Recall's form factor to make it more convenient for day-to-day life.
