Inspiration
We've all been in that moment: you meet someone at a networking event, they introduce themselves, and a minute later their name is gone. Or you promise a colleague you'll follow up on something, and it slips through the cracks the moment you walk out of the room. We built what the Meta glasses don't have: a memory layer, a second brain that remembers the small but important things your actual brain doesn't. It's not an app you have to open or a voice assistant you have to ask. It just watches, listens, and remembers passively in the background.
What it does
Jarvis is an AI life assistant that runs on Meta Ray-Ban glasses. It sees who you're talking to, listens to your conversations, and builds a living knowledge graph of your world, automatically.
Facial Recognition: the glasses stream video in real time, recognizing familiar faces and automatically enrolling new people you meet by extracting their names from the conversation.
Contextual Reminders: When you talk to someone and mention a task, Jarvis extracts it and surfaces it the next time you see that person.
Knowledge Graph: every conversation is processed by Claude, which extracts people, tasks, relationships, and sentiment and writes them into a Neo4j graph. You can visualize your entire social and professional world as a live force-directed graph on the dashboard. There's no need to remember conversations, names, or to-do items yourself.
Zero Friction: Recording starts automatically when a face appears, stops when they leave. No buttons, no wake words.
Jarvis: Say "Hey Jarvis" anytime to ask for reminders or tasks, recall past conversations, query your knowledge graph ("who did I meet at the conference?"), or get a summary of what you talked about with someone.
How we built it
iOS (Swift/SwiftUI) - wraps the Meta Wearables DAT SDK to stream video frames from the glasses and capture audio via the iPhone mic. A WebSocket client sends frames and PCM audio chunks to the backend in real time.
Backend (FastAPI/Python) - the core AI pipeline:
- InsightFace for facial recognition and embedding comparison
- OpenAI Whisper for audio transcription, with an RMS-based gate to filter background noise before the API call
- Claude for reminder extraction and structured knowledge-graph entity/relationship extraction
- Neo4j for the persistent knowledge graph
Frontend (Next.js) - a real-time dashboard that visualizes the knowledge graph, with live updates pushed over WebSockets whenever a new conversation is processed.
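To make the recognition step concrete, here is a minimal sketch of how a live face embedding can be matched against enrolled people. The 512-dimensional L2-normalized embeddings are what InsightFace produces; the 0.45 threshold and the `best_match` helper are illustrative, not our exact implementation.

```python
from __future__ import annotations
import numpy as np

def best_match(query: np.ndarray, enrolled: dict[str, np.ndarray],
               threshold: float = 0.45) -> str | None:
    """Return the enrolled name whose embedding is most similar, or None.

    Embeddings are assumed L2-normalized (e.g. InsightFace's
    normed_embedding), so the dot product is the cosine similarity.
    """
    best_name, best_score = None, threshold
    for name, emb in enrolled.items():
        score = float(np.dot(query, emb))
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Because the embeddings are normalized, a single dot product per enrolled person is enough, which keeps per-frame matching cheap even as the face database grows.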
Challenges we ran into
Meta Glasses SDK: Very new, very buggy, very limited
Whisper hallucinations: in noisy hackathon environments, Whisper hallucinates phrases out of background noise. We solved this with an RMS speech gate.
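The idea behind the RMS gate is simple: measure the energy of each PCM chunk and only send it to the Whisper API if it is loud enough to plausibly contain speech. This is a minimal sketch; the -40 dBFS threshold is illustrative (ours had to be re-tuned at the venue).

```python
import numpy as np

def passes_gate(pcm: np.ndarray, threshold_dbfs: float = -40.0) -> bool:
    """Return True if an int16 PCM chunk is loud enough to transcribe.

    threshold_dbfs is illustrative and environment-dependent.
    """
    # Scale int16 samples to [-1.0, 1.0] before computing RMS energy.
    samples = pcm.astype(np.float64) / 32768.0
    rms = np.sqrt(np.mean(samples ** 2))
    if rms == 0:
        return False  # pure silence
    dbfs = 20 * np.log10(rms)
    return bool(dbfs >= threshold_dbfs)
```

Gating before the API call also cuts transcription cost, since silent chunks never leave the backend.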
Accomplishments that we're proud of
End-to-end auto-enrollment: a stranger walks up, has a conversation, says their name, and is automatically added to the face database with no manual steps
A live knowledge graph that updates in real-time mid-conversation
The entire pipeline (glasses -> phone -> backend -> graph -> frontend) running at under 300 ms of latency per frame
What we learned
- The Meta Wearables DAT SDK is powerful but rough around the edges; it has a lot of potential
- Designing for real-world noise is completely different from lab conditions; almost every threshold we set in testing had to be re-tuned at the venue
- Streaming AI pipelines require careful backpressure management; audio buffers, frame throttling, and async processing all need to be coordinated, or you get cascading latency
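One way to sketch the frame-throttling side of that coordination is a bounded, drop-oldest queue: when the recognition worker falls behind, stale frames are discarded instead of piling up and cascading latency downstream. This is a minimal sketch under those assumptions; `LatestFrames` and the queue size are illustrative, not our exact code.

```python
import asyncio

class LatestFrames:
    """Bounded frame buffer that sheds stale frames under load."""

    def __init__(self, maxsize: int = 4):
        self.q: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self.dropped = 0  # counter for monitoring backpressure

    def put(self, frame) -> None:
        # Drop the oldest frame rather than block the WebSocket reader:
        # for live video, a fresh frame is always worth more than a stale one.
        if self.q.full():
            self.q.get_nowait()
            self.dropped += 1
        self.q.put_nowait(frame)

    async def get(self):
        return await self.q.get()
```

Applying the same drop-or-batch discipline to the audio buffers is what keeps end-to-end latency bounded instead of growing without limit.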
What's next for SAM
- Calendar and email integration: draft emails to people based on previous conversations, add extracted tasks to your calendar, and surface upcoming meetings with recognized people
- LinkedIn integration: connect with people on LinkedIn based on your conversations.
Built With
- claude
- docker
- graphql
- insightface
- neo4j
- next.js
- python
- swift
- websockets
- whisper