Inspiration

We've all been in that moment: you meet someone at a networking event, they introduce themselves, and seconds later their name is gone. Or you promise a colleague you'll follow up on something, and it slips through the cracks the moment you walk out of the room. We built what the Meta glasses don't have: a memory layer, a second brain that remembers the small but important things your actual brain doesn't always hold onto. It's not an app you have to open, and it's not a voice assistant you have to ask. It just watches, listens, and remembers passively in the background.

What it does

Jarvis is an AI life assistant that runs on Meta Ray-Ban glasses. It sees who you're talking to, listens to your conversations, and builds a living knowledge graph of your world, automatically.

  • Facial Recognition: the glasses stream video in real time; Jarvis recognizes familiar faces and automatically enrolls new people you've met and conversed with, extracting their names from the conversation

  • Contextual Reminders: When you talk to someone and mention a task, Jarvis extracts it and surfaces it the next time you see that person.

  • Knowledge Graph: every conversation you have is processed by Claude, which extracts people, tasks, relationships, and sentiment, writing them into a Neo4j graph. You can visualize your entire social and professional world in a live force-directed graph on the dashboard. There's no need to remember names, to-dos, or conversation details yourself.

  • Zero Friction: Recording starts automatically when a face appears, stops when they leave. No buttons, no wake words.

  • Jarvis: say "Hey Jarvis" anytime to ask for reminders or tasks, recall past conversations, query your knowledge graph ("Who did I meet at the conference?"), or get a summary of what you talked about with someone
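To make the extraction step concrete, here is a minimal sketch of how Claude's structured output might be turned into parameterized Cypher `MERGE` statements for Neo4j. The function name `to_cypher` and the field names (`people`, `tasks`, `owner`, etc.) are illustrative assumptions, not our exact schema:

```python
def to_cypher(extraction: dict) -> list[tuple[str, dict]]:
    """Convert a structured extraction (people + tasks) into parameterized
    Cypher MERGE statements. Field names are illustrative assumptions."""
    stmts: list[tuple[str, dict]] = []
    for person in extraction.get("people", []):
        stmts.append((
            "MERGE (p:Person {name: $name}) SET p.sentiment = $sentiment",
            {"name": person["name"],
             "sentiment": person.get("sentiment", "neutral")},
        ))
    for task in extraction.get("tasks", []):
        stmts.append((
            "MERGE (p:Person {name: $owner}) "
            "MERGE (t:Task {description: $desc}) "
            "MERGE (p)-[:MENTIONED]->(t)",
            {"owner": task["owner"], "desc": task["description"]},
        ))
    return stmts
```

Using `MERGE` rather than `CREATE` keeps the graph idempotent: seeing the same person in a second conversation updates their node instead of duplicating it.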

How we built it

iOS (Swift/SwiftUI) - wraps the Meta Wearables DAT SDK to stream video frames from the glasses and capture audio via the iPhone mic. A WebSocket client sends frames and PCM audio chunks to the backend in real time.

Backend (FastAPI/Python) - the core AI pipeline:

  • InsightFace for facial recognition and embedding comparison
  • OpenAI Whisper for audio transcription, with an RMS-based gate to filter background noise before the API call
  • Claude for reminder extraction and structured knowledge-graph entity/relationship extraction
  • Neo4j for the persistent knowledge graph
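The RMS gate is simple but saved us a lot of wasted Whisper calls. A minimal sketch of the idea, assuming little-endian 16-bit PCM input (the threshold value here is illustrative, not our tuned one):

```python
import math
import struct

RMS_THRESHOLD = 500  # illustrative value; int16 amplitude units, tuned on-site

def rms(pcm_bytes: bytes) -> float:
    """Root-mean-square amplitude of little-endian 16-bit PCM audio."""
    n = len(pcm_bytes) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", pcm_bytes[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)

def is_speech(pcm_bytes: bytes, threshold: float = RMS_THRESHOLD) -> bool:
    """Gate: only chunks above the energy threshold go to the Whisper API."""
    return rms(pcm_bytes) >= threshold
```

Chunks that fail the gate are dropped before transcription, so silence and low-level background hum never reach the API.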

Frontend (Next.js) - a real-time dashboard that visualizes the knowledge graph, with live updates pushed over WebSockets whenever a new conversation is processed.
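The live-update fan-out can be sketched as a small broadcaster on the backend: each connected dashboard gets its own queue, and every processed conversation publishes a graph delta to all of them. This is a simplified stand-in (class and field names are assumptions); the real server wires these queues to FastAPI WebSocket connections:

```python
import asyncio
import json

class GraphBroadcaster:
    """Fan out knowledge-graph updates to every connected dashboard client.
    Illustrative sketch; in the real backend each queue feeds a WebSocket."""

    def __init__(self) -> None:
        self._clients: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        """Register a new client and return its personal message queue."""
        q: asyncio.Queue = asyncio.Queue()
        self._clients.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._clients.discard(q)

    async def publish(self, update: dict) -> None:
        """Serialize one graph delta and deliver it to every subscriber."""
        msg = json.dumps(update)
        for q in self._clients:
            await q.put(msg)
```

Because each client has its own queue, one slow dashboard can't stall updates for the others.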

Challenges we ran into

Meta Glasses SDK: Very new, very buggy, very limited

Whisper hallucinations: in noisy hackathon environments, Whisper would hallucinate phrases from background noise and near-silence. We solved this with an RMS-based speech gate that drops low-energy audio before it ever reaches the API.

Accomplishments that we're proud of

  • End-to-end auto-enrollment: a stranger walks up, has a conversation, says their name, and is automatically added to the face database with no manual steps

  • A live knowledge graph that updates in real-time mid-conversation

  • The entire pipeline (glasses -> phone -> backend -> graph -> frontend) running end-to-end at under 300 ms of latency per frame

What we learned

  • The Meta Wearables DAT SDK is powerful but rough around the edges; it has a lot of potential
  • Designing for real-world noise is completely different from lab conditions; almost every threshold we set in testing had to be re-tuned at the venue
  • Streaming AI pipelines require careful backpressure management: audio buffers, frame throttling, and async processing all need to be coordinated, or you get cascading latency
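The frame-throttling lesson boils down to one pattern: when the consumer (face recognition) falls behind, drop stale frames instead of queueing them. A minimal sketch of that idea (the class name is ours for illustration):

```python
import asyncio

class LatestFrameQueue:
    """Bounded queue holding only the newest frame. If face recognition
    falls behind the camera, stale frames are discarded rather than
    accumulating and cascading latency downstream. Illustrative sketch."""

    def __init__(self) -> None:
        self._q: asyncio.Queue = asyncio.Queue(maxsize=1)

    def put(self, frame: bytes) -> None:
        """Enqueue a frame, evicting the stale one if the slot is taken."""
        if self._q.full():
            self._q.get_nowait()  # drop the frame nobody processed in time
        self._q.put_nowait(frame)

    async def get(self) -> bytes:
        """Await the most recent frame."""
        return await self._q.get()
```

With a one-slot queue, end-to-end latency stays bounded by the processing time of a single frame instead of growing with the backlog.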

What's next for SAM

  • Calendar and email integration: draft emails to people based on previous conversations, add extracted tasks to your calendar, and surface upcoming meetings with recognized people
  • LinkedIn integration: connect with people on LinkedIn based on your conversations
