Inspiration

We've all been there — you see someone across the room at a conference, you know you've met them, but their name is gone. You fumble through small talk hoping context clues surface. Or worse: you pitch someone you already pitched, forget a detail they shared that mattered to them, or miss a follow-up that could have changed your career. Human memory isn't built for high-volume, high-stakes networking — but relationships are built on exactly that: remembering. FaceLink started from a simple belief — forgetting shouldn't cost you a connection.

What it does

FaceLink is a voice and vision AI companion for professional networking. Point your phone camera at someone, and FaceLink identifies them, surfaces everything you know about them — past conversations, shared topics, follow-ups you promised — and whispers it in your ear through a natural voice. When you meet someone new, FaceLink remembers them for you. When you reconnect, it briefs you before you even say hello. Every person becomes a profile. Every conversation becomes a memory. Every interaction makes FaceLink smarter about who matters to you and why.

How we built it

FaceLink runs on a FastAPI backend connected to a React + Vite TypeScript PWA via WebSockets — a persistent, bidirectional connection that streams camera frames every 2 seconds and audio continuously from the phone. On the backend, each frame hits AWS Rekognition for face matching against an indexed collection, then CLIP (ViT-L/14) encodes the cropped face into an embedding for our self-learning layer. A Gemini 2.5 Flash agent routes intent — IDENTIFY, REMEMBER, RECALL, OBSERVE, CHITCHAT — dispatches the right tools, builds context from mem0 + Pinecone memory, and generates a response. ElevenLabs converts that response to voice and streams it back. Every trace — face recognition, memory retrieval, reasoning, TTS — is instrumented through Datadog APM, so the entire cognitive pipeline is observable in real time.
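The intent-routing step can be sketched as a simple dispatch table. This is a minimal illustration of the pattern, not our actual implementation — the handler names and payload shape here are hypothetical, and in the real agent each handler calls out to Rekognition, mem0/Pinecone, and so on:

```python
from enum import Enum
from typing import Callable, Dict

class Intent(Enum):
    IDENTIFY = "identify"
    REMEMBER = "remember"
    RECALL = "recall"
    OBSERVE = "observe"
    CHITCHAT = "chitchat"

def route(intent: Intent, handlers: Dict[Intent, Callable[[dict], str]], payload: dict) -> str:
    """Dispatch a classified intent to its tool handler, falling back to chitchat."""
    handler = handlers.get(intent, handlers[Intent.CHITCHAT])
    return handler(payload)

# Illustrative handlers only — the production versions hit external services.
handlers = {
    Intent.IDENTIFY: lambda p: f"identify:{p.get('face_id')}",
    Intent.RECALL: lambda p: f"recall:{p.get('query')}",
    Intent.CHITCHAT: lambda p: "chitchat",
}
```

The fallback to CHITCHAT means an unrecognized or low-confidence intent still produces a conversational response instead of an error.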

Challenges we ran into

WebSocket audio streaming from a phone browser is deceptively hard — browser mic APIs require HTTPS, MediaRecorder output formats vary between iOS Safari and Chrome on Android, and keeping audio in sync with video frames without a dedicated media server required careful buffer design. Keeping the face-pipeline latency (Rekognition + CLIP + mem0 retrieval) under 500 ms per frame while streaming continuous audio was our biggest performance challenge. We also had to design the self-learning loops carefully so they improve on real signal — not just emit metrics — which meant building evaluation logic into the agent itself rather than bolting it on.
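The audio/video sync approach can be sketched as a timestamp-ordered buffer on the server: each incoming frame claims the audio chunks recorded since the previous frame, keyed on client-side timestamps. This is a simplified sketch of the idea, assuming audio chunks arrive in timestamp order — not the production buffering code:

```python
from collections import deque

class SyncBuffer:
    """Pair each video frame with the audio recorded up to that frame's
    timestamp. Assumes audio chunks arrive in timestamp order (ms)."""

    def __init__(self):
        self.audio = deque()  # (timestamp_ms, chunk) pairs, oldest first

    def push_audio(self, ts_ms: int, chunk: bytes) -> None:
        """Buffer an audio chunk tagged with its client-side timestamp."""
        self.audio.append((ts_ms, chunk))

    def pop_for_frame(self, frame_ts_ms: int) -> list:
        """Drain and return all audio chunks recorded at or before the
        frame's timestamp; later chunks stay buffered for the next frame."""
        out = []
        while self.audio and self.audio[0][0] <= frame_ts_ms:
            out.append(self.audio.popleft()[1])
        return out
```

Draining only up to the frame's timestamp keeps late-arriving audio attached to the next frame instead of being dropped, which matters when MediaRecorder chunk cadence differs across browsers.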

Accomplishments that we're proud of

During the demo, three live self-learning metrics — face confidence, memory retrieval quality, and intent routing accuracy — were all trending upward in Datadog, each one genuinely improving on real interactions, not synthetic data. The moment that lands is when FaceLink says "That's Alex from Datadog — last time you talked about APM best practices and you said you'd introduce her to your team" about a face it hasn't seen in 20 minutes. We also shipped a full production-grade observability pipeline — a Datadog service map with people as nodes, traces as conversations, and spans as utterances — that makes the AI's cognition genuinely debuggable.

What we learned

Memory architecture is the hardest part of agentic AI — not the model, not the voice, not the vision. Getting retrieval to feel natural (surfacing the right memory at the right moment without being creepy or overwhelming) required more iteration than all the other components combined. We also learned that Datadog APM is genuinely the right mental model for multi-service AI agents: if you think of each person as a service and each conversation as a distributed trace, the observability patterns map perfectly. That reframe unlocked a lot of the architecture.
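The person-as-service reframe can be made concrete with plain data structures. This sketch uses illustrative field names modeled loosely on trace/span concepts, not the actual Datadog span schema or our instrumentation code:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Span:
    name: str      # one utterance or pipeline step within the conversation
    service: str   # the person this interaction is "about" — the reframe
    resource: str  # e.g. "face.match", "memory.retrieve", "tts.stream"

@dataclass
class Trace:
    trace_id: str  # one conversation maps to one distributed trace
    spans: List[Span] = field(default_factory=list)

def record(trace: Trace, person: str, step: str, utterance: str) -> Span:
    """Attach an utterance to a conversation-trace, attributed to a person
    exactly the way a span is attributed to a service."""
    span = Span(name=utterance, service=person, resource=step)
    trace.spans.append(span)
    return span
```

Once interactions are shaped this way, service-map and flame-graph views fall out for free: grouping spans by `service` shows who you talk to most, and walking a trace replays a conversation step by step.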

What's next for FaceLink

  • Relationship graph: Map connections between people — "Alex knows Jamie, who works on the team you're targeting" — turning contacts into a navigable network
  • Pre-meeting briefs: The night before a scheduled meeting, FaceLink sends a voice summary of everything relevant about who you're seeing
  • Ambient mode: Run continuously in the background during events, building context passively without requiring any explicit interaction
  • Team memory: Share a context layer across a team — so when your colleague meets someone your CEO already knows, that relationship history is available
  • Privacy-first architecture: On-device face embeddings with zero cloud photo storage, giving users full control over their own relationship data