Inspiration
We wanted to build something that felt genuinely wearable and fun, not just another web app demo. The idea came from a real social problem we all recognize: starting a conversation can be awkward even when you already know a bit about someone. We asked ourselves what it would look like if a device could quietly help you with a personalized icebreaker in real time. That question became Cupid Glasses.
What it does
Cupid Glasses generates personalized conversation starters based on who you are talking to and what the situation looks like. The glasses provide a live camera frame, the system matches that face to a saved roster profile, and then combines that person's saved interests with inferred facial emotion. It generates up to five context-aware icebreakers and displays them on the glasses. A touch sensor lets you instantly skip to the next suggestion.
How we built it
We built Cupid Glasses as a pipeline across mobile, backend, and hardware.
On mobile, I built a Flutter app that lets users create and edit a roster of profiles. Each profile stores a name, an image, and interests. The app writes profile data to Firebase Firestore and uploads images to Firebase Storage so the backend can retrieve structured context for generation.
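The roster documents can be sketched as plain dictionaries before they are written to Firestore. The field names below are an illustrative guess at a reasonable schema, not the exact one we shipped:

```python
def make_profile(name, image_url, interests):
    """Build a roster profile document for Firestore.

    Field names are illustrative; `image_url` points at the
    image uploaded to Firebase Storage so the backend can
    fetch it for face matching.
    """
    return {
        "name": name,
        "image_url": image_url,
        "interests": list(interests),
    }

profile = make_profile(
    "Alex",
    "gs://cupid-glasses.appspot.com/roster/alex.jpg",  # hypothetical path
    ["climbing", "indie films"],
)
```

Keeping interests as a flat list makes it trivial for the backend to join them into a prompt later.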
On the backend, we built a Flask API that downloads roster images and metadata from Firebase. The ESP32 polls the backend, and when a request comes in, the backend captures a frame from the ESP32 camera endpoint. It uses DeepFace to match the captured face to the roster, retrieves the matched person’s interests, runs emotion inference, and then prompts Google Gemini using the image plus the context to generate five suggestions. The backend returns the five suggestions in the response to the ESP32.
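Since the model returns free-form text, the backend has to split it into at most five clean suggestions before responding to the ESP32. A minimal sketch of that parsing step (the line-per-suggestion numbering format is an assumption about the model's output, not a spec):

```python
def parse_suggestions(raw_text, limit=5):
    """Split raw model text into at most `limit` cleaned suggestions.

    Assumes one suggestion per line, possibly prefixed with a
    "1." or "2)" style marker -- an assumption, not guaranteed.
    """
    suggestions = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop a leading "1." / "2)" style numbering marker if present.
        line = line.lstrip("0123456789.) ").strip()
        if line:
            suggestions.append(line)
        if len(suggestions) == limit:
            break
    return suggestions
```

Capping at five here keeps the ESP32 side simple: it always receives the same number of options to cycle through.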
We used Presage to run live emotion inference as an overlay on the feed. By mixing the matched person's saved interests with their current emotion in the Gemini API call, we are able to generate the optimal rizz line.
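The interest-plus-emotion mixing boils down to prompt construction. A hedged sketch of what that step might look like (the wording is illustrative; our actual prompt differed):

```python
def build_icebreaker_prompt(name, interests, emotion, count=5):
    """Compose a Gemini prompt from roster context and inferred emotion.

    `name` and `interests` come from the matched Firestore profile;
    `emotion` comes from live inference. The phrasing is a sketch.
    """
    return (
        f"You are helping me start a conversation with {name}, "
        f"who currently looks {emotion}. "
        f"Their interests: {', '.join(interests)}. "
        f"Write {count} short, friendly icebreakers, one per line."
    )
```

The emotion signal is what makes the same roster profile produce different lines in different moments.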
On hardware, we used an ESP32 camera to expose a camera endpoint over WiFi on the same network. The generated output is shown on a 16×2 LCD. Because the display is small, we implemented a text chunking and scrolling routine that breaks each suggestion into segments that fit the LCD and refreshes periodically so the full sentence can be read. The touch sensor cycles through the five options.
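The chunking and touch-cycling logic can be sketched in a few lines (shown in Python for readability; the real firmware is C/C++ on the ESP32):

```python
import textwrap

LCD_COLS, LCD_ROWS = 16, 2  # 32 visible characters per screen

def chunk_for_lcd(text, cols=LCD_COLS, rows=LCD_ROWS):
    """Word-wrap a suggestion into screens of `rows` lines of `cols` chars.

    Each screen is displayed in turn on a periodic refresh so the
    full sentence can be read on the tiny display.
    """
    lines = textwrap.wrap(text, cols)
    return [lines[i:i + rows] for i in range(0, len(lines), rows)]

def next_suggestion(index, total=5):
    """Touch sensor handler: advance to the next suggestion, wrapping."""
    return (index + 1) % total
```

Word-wrapping rather than hard-splitting at 16 characters avoids breaking words across lines, which matters a lot at this display size.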
Challenges we ran into
Our biggest problems were hardware reliability and memory constraints.
We originally planned to use a different display, but the screen was faulty, so we pivoted mid-build to a 16×2 LCD. That forced us to redesign the output experience and implement scrolling and paging so longer text would still be readable.
We also had trouble getting ESP32 camera streaming stable. Earlier misuse of the components introduced constraints, so we had to change how we handled memory and frame capture. We experimented with different approaches to storing and transmitting frames so the backend could actually receive usable images for face matching. The practical constraint was that we needed frames stable enough for recognition while still lightweight enough to transmit reliably.
We also faced the classic hackathon integration problem. Each subsystem worked on its own, but making it work end to end required debugging network calls, database structure, and timing across capture, inference, generation, and display.
Accomplishments that we're proud of
We are proud that we delivered a full end-to-end wearable demo under tight time constraints. The system was not just a concept. It connected profile creation, cloud storage, live capture, face matching, emotion context, LLM generation, and on-device output.
We are also proud of the UX details that made the demo feel real, especially the LCD text formatting and the touch sensor control that let users cycle through multiple suggestions quickly.
What we learned
We learned that once hardware is involved, the hardest part is not the model call. It is reliability. Small failures like a broken display or unstable streaming can force major redesigns, so fallback plans and clean interfaces between components matter a lot.
We also learned to treat constraints as first class design inputs. The LCD capacity shaped how we formatted output. With a 16×2 LCD, the maximum visible characters at once is:
$$ \text{max\_chars\_per\_screen} = 2 \times 16 = 32 $$
That small constraint ended up shaping a lot of the user experience.
What's next for Cupid Glasses
Next, we want to improve robustness and make it feel like a real product.
We want to make face matching faster and more reliable across lighting conditions, and we want to reduce latency from capture to display so the suggestions feel instant.
We also want to improve the on device UI beyond a basic LCD by using a higher resolution display once hardware is reliable, and we want better control over when suggestions are triggered so it is more intentional and less noisy.
Finally, we want to improve privacy by minimizing stored data, limiting retention of captured frames, and making the system work with the smallest amount of personal information needed to generate helpful suggestions.
Best Use of Gemini, Best Use of Presage
Built With
- 16×2 LCD
- C/C++ (ESP32-CAM firmware)
- Dart (Flutter)
- DeepFace
- ESP32-CAM
- Firebase (Firestore, Storage)
- Fusion
- Google Gemini API
- OpenCV
- Presage
- Python (Flask)
- Touch sensor