Inspiration

We have all been there. You open your laptop to study and somehow end up scrolling through your phone for an hour. Existing focus apps use honor systems or simple timers, but they have no idea if you are actually learning. We wanted to build something that creates real consequences for distraction. Not just a timer you can ignore, but a system that knows when you are faking it and holds you accountable to your friends.

Our design is informed by research on gamified learning mechanics (Xu, Read & Allen, 2023), which demonstrated that embodying learner progression through game elements, where performance directly powers in-game characters, significantly improved engagement and recall. Their PEG framework shows engagement scales with feedback speed and visible progression. Buddy applies this at every layer: focus state instantly reflects in pet behavior, quiz performance drives real-time scores, and Solana betting adds real stakes on top. We extend their findings from a single-player context to a multiplayer, multi-subject environment with AI-powered personalized assessment.

The idea clicked when we combined three concepts: social pressure (your distraction hurts your teammates), intelligent tracking (AI that catches you even when you are facing the screen but browsing Twitter), and real financial stakes (SOL on the line). Wrapping all of that in a game where your pet companion reacts to your behavior felt like something that could actually change how people study.

What it does

Buddy, Lock In! is a multiplayer co-working platform where up to 4 players join a synchronized 3D study room. Each player has a custom 3D pet companion that reacts to their behavior in real time.

A dual-layer AI system tracks your focus:

  1. MediaPipe Face Landmarker runs entirely in the browser and monitors whether you are physically looking at your screen. If your head rotates away from center or you leave the frame, you are marked as distracted.
  2. Gemini 2.5 Flash captures your screen every 45 seconds and classifies whether you are actually studying or browsing something off topic.

If MediaPipe says you are looking at the screen but Gemini sees Twitter, you are caught. The system fires a "fake focus" event, your pet reacts, and your teammates know.
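The cross-check between the two layers boils down to a small decision table; a minimal sketch in JavaScript (function and parameter names are hypothetical — in the real app these signals arrive via socket events rather than one function call):

```javascript
// Combine the two focus signals into a single state.
// faceFocused: MediaPipe's gaze boolean; screenOnTopic: Gemini's screen classification.
function classifyFocus(faceFocused, screenOnTopic) {
  if (faceFocused && screenOnTopic) return "focused";
  if (faceFocused && !screenOnTopic) return "fake_focus"; // eyes on screen, content off topic
  return "distracted"; // looking away, regardless of what the screen shows
}
```

The `fake_focus` branch is the interesting one: neither signal alone can catch it, which is the whole point of running both layers.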

Surprise quizzes pop up every few minutes, generated from your uploaded study materials and from concepts Gemini observed on your screen. Your pet reads each question aloud using ElevenLabs. All players answer simultaneously.

In "Locked In" mode, players stake SOL through Phantom wallet. The most focused player takes the pot. The final score is calculated as:

$$\text{score} = 0.50 \times \text{focus\_percent} + 0.20 \times \text{quiz\_accuracy} + 0.15 \times \text{response\_time\_score} + 0.15 \times \text{consistency\_score}$$
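In code, the weighting is a straight linear combination; a sketch assuming each component is already normalized to the 0–100 range:

```javascript
// Final-score weighting from the formula above (weights sum to 1.0).
// Assumes each component is normalized to 0-100.
function computeScore({ focusPercent, quizAccuracy, responseTimeScore, consistencyScore }) {
  return (
    0.50 * focusPercent +
    0.20 * quizAccuracy +
    0.15 * responseTimeScore +
    0.15 * consistencyScore
  );
}
```

A player who is perfect on every axis scores 100; the heavy weight on focus means raw attention matters more than quiz speed.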

At session end, Gemini generates a personalized study report with distraction patterns, concept breakdowns, and review recommendations. Everything persists across sessions. Your pet levels up, your stats build, and the leaderboard tracks your history.

How we built it

Frontend: React with Vite. The landing page uses a custom canvas renderer for pixel-art Pokemon sprites, twinkling stars, and animated backgrounds, all styled with the Press Start 2P font for a retro aesthetic. The 3D study room is built with React Three Fiber, rendering custom Blender-modeled .glb pet models with animation states driven by real-time socket events.

Real-time sync: Socket.io connects the Node.js/Express backend to up to 4 simultaneous clients. Focus states, quiz events, buddy selections, room settings, and session lifecycle all flow through socket events.

Focus detection: MediaPipe Face Landmarker runs client-side for head pose estimation. Only a boolean (focused: true/false) is sent over the network; no webcam video ever leaves the browser. Gemini Vision handles the second layer, analyzing screen captures server-side and discarding the raw images immediately after processing.
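Reducing head pose to that single boolean can be as simple as thresholding yaw and pitch; a hedged sketch (the thresholds here are illustrative, not the app's actual tuning):

```javascript
// Illustrative thresholds, not the app's real values.
const YAW_LIMIT_DEG = 25;
const PITCH_LIMIT_DEG = 20;

// pose is { yawDeg, pitchDeg } from the landmarker, or null if no face is in frame.
function isFocused(pose) {
  if (pose === null) return false; // no face detected -> distracted
  return (
    Math.abs(pose.yawDeg) <= YAW_LIMIT_DEG &&
    Math.abs(pose.pitchDeg) <= PITCH_LIMIT_DEG
  );
}
```

Sending only this boolean is what keeps the webcam feed private: the landmarks never leave the browser.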

Quiz generation: Gemini generates multiple choice quizzes from uploaded PDFs at session start, and comprehension quizzes from extracted screen concepts at session end.

Voice: ElevenLabs gives each pet species a distinct voice personality. Voice lines fire on state changes: disappointment when you drift, excitement during quizzes.

Blockchain: Solana handles the betting layer. Players send SOL to a server-managed escrow wallet via Phantom. At session end, the server calculates the winner and transfers the pot. All on devnet for the demo.
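Settlement itself is simple once the composite scores are in; an illustrative sketch (not the actual server code), with amounts in lamports and names hypothetical:

```javascript
// Everyone stakes the same amount; the highest composite score takes the pot.
function settlePot(stakeLamports, scoresByPlayer) {
  const players = Object.keys(scoresByPlayer);
  const pot = stakeLamports * players.length;
  const winner = players.reduce((best, p) =>
    scoresByPlayer[p] > scoresByPlayer[best] ? p : best
  );
  return { winner, pot }; // the server then transfers `pot` to `winner` on devnet
}
```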

Database: MongoDB Atlas stores user accounts, pet progression, session history, and leaderboard data.

3D assets: All pet models were imported and adjusted in Blender by our teammate, animated with multiple animation clips, then exported as .glb files.

Challenges we ran into

WebSocket synchronization was our most persistent challenge. Room options, player ready states, buddy selections, and session settings all had to stay perfectly in sync across multiple clients. One missed event or stale state and the whole flow broke. We spent significant time debugging race conditions where one player's view did not match another's, especially around the ready/start session flow and settings propagation.
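One pattern that tames races like these (a sketch of the general technique, not the project's actual code) is to version every settings broadcast and have clients drop anything older than what they already applied:

```javascript
// Each broadcast carries a monotonically increasing version number.
// A client applies an update only if it is strictly newer than its local state.
function applySettingsUpdate(current, incoming) {
  if (incoming.version <= current.version) return current; // stale or duplicate -> ignore
  return incoming;
}
```

With this guard, out-of-order delivery can no longer overwrite newer settings with older ones, which is exactly the stale-state symptom described above.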

The Gemini Vision screen analysis pipeline required careful orchestration. Capturing screenshots via getDisplayMedia(), encoding them, sending them to the server, processing them through Gemini, and routing the structured results back to the correct client, all within a reasonable timeframe, was tricky. We had to balance capture frequency (every 45 seconds) against API rate limits and response latency.
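The trade-off is easy to quantify; a back-of-envelope helper for API call volume as a function of the capture interval:

```javascript
// Screen-analysis requests per hour for a full room at a given capture interval.
function capturesPerHour(intervalSeconds, playerCount) {
  return Math.floor(3600 / intervalSeconds) * playerCount;
}
```

At the 45-second interval with 4 players, that is 80 × 4 = 320 Gemini calls per hour; halving the interval doubles the load, which is why we settled on 45 seconds.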

Getting Blender models into React Three Fiber took more iteration than expected. Scale mismatches meant our Pokemon were either microscopic or filling the entire room. Animation clip names did not always match what the code expected. Positioning models precisely on the table surface required a lot of manual tweaking, which we eventually solved by adding a live debug GUI with sliders using leva.

Accomplishments that we're proud of

Getting the full end-to-end flow working is what we are most proud of. A user can create an account, make a room, invite friends, upload study material, pick a pet, stake SOL, enter a 3D study room, get tracked by dual-layer AI, answer surprise quizzes read aloud by their pet, and receive a personalized study report. All in one seamless experience built in under 24 hours by a team of three.

Every piece of the stack talks to every other piece: the frontend renders 3D models animated by socket events driven by AI analysis processed on the backend and persisted in a database. The number of technologies we integrated (React Three Fiber, Socket.io, MediaPipe, Gemini, ElevenLabs, Solana, MongoDB, Blender) and the fact that they all work together without falling apart feels like a genuine achievement for a hackathon.

What we learned

This project pushed all three of us well outside our comfort zones.

None of us had used React Three Fiber before. Loading .glb models, managing animation states, and synchronizing 3D scenes with a real time backend was a crash course in creative coding.

Socket.io taught us how deceptively complex real time sync becomes once you have multiple event types, multiple clients, and state that needs to stay consistent across all of them.

Integrating Solana was our first exposure to web3 development. Wiring up Phantom wallet, handling devnet transactions, and building an escrow flow from scratch in under 24 hours was intense but deeply educational.

MediaPipe showed us that serious ML can run entirely in the browser without a GPU server. The Face Landmarker model loads via WASM and processes frames locally, which was eye-opening for what is possible on the client side.

The Gemini API impressed us with how reliable structured JSON output can be when the system prompt is well defined. Our quiz generation rarely produced malformed responses.
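Even so, we validate every item before showing it to players; a sketch of the kind of shape check involved (field names hypothetical):

```javascript
// Reject any quiz item that is not a well-formed four-choice MCQ.
function isValidQuizItem(item) {
  return (
    typeof item === "object" && item !== null &&
    typeof item.question === "string" &&
    Array.isArray(item.choices) &&
    item.choices.length === 4 &&
    item.choices.every((c) => typeof c === "string") &&
    Number.isInteger(item.answerIndex) &&
    item.answerIndex >= 0 &&
    item.answerIndex < item.choices.length
  );
}
```

A guard like this makes the rare malformed response a skipped question instead of a crashed quiz round.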

Our Blender teammate went from basic modeling to rigged, animated, exported .glb files in a single day.

Above all, we learned how to scope aggressively, build in parallel, and trust each other under intense time pressure.

What's next for Buddy, Lock In!

The foundation is built for something that could genuinely help students. Our roadmap includes:

Trustless on-chain escrow using an Anchor program so the betting layer does not rely on a server wallet. Full decentralization of the competitive study protocol.

SPL token reputation that encodes your study history on your wallet. Before entering a bet, you could verify an opponent's track record the same way you would check a poker player's public hand history. A database can be faked, but a wallet history is verifiable and portable.

Expanded screen analysis with real time concept mapping, building a knowledge graph of what you studied and surfacing gaps in your understanding.

More pet species and cosmetics unlockable through focus streaks and quiz performance, giving long-term progression real depth.

Study group features for classes and clubs, with integration into learning management systems so quizzes can pull directly from course material.

The long-term vision is a competitive study protocol where accountability is social, stakes are real, and your effort is permanently on the record.

Built With

React, Vite, React Three Fiber, Socket.io, Node.js, Express, MongoDB Atlas, MediaPipe, Gemini, ElevenLabs, Solana, Phantom, Blender