Inspiration
We noticed that traditional study tools are one-size-fits-all: you read text, maybe flip a flashcard, and hope your attention stays high. In reality, people’s focus ebbs and flows, and different learning modes work better at different moments. We wanted to build a “study companion” that watches your attention in real time and seamlessly adapts—summarizing content, then pivoting to quizzes, flashcards, mind-maps, audio narration, or even a quick break whenever you need it.
What it does
Upload & Summarize: You drop in a PDF or document; our backend calls Google’s Gemini API to generate a clean, Markdown-formatted summary.
Real-Time Focus Tracking: With your permission, the webcam streams to a Flask + Socket.IO server. We run MediaPipe FaceMesh on each frame to compute an eye-aspect ratio (EAR), normalize it into a 0–100 focus score, and send it back to the UI.
Adaptive Prompts: As you read the summary, your focus score continually feeds into a decision engine. When attention dips or you’ve been locked in too long, we spring into action with:
Flashcards for quick recall drills
Quizzes to test comprehension
Mind-Maps for visual concept mapping
Mini-Games (drag-and-drop challenges)
Audio Explanation via browser TTS
Break Reminders to refresh your mind
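The "when and how" behind these pivots can be sketched as a simple threshold policy. This is an illustrative sketch, not our shipped engine: the function name, thresholds, and mode labels here are hypothetical stand-ins for the tuned values we actually use.

```python
def pick_intervention(focus_score: float, minutes_on_task: float) -> str:
    """Map a 0-100 focus score and time-on-task to a learning mode.

    Thresholds are illustrative; the real engine tunes them per user.
    """
    if focus_score < 30:
        return "break"        # attention has collapsed: suggest a rest
    if focus_score < 60:
        return "flashcards"   # mild dip: switch to quick recall drills
    if minutes_on_task > 25:
        return "quiz"         # locked in too long: test comprehension
    return "reading"          # focus is fine: keep summarizing


# Example: a dipping score pivots the UI from reading to flashcards
print(pick_intervention(focus_score=45, minutes_on_task=10))
```

In practice the same score stream drives all six modes; the UI just listens for the chosen mode name and opens the matching modal.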
How we built it
Front end: Next.js + React, Tailwind CSS, Socket.IO-client, Web Speech API
Back end: Flask + Flask-SocketIO, MediaPipe FaceMesh, Google Gemini for summarization & content generation, Google Cloud Text-to-Speech
Infra & tools: Python 3.10, Git, dotenv
Challenges we ran into
Privacy-Preserving Data Handling Ensuring reliable attention tracking without sacrificing user comfort demanded careful design. We process all video data locally (no cloud uploads), use clear consent prompts, and build the interface so users always maintain full control over their information.
Cross-Platform Dependency Management Getting MediaPipe and OpenCV to run reliably on both Intel and Apple-Silicon Macs meant pinning Python to 3.10 and swapping in community-built wheels. We spent hours wrestling with pip markers, Eventlet patches, and conflicting shared libraries.
Robust Focus Detection Turn your webcam on at home and you’ll face wildly different lighting, angles, and face sizes. Tuning the FaceMesh pipeline and EAR thresholds so they work on real users (not just ideal demo videos) required extensive testing and iteration.
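The eye-aspect-ratio check at the heart of this pipeline is a small computation. A minimal sketch, assuming six eye landmarks ordered corner, two top, corner, two bottom; the 0.15/0.30 normalization range is illustrative, not our tuned values:

```python
import math

def eye_aspect_ratio(eye: list[tuple[float, float]]) -> float:
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|) over six eye landmarks.

    High when the eye is open, near zero when it blinks or closes.
    """
    vert1 = math.dist(eye[1], eye[5])
    vert2 = math.dist(eye[2], eye[4])
    horiz = math.dist(eye[0], eye[3])
    return (vert1 + vert2) / (2.0 * horiz)

def focus_score(ear: float, closed: float = 0.15, open_: float = 0.30) -> int:
    """Clamp EAR into an illustrative 0-100 focus score."""
    frac = (ear - closed) / (open_ - closed)
    return round(100 * min(1.0, max(0.0, frac)))

# A wide-open synthetic eye scores high; a nearly closed one scores low
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
print(focus_score(eye_aspect_ratio(open_eye)))
```

The hard part in the wild isn’t the formula; it’s choosing the `closed`/`open_` bounds so that dim rooms and off-axis cameras don’t read as perpetual blinking.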
Real-Time Integration & Event Flow Combining live frame capture (ImageCapture.grabFrame()), Socket.IO streams, Flask+Eventlet monkey-patching, and broadcast semantics was surprisingly tricky. Early builds either dropped frames, hung the server, or only sent updates to one component.
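The shape of the fix can be sketched as a bootstrap-order rule: `eventlet.monkey_patch()` must run before anything else imports the standard networking modules, and score updates must be emitted with broadcast semantics. A minimal sketch assuming Flask-SocketIO in eventlet mode; the event names are illustrative:

```python
# Monkey-patch FIRST, before Flask or any socket-using module is imported,
# or eventlet's cooperative sockets never take effect and the server hangs.
import eventlet
eventlet.monkey_patch()

from flask import Flask
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app, async_mode="eventlet", cors_allowed_origins="*")

@socketio.on("frame")  # client sends a webcam frame (illustrative event name)
def handle_frame(data):
    score = 0  # ...run FaceMesh + EAR on the frame here...
    # broadcast=True pushes the score to every connected component,
    # not just the sender -- the "only one component updates" bug.
    emit("focus_score", {"score": score}, broadcast=True)

if __name__ == "__main__":
    socketio.run(app, host="0.0.0.0", port=5000)
```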
Adaptive Content Logic Crafting the right “when and how” to switch from reading to flashcards, quizzes, mind-maps, or breaks took dozens of iterations. Prompt too early and it felt nagging; too late and users glazed over.
Accomplishments that we're proud of
Reliable Focus Detection Built a focus-detection pipeline using MediaPipe FaceMesh and eye-aspect-ratio analysis that works across varied lighting and camera positions.
Low-Latency Real-Time Integration Engineered a WebSocket system with Flask-SocketIO so focus scores drive immediate UI updates and learning prompts.
Modular Interactive Modules Added multiple learning modes—flashcards, quizzes, mind-maps—through a clean, plug-and-play backend architecture.
On-Demand Audio Summaries Integrated browser-native text-to-speech for audio explanations without any extra server dependencies.
Intuitive User Interface Crafted a cohesive Next.js/Tailwind card-based UI featuring live attention charts and context-aware modals.
Seamless Cross-Platform Setup Packaged the entire app for effortless launch on macOS, Linux, and Windows via a Python 3.10 venv (and optional Docker).
What we learned
Dependency management matters: Complex native libraries like MediaPipe demand exact interpreter versions and wheel variants on different OSes.
Real-time event-driven design: Socket.IO with Eventlet works beautifully once the monkey-patch order and broadcast semantics are correct.
Prompt finesse is crucial: AI outputs must be guided to cleanly produce Markdown, JSON, or plain text without extraneous formatting.
Browser APIs can simplify: Leveraging the Web Speech API for TTS eliminated the need for a backend audio route and reduced latency.
Adaptive UX design: Users respond best when interventions are timely, lightweight, and match their momentary attention state.
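One concrete piece of the "prompt finesse" lesson: even when instructed to return only JSON, models often wrap replies in Markdown code fences, so output has to be unwrapped defensively before parsing. A minimal sketch; the helper name is ours, and the fence-wrapping behavior (not any specific model API) is the assumption:

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Strip an optional Markdown code fence from a model reply, then parse JSON."""
    text = raw.strip()
    # Remove a leading ```json (or bare ```) fence and its trailing ``` twin.
    match = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, flags=re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)

# Works whether or not the model wrapped its answer in a fence
print(parse_model_json('```json\n{"question": "What is EAR?"}\n```'))
```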
What's next for StudyBuddy.AI
Advanced Analytics Dashboard: Long-term focus trends, topic-level performance breakdowns, and spaced-repetition scheduling made visible.
Collaborative Study Rooms: Real-time co-study sessions where peers can share focus metrics and co-create quizzes or mind-maps.
Mobile & Offline Capabilities: A React Native or PWA version with offline summaries and cached AI prompts for on-the-go learning.
Enhanced Attention Signals: Add posture/emotion detection via additional computer-vision models for richer engagement cues.
Personalized Learning Paths: Machine-learning recommendations for next-best activities, based on each user’s historical focus and performance.