Learning Buddy
Inspiration
We noticed a persistent gap between digital study tools and the physical classroom environment. While many AI tutors exist, they often require manual, post-lecture data entry. We wanted to build a "living" study assistant: a hardware-software ecosystem that sits on a student's desk, listens to lectures in real time, and immediately transforms that audio into a searchable, interactive knowledge base.
What it does
Learning Buddy is a full-stack AI platform that merges a high-performance web dashboard with a physical ESP32-S3 recording device.
- Real-Time Capture: Stream live audio via WebSockets directly from the hardware to the backend.
- Contextual RAG: Whether through PDF/DOCX uploads or live recordings, the system chunks and embeds data for grounded AI responses powered by Google Gemini.
- Voice-First Interaction: Integrated ElevenLabs Conversational AI allows for low-latency, hands-free dialogue with study materials.
- Device Management: A seamless pairing system using 6-character keys to link physical hardware to web accounts.
- Gamified Productivity: Includes DeskPet, an interactive digital companion that reacts to learning activity and milestones.
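
As an illustration of the 6-character pairing flow listed above, here is a minimal sketch using only the Python standard library. The alphabet (uppercase letters and digits with look-alike characters removed), the `PairingRegistry` class, and the device ID are assumptions made for this example; the write-up does not describe how the real keys are generated or stored.

```python
import secrets
import string
from typing import Dict, Optional

# Assumption: uppercase letters and digits, minus characters that are easy to confuse.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)
KEY_LENGTH = 6  # the write-up specifies 6-character keys


def generate_pairing_key(length: int = KEY_LENGTH) -> str:
    """Return a random key drawn from ALPHABET using a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


class PairingRegistry:
    """Hypothetical in-memory stand-in for the store that maps keys to devices."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}  # pairing key -> device id

    def register_device(self, device_id: str) -> str:
        """Issue a key for a device, retrying on the rare collision."""
        key = generate_pairing_key()
        while key in self._pending:
            key = generate_pairing_key()
        self._pending[key] = device_id
        return key

    def claim(self, key: str) -> Optional[str]:
        """Consume a key once; return the device id, or None if the key is unknown."""
        return self._pending.pop(key.strip().upper(), None)
```

In this sketch a key can be claimed only once, so a key shown on the device cannot be reused to link a second account.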
How we built it
The project was engineered with a focus on modern reactivity and performance:
- Frontend: Developed using SvelteKit 2 and Svelte 5 (Runes), styled with Tailwind CSS v4 for a streamlined UI.
- Backend: A Python Flask server utilizing Flask-SocketIO for real-time PCM audio processing and Flask-JWT-Extended for secure authentication.
- Hardware: ESP32-S3 Sense firmware (via PlatformIO) utilizing a PDM microphone to stream audio data.
- The AI Pipeline: We used faster-whisper for efficient transcription, Google Gemini for intelligence and embeddings, and ElevenLabs for the low-latency voice interface.
- Deployment: The frontend builds to static files served directly by the Flask backend, all containerized via Docker.
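
The audio path in the list above (raw PCM streamed from the device, then assembled on the backend into a file a transcription engine can read) can be sketched with Python's standard `wave` module. The audio format (16 kHz, mono, 16-bit) and the `PcmBuffer` class are assumptions for illustration; the write-up does not state the actual sample format or how the server buffers chunks.

```python
import io
import wave

# Assumed audio format (not stated in the write-up): 16 kHz, mono, 16-bit PCM.
SAMPLE_RATE = 16_000
CHANNELS = 1
SAMPLE_WIDTH = 2  # bytes per sample


class PcmBuffer:
    """Accumulates raw PCM chunks and renders them as an in-memory WAV file."""

    def __init__(self) -> None:
        self._data = bytearray()

    def append(self, chunk: bytes) -> None:
        """Store one chunk exactly as it arrived from the socket."""
        self._data.extend(chunk)

    def to_wav_bytes(self) -> bytes:
        """Wrap the buffered samples in a WAV header, dropping any partial frame."""
        frame_size = CHANNELS * SAMPLE_WIDTH
        usable = len(self._data) - (len(self._data) % frame_size)
        out = io.BytesIO()
        with wave.open(out, "wb") as wav:
            wav.setnchannels(CHANNELS)
            wav.setsampwidth(SAMPLE_WIDTH)
            wav.setframerate(SAMPLE_RATE)
            wav.writeframes(bytes(self._data[:usable]))
        return out.getvalue()
```

Trimming to a whole number of frames is one simple way to keep the output valid when a network chunk ends in the middle of a sample; a production server would more likely carry the leftover byte over to the next chunk.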
Challenges we ran into
The primary hurdle was the Real-time Audio Pipeline. Handling raw PCM chunks over WebSockets from an ESP32 and assembling them into a valid WAV format on the backend required precise buffer management to prevent data loss. Additionally, adopting
Accomplishments that we're proud of
- Successfully implementing a reliable WebSocket streaming pipeline from an embedded device to a cloud-based transcription engine.
- Achieving a low-latency voice-to-voice experience that stays grounded in the user's specific source materials.
- Building a robust, single-server deployment strategy that handles both the high-frequency SocketIO traffic and static frontend delivery.
What we learned
We deepened our collective understanding of Vector Embeddings and the nuances of chunking strategies for varied document types. We also gained significant experience in embedded systems, specifically regarding memory management and maintaining network stability during continuous audio streaming on the ESP32.
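
One common chunking strategy is a fixed-size window with overlap, so that a sentence cut at a chunk boundary still appears whole in the neighbouring chunk. The sketch below is a generic example of that technique, not the project's actual chunker; the function name and the default sizes are assumptions, and the embedding step is omitted.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word-based chunks where neighbours share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Transcripts and formatted documents may call for different settings: a lecture transcript has no headings to split on, so a sliding window like this is a reasonable default, while a PDF or DOCX could instead be split on its section structure first.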
What's next for Learning Buddy
- Multi-Device Sync: Allowing several "Buddies" to contribute to a single shared knowledge base.
- Edge Processing: Moving transcription or smaller LLM tasks to the edge to increase privacy and reduce latency.
- Proactive DeskPet: Evolving the digital companion to provide proactive study reminders and insights based on the user's recorded lecture history.
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | Svelte 5, SvelteKit 2, Tailwind CSS v4, TypeScript |
| Backend | Python Flask, Flask-SocketIO, MongoDB, Rust |
| Hardware | ESP32-S3 Sense (C++/Arduino), PDM Microphone |
| AI/ML | Google Gemini (LLM/RAG), ElevenLabs (Voice), faster-whisper |
