Inspiration
As new UC Berkeley freshmen, we quickly realized how important strong communication, presentation, and physical performance skills are—whether for interviews, coffee chats, or group projects. But we also noticed something missing: there’s no easy way to practice these skills and get personalized, expert-level feedback before the big moment. This problem isn’t limited to us or even to Berkeley; anyone learning a physical skill, preparing for a talk, or refining form could benefit from adaptive coaching. We wanted to build something that feels like having a personal team of instructors available on demand, for any skill—not just the ones AI models were trained on. That’s where the idea of leveraging agentic AI to create a universal skill coach was born.
What it does
AI Skill Coach evaluates any visually or audibly verifiable skill—public speaking, pushup form, dribbling a basketball, interview delivery, you name it. The user enters a skill and records a short video. Our system then pulls expert technique standards from the web, analyzes the user’s performance using specialized vision, audio, and motion agents, and delivers instructor-like voice feedback with specific, actionable corrections. It behaves like a personalized coaching team watching you perform, scoring your form, identifying mistakes, and telling you exactly what to fix and how to improve.
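The flow above can be sketched as a short pipeline. This is a minimal illustration with stubbed stages: `fetch_rubric` and `run_agents` are hypothetical stand-ins for the real web-scraping and multimodal-analysis steps, and the fixed 0.7 signal values exist only so the sketch runs.

```python
from dataclasses import dataclass, field

@dataclass
class Feedback:
    skill: str
    score: float                                # 0-100 overall form score
    corrections: list = field(default_factory=list)

def fetch_rubric(skill: str) -> dict:
    # Stand-in for the live scraping + rubric-synthesis step.
    return {"skill": skill, "criteria": ["elbow angle", "back alignment"]}

def run_agents(video_path: str, rubric: dict) -> dict:
    # Stand-in for the vision/audio/motion agents; returns per-criterion
    # signal strengths in [0, 1].
    return {c: 0.7 for c in rubric["criteria"]}

def evaluate(skill: str, video_path: str) -> Feedback:
    rubric = fetch_rubric(skill)
    signals = run_agents(video_path, rubric)
    score = 100 * sum(signals.values()) / len(signals)
    # Anything below a threshold becomes an actionable correction.
    corrections = [c for c, s in signals.items() if s < 0.8]
    return Feedback(skill=skill, score=score, corrections=corrections)

fb = evaluate("pushup", "clip.mp4")
```

In the real system, each stage is an agent with its own model calls; the pipeline shape is what matters here.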
How we built it
We built AI Skill Coach using a modular agentic architecture. LightPanda serves as our headless browser for live web scraping, collecting expert guides, biomechanical cues, coaching checklists, and common mistakes for any skill the user inputs. Redis stores this information for fast, low-latency retrieval. Anthropic’s Claude synthesizes the scraped data into structured evaluation rubrics and also performs multimodal analysis of the user’s video and audio. Our specialized agents—vision, audio, motion, scoring, and voice—each handle one aspect of human performance and collaborate to produce a unified evaluation. The frontend (built with a lightweight web framework) allows users to record or upload videos, while a voice generation layer turns technical analysis into natural, human-like coaching feedback.
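The Redis caching layer works roughly like this. The sketch below uses a tiny in-memory `FakeRedis` stand-in so it runs without a server; in the actual system the client is a real Redis connection and the cached value is the Claude-synthesized rubric, so repeat requests for the same skill skip scraping entirely.

```python
import json

class FakeRedis:
    """In-memory stand-in for a Redis client (get/setex only)."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def setex(self, key, ttl, value):
        self._store[key] = value  # TTL is ignored in the stand-in

class RubricCache:
    def __init__(self, client, ttl_seconds=3600):
        self.client = client
        self.ttl = ttl_seconds
    def get_or_build(self, skill, builder):
        key = f"rubric:{skill.lower().strip()}"
        cached = self.client.get(key)
        if cached is not None:
            return json.loads(cached)            # cache hit: no scraping
        rubric = builder(skill)                  # cache miss: scrape + synthesize
        self.client.setex(key, self.ttl, json.dumps(rubric))
        return rubric

cache = RubricCache(FakeRedis())
calls = []
def build(skill):
    calls.append(skill)                          # track expensive builds
    return {"skill": skill, "criteria": ["grip", "stance"]}

r1 = cache.get_or_build("golf swing", build)
r2 = cache.get_or_build("golf swing", build)     # served from cache
```

The second lookup never touches the builder, which is what keeps repeat evaluations low-latency.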
Challenges we ran into
The biggest challenge was generalizing across unlimited skills. Every skill has different cues, standards, and common mistakes, so we needed a dynamic pipeline that could extract and normalize technique criteria from messy web data on the fly. Structuring that data into consistent scoring rubrics was equally difficult. Matching video and audio signals to these diverse rubrics required flexible multimodal agents and careful coordination. Latency was another bottleneck—we had to optimize scraping, caching, and evaluation so users could get feedback quickly. Finally, building a simple UI for such a complex system required thoughtful design to keep the experience intuitive.
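The normalization problem above can be illustrated with a small sketch: scraped cue strings arrive with inconsistent bullets, casing, and whitespace, and have to be collapsed into one consistent, weighted rubric schema before the scoring agents see them. The even-weight choice here is an illustrative default, not our exact weighting scheme.

```python
def normalize_cues(raw_cues):
    """Collapse messy scraped cue strings into a weighted rubric.

    Dedupes case/whitespace/bullet variants and spreads weight evenly,
    so downstream scoring agents always see the same schema.
    """
    seen = {}
    for cue in raw_cues:
        # Collapse internal whitespace, strip bullet characters, lowercase.
        text = " ".join(cue.split()).strip(" -•*").lower()
        if text and text not in seen:
            seen[text] = True
    cues = list(seen)  # insertion order preserved
    weight = 1.0 / len(cues) if cues else 0.0
    return [{"criterion": c, "weight": weight} for c in cues]

rubric = normalize_cues([
    "Keep your back straight",
    "- keep your back  straight",   # duplicate with bullet and extra space
    "Elbows at 45 degrees",
])
```

Two of the three raw cues are the same criterion, so the result is two entries whose weights sum to 1.0.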
Accomplishments that we're proud of
We’re proud that we built a working end-to-end prototype capable of evaluating completely different skills with no domain-specific tuning. Our dynamic pipeline—LightPanda, Redis, Claude, and the multi-agent architecture—scales to new skills automatically. We created a clean UI that lets users record a video and receive personalized voice feedback within seconds. Most importantly, we proved that agentic AI can behave like a real team of instructors, delivering nuanced, human-like coaching instead of generic model responses.
What we learned
We learned that AI is far more powerful when paired with real-time sensory input and live knowledge extraction, rather than relying only on pre-trained data. Generalization requires strict modularity and clean interfaces between agents. We also learned the importance of structuring scraped knowledge into rubrics before analysis, since raw web content is inconsistent. And above all, we learned that users want coaching that feels personal, spoken, and directly actionable—not just text or scores.
What's next for AI Skill Coach
Next, we aim to move from post-video evaluation to true real-time coaching, where users get corrections mid-motion or mid-speech. We plan to integrate 3D pose estimation for more accurate form tracking, add progress dashboards and personalized training plans, and introduce community-driven skill packs so niche skills can be shared and improved collaboratively. We also want offline cached skill models for instant evaluation without re-scraping. Ultimately, our goal is to evolve this project into a universal, always-available AI coach for any visual or auditory skill you want to master.
Built With
- claude
- lightpanda
- python
- redisvl