Inspiration
We unfortunately couldn't film a proper demo due to the wifi situation, but we were happy to make a pretty cool project! :D
For years, engineers and programmers have been plagued by a total lack of sauce. A swag shortage. A disco deficiency. So, we asked... what if we could use AI to literally teach you how to dougie? Introducing BaiLANDO: your AI partner for mastering any groove.

Ever get lost trying to follow complex choreography on YouTube, endlessly scrubbing through videos just to catch that one move, unsure if you're even close? For dancers learning online, the lack of immediate, personalized feedback makes mastering routines tedious and often frustrating.

Now, imagine effortlessly transforming any online dance video into a personalized practice session, just by providing the URL and the specific timestamps you want to learn. BaiLANDO intelligently processes your chosen video segment, extracting the expert's movements via pose estimation to create a precise model for you to follow. As you dance, BaiLANDO uses your webcam to analyze your performance in real time, scoring your pose accuracy, movement dynamics, and rhythmic timing against the expert.

No more guesswork or endless repetition without direction. At the end of your session, BaiLANDO leverages AI like Claude to provide concise, actionable text feedback on your weakest areas, instantly converted to clear speech by Fish Audio, guiding your improvement. BaiLANDO is your dedicated AI dance coach, turning passive video watching into active, measurable progress, making learning any online choreography intuitive and effective.
What it does
BaiLANDO turns any YouTube video segment you choose into a personalized dance lesson. You give it a URL and start/end timestamps, and it processes the video, extracting the expert's moves. Using your webcam, BaiLANDO compares your dancing in real-time, scoring your pose shape, movement, and timing, all while you can watch the expert's moves side-by-side. When you finish, it uses Claude to generate intelligent text feedback on what to improve, then uses Fish Audio to speak that advice to you for a more personalized experience.
How we built it
We built BaiLANDO using a Python Flask backend and a JavaScript frontend. The backend handles user requests via a form, using yt-dlp and ffmpeg to download and cut the YouTube video segment the user specifies. It then processes this segment with MediaPipe Pose (Python) to extract landmarks and runs a custom extract_keyframes.py script to identify key scoring moments based on change in pose position, saving both datasets.

The frontend uses MediaPipe Pose (JavaScript) to track the user's webcam feed in real time. During the dance, the frontend calls a backend /score_pose endpoint near each keyframe; this endpoint loads the session's specific keyframe data and uses our custom PoseSimilarity class to calculate a hybrid score based on pose and movement, returning it to the frontend for display along with visual cues like pictograms.

Upon completion, the frontend calls /get_ai_summary, which prompts Claude for text feedback based on the aggregated scores and uses Fish Audio for text-to-speech, sending both back for the frontend to present to the user.
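To illustrate the keyframe step, here's a minimal sketch of the idea behind extract_keyframes.py: treat a frame as a keyframe when the average landmark displacement since the previous frame spikes above a threshold. The function names, threshold, and `min_gap` spacing here are our own simplifications, not the actual script.

```python
import math

def frame_motion(prev, curr):
    """Mean Euclidean displacement across all landmarks between two frames."""
    return sum(math.dist(p, c) for p, c in zip(prev, curr)) / len(curr)

def extract_keyframes(frames, threshold=0.05, min_gap=5):
    """Pick frame indices where pose motion spikes above a threshold,
    spaced at least min_gap frames apart so one move isn't scored twice.

    frames: list of poses, each a list of (x, y) landmark tuples."""
    keyframes = []
    last = -min_gap
    for i in range(1, len(frames)):
        if frame_motion(frames[i - 1], frames[i]) > threshold and i - last >= min_gap:
            keyframes.append(i)
            last = i
    return keyframes
```

In the real pipeline, the selected indices (plus the expert landmarks at those frames) are what get saved for the /score_pose endpoint to compare against later.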
Challenges we ran into
Using MediaPipe to estimate the expert dancer's pose wasn't too difficult, but building precise scoring between the player and the expert was definitely tricky. Through trial and error, we settled on a mix of keyframe matching, the velocity of certain joints, and other variables to make scoring feel as intuitive as possible. Merging branches in GitHub was also a challenge, since each contained different features that clashed with one another.
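A hypothetical simplification of that hybrid approach, blending a pose-shape term with a joint-velocity term (the weights and the velocity-match formula here are illustrative assumptions, not our PoseSimilarity internals):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def flatten(pose):
    """Turn a list of (x, y) landmark tuples into one flat vector."""
    return [v for point in pose for v in point]

def hybrid_score(user_pose, expert_pose, user_vel, expert_vel,
                 w_pose=0.7, w_vel=0.3):
    """Blend pose-shape similarity with a movement-speed match (both 0..1)."""
    shape = cosine_similarity(flatten(user_pose), flatten(expert_pose))
    # Velocity match: 1 when speeds are equal, decaying with relative difference
    vel = 1.0 - min(1.0, abs(user_vel - expert_vel) / max(expert_vel, 1e-6))
    return w_pose * max(0.0, shape) + w_vel * vel
```

The velocity term is what keeps the scorer from rewarding someone who freezes in a correct-looking pose while the expert is mid-move.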
Accomplishments that we're proud of
- A scoring system that's rewarding, doesn't punish irrelevant movements, and feels intuitive.
- Successful API calls to Claude and Fish Audio to get personalized advice on how to improve your dancing.
- Streamlined ability to use any YouTube video as a choreography trainer in < 1 minute
What we learned
Building BaiLANDO taught us the complexities of creating reliable video processing pipelines using tools like yt-dlp and ffmpeg. We learned that pose estimation accuracy from MediaPipe varies significantly and directly impacts scoring quality. Algorithmically defining "key" dance moments required substantial threshold tuning in our keyframe extraction logic. Designing an intuitive scoring system involved iterating between backend processing via Flask and real-time frontend analysis in JavaScript. Integrating external AI APIs like Claude and Fish Audio highlighted the importance of prompt engineering, handling different data formats like Base64, and managing asynchronous operations. We also encountered challenges with browser audio autoplay restrictions, requiring careful frontend implementation. Finally, managing dependencies and synchronizing code across different branches using Git proved essential for team collaboration under pressure. This project underscored how combining computer vision, backend processing, and AI APIs requires careful orchestration for a seamless user experience.
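One concrete lesson worth sketching: binary TTS audio can't go straight into a JSON response, so it has to be Base64-encoded for the trip to the browser. A minimal sketch of that packaging step, assuming the endpoint returns both Claude's text and the audio in one payload (Fish Audio's actual response format may differ):

```python
import base64
import json

def package_feedback(feedback_text: str, audio_bytes: bytes) -> str:
    """Bundle text feedback and TTS audio bytes into one JSON payload.

    The audio is Base64-encoded so it survives JSON; the frontend
    decodes it back into a Blob or data URI for playback."""
    return json.dumps({
        "feedback": feedback_text,
        "audio_b64": base64.b64encode(audio_bytes).decode("ascii"),
    })

def unpack_audio(payload: str) -> bytes:
    """Recover the raw audio bytes from a packaged payload."""
    return base64.b64decode(json.loads(payload)["audio_b64"])
```

On the frontend side, the decoded audio still can't be played until the user has interacted with the page, which is exactly the autoplay restriction mentioned above.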
What's next for BaiLANDO
The possibilities are endless! For one, BaiLANDO currently relies on YouTube choreography videos (of which there is no shortage) to create new 'expert dancers' to learn from, but it could be trained on any MP4 file featuring an individual dancing in frame. Furthermore, developing a model that generates its own dances from nothing but audio would be an interesting next step!
Built With
- claude
- fish
- github
- javascript
- python