Inspiration

I was learning guitar and couldn’t see what my fretting hand was doing. My wrist was off, my thumb crept over the neck, and I kept muting the high E. I wanted a coach that could watch and call out the mistake. Nothing out there could.

LLMs can teach theory, but they don’t see technique. That’s the trap with physical skills: you can’t catch your own form errors. Coaching is expensive, and video without expert eyes doesn’t help much. So we built Morphi to give biomechanical feedback to anyone.

What it does

Record five seconds of your swing, serve, or routine. Morphi finds the breakdown, pauses on it, overlays your skeleton, and draws the angles to fix. A voice coach explains what happened. You get a form score, track progress over time, and you can pick from different coach personalities (strict, supportive, concise) to match your learning style.

Flowchart Link

mermaid link

How we built it

The React Native app handles recording and playback. We run YOLO26x-Pose on Modal’s serverless H200 GPUs to extract keypoints, then feed frames into Motion Coach (our fine-tuned Qwen3-VL-32B). Motion Coach returns structured feedback with timestamps and correction vectors. ElevenLabs generates the voice. The frontend renders overlays with react-native-svg.
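
Here’s roughly what the pose-extraction step looks like on Modal. This is a sketch, not our exact code: the Ultralytics calls and Modal decorators are real APIs, but the app name, function name, and weights file are placeholders.

```python
# Hypothetical sketch of the pose-extraction step on Modal's serverless GPUs.
import modal

image = (
    modal.Image.debian_slim()
    .pip_install("ultralytics", "opencv-python-headless")
)
app = modal.App("morphi-pose")  # illustrative app name

@app.function(gpu="H200", image=image)
def extract_keypoints(video_bytes: bytes) -> list:
    """Decode the clip, run pose estimation per frame, return keypoints."""
    import tempfile
    import cv2
    from ultralytics import YOLO

    model = YOLO("yolo-pose.pt")  # placeholder weights file
    frames_keypoints = []
    with tempfile.NamedTemporaryFile(suffix=".mp4") as f:
        f.write(video_bytes)
        f.flush()
        cap = cv2.VideoCapture(f.name)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            result = model(frame, verbose=False)[0]
            # result.keypoints.xy: (num_people, 17, 2) COCO keypoints in pixels
            kp = result.keypoints.xy.cpu().numpy() if result.keypoints is not None else []
            frames_keypoints.append(kp[0].tolist() if len(kp) else [])
        cap.release()
    return frames_keypoints
```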

Training used 2,300 sports and yoga images from Kaggle, filtered to single-person frames. We wanted to use Gemini 3.1 Pro for labeling but hit rate limits, so we generated labels with Gemini Flash and distilled from those instead. We trained a LoRA adapter (rank 256) for 10 epochs on 8× H200s, then merged it into the base weights at build time so vLLM serves Motion Coach as one fused model.

LoRA: https://huggingface.co/Playbird12/motioncoach-qwen3vl-32b-lora
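
The build-time merge looks roughly like this. It’s a sketch of the standard peft merge flow; the base repo id and model class are assumptions rather than our pinned setup.

```python
# Hypothetical merge-at-build-time step using peft's merge flow.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel

BASE = "Qwen/Qwen3-VL-32B-Instruct"                 # assumed base checkpoint id
ADAPTER = "Playbird12/motioncoach-qwen3vl-32b-lora"

base = AutoModelForImageTextToText.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, ADAPTER)

# Fold the rank-256 LoRA deltas into the base weights so vLLM can serve
# a single fused checkpoint with no adapter overhead at inference time.
merged = model.merge_and_unload()
merged.save_pretrained("motioncoach-merged")
AutoProcessor.from_pretrained(BASE).save_pretrained("motioncoach-merged")
```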

Challenges

We started on B200s and lost time to driver issues, so we switched to H200s and moved on. Mobile sync across video, audio, and SVG overlays was finicky. Motion Coach sometimes returned shaky coordinates, so we snapped corrections to real pose keypoints. We also replaced AI-generated scores with a deterministic formula based on error severity.
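
The last two fixes look roughly like this. The snapping is the general idea we used; the penalty weights are assumptions, since the exact scoring formula isn’t spelled out here.

```python
# Sketch of keypoint snapping and deterministic scoring (weights are assumed).
import numpy as np

def snap_to_keypoint(pred_xy, keypoints_xy):
    """Replace a model-suggested coordinate with the nearest detected pose
    keypoint, so overlay corrections always anchor on the real skeleton."""
    keypoints_xy = np.asarray(keypoints_xy, dtype=float)
    dists = np.linalg.norm(keypoints_xy - np.asarray(pred_xy, dtype=float), axis=1)
    return keypoints_xy[int(dists.argmin())]

def form_score(errors):
    """Deterministic score: start at 100 and subtract a fixed penalty per
    error, weighted by the severity label Motion Coach assigns."""
    penalties = {"minor": 3, "moderate": 8, "major": 15}  # assumed weights
    return max(0, 100 - sum(penalties.get(e["severity"], 0) for e in errors))
```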

Accomplishments

We trained the LoRA end-to-end in about two hours, from data download to a deployed model. The full pipeline runs in under 60 seconds: pose extraction, Motion Coach inference, voice generation, and overlay rendering.

What we learned

Distillation was the unlock: Motion Coach got noticeably smarter without getting slower. Pro-grade thinking models can match or beat its feedback, but they often need more time to deliberate, and that latency does not fit a tight coaching loop.

More broadly, keep the model on language and use rules for geometry and scoring. vLLM beat raw HuggingFace for speed and simplicity. Serverless GPUs are ideal for hackathons.
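
Serving the fused checkpoint through vLLM’s OpenAI-compatible endpoint kept the backend thin. An illustrative call (paths, port, and prompt are placeholders), assuming the model is started with `vllm serve ./motioncoach-merged --port 8000`:

```python
# Illustrative client call against vLLM's OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="./motioncoach-merged",
    messages=[
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
            {"type": "text", "text": "Identify the form breakdown in this frame."},
        ]},
    ],
)
print(resp.choices[0].message.content)
```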

What’s next

Real-time AR overlays without recording delay. Side-by-side comparison with pro reference videos. Multi-person tracking for team sports. A layer for coaches to share annotated sessions.


