AI piano practice coach for turning one performance video into score alignment, posture checks, and actionable feedback.
I wanted to build something closer to a real piano teacher than a generic score checker.
Most practice tools only tell you whether a note was right or wrong. I wanted a system that could also answer:
- Did I play the right notes, at the right time?
- Was my tempo stable?
- Did my posture look healthy while I was playing?
- What should I fix next, in plain language?
That idea became Doceo: a performance tutor that combines audio analysis, video analysis, and an AI-generated coaching layer.
Doceo takes in:
- a reference MIDI file
- a performance video
Then it:
- extracts audio from the video
- transcribes the performance into note events
- aligns the played notes against the reference score
- analyzes posture from sampled video frames
- generates feedback and drills
- renders the result in a web interface
Output:
- score annotations
- piano-roll fallback
- video playback with pose overlay
- reference audio playback
- AI tutor feedback
- upload reference MIDI
- upload performance video
- run analysis
- view results
The results page shows:
- the annotated score
- a piano-roll fallback
- A/B playback with the user video and synthesized reference audio
- posture overlay on top of the video
- focus areas and drills
- AI tutor feedback
The backend lives in api/ and is built with FastAPI.
Pipeline:
/midiparses the reference MIDI, exports MusicXML, and synthesizes a reference audio track/videostores the performance video and extracts a mono WAV file withffmpeg/analyzetranscribes the extracted audio with Basic Pitch/alignmatches the played notes to the reference score with DTW/posesamples video frames with OpenCV + MediaPipe Pose/Hands/tutorfeeds the analysis into Gemini and ElevenLabs for spoken coaching
FastAPIfor the API layerNext.jsandReactfor the frontendffmpegfor video-to-audio extractionbasic-pitchfor audio-to-note transcriptionpretty_midiandmusic21for MIDI/score handlingfastdtwfor note alignmentOpenCVfor frame samplingMediaPipefor pose and hand landmark detectionGeminifor tutor script generationElevenLabsfor voice output
Next.js 16React 19TypeScriptOpenSheetMusicDisplay
FastAPIUvicornPythonpretty_midimusic21basic-pitchfastdtwOpenCVMediaPipeffmpeg
Geminifor written tutor feedbackElevenLabsfor audio narration
This was the hardest part of the project.
The raw audio coming from a performance video is messy:
- room noise leaks into the signal
- pedal resonance blurs note boundaries
- timing is not perfectly quantized
- transcribers can miss short notes or create duplicates
- performance tempo can drift compared with the reference
To make the transcription usable, I had to add multiple cleanup layers:
- extract a clean mono track from the video
- run Basic Pitch to get candidate note events
- smooth and merge near-duplicate transcribed notes
- estimate timing offset against synthesized reference audio
- reference-guide the note cleanup using the score itself
- write both raw and cleaned MIDI outputs for debugging
Even after transcription, note alignment still needed a second pass. I used DTW to compare the played note stream against the reference, then labeled each event as correct, wrong pitch, missed, extra, early, late, or on-time.
That combination of transcription plus alignment is what makes the output feel coherent instead of noisy.
- Audio transcription needs cleanup and alignment to be useful.
- Reference-aware heuristics matter when performances drift from the score.
- Audio and video analysis solve different parts of the problem.
- Specific feedback beats a raw score.
- Structured inputs make AI feedback much better.
- improve transcription accuracy on noisier recordings
- support longer performances and bigger excerpts
- separate left-hand and right-hand analysis
- improve posture scoring
- support multi-song sessions
- track progress across practice attempts
- deploy the pipeline for reliable scale
- personalize feedback from recurring mistakes
cd api
pip install -r requirements.txt
uvicorn main:app --reload --port 8000or
cd /Users/dzkchen/doceo/api
source .venv/bin/activate
python -m uvicorn main:app --host 127.0.0.1 --port 8000cd web
npm install
npm run devOpen the app and upload a reference MIDI plus a performance video.
api/- FastAPI backend and analysis pipelineweb/- Next.js frontendapi/storage/- session outputs, transcriptions, alignments, and posture results
No license has been specified yet.