Hero is an AI-powered exercise form analysis system that evaluates bodyweight and barbell exercises (Push-up, Squat, Deadlift, etc.), produces rep-level feedback and scores, and offers short, actionable coaching corrections — including synthesized voice feedback.
Repository layout:

- `backend/` — FastAPI server, Celery worker, MediaPipe/ffmpeg processing, Gemini LLM orchestration, and ElevenLabs TTS integration.
- `src/` — Frontend (Vite + React + TypeScript) with pages for recording/uploading, analysis progress, and results overlays.
- `public/` — Static assets.
- `package.json` & `bun.lockb` — Frontend build and scripts.
- Two-pass LLM analysis (compact angle time-series + multimodal pass with key frames)
- Median-filtered angle smoothing and canonical joint-name normalization
- Client-side downscaling to 480p before upload to save bandwidth
- ElevenLabs TTS synthesis of `actionable_correction`, stored in Redis and played in the Results UI
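The median-filtered angle smoothing listed above can be sketched in plain Python. This is a minimal illustration, not the backend's actual code; `median_smooth` is a hypothetical name:

```python
import statistics

def median_smooth(series, window=5):
    """Apply a centered median filter to an angle time-series.

    A median filter removes single-frame spikes (e.g. a MediaPipe
    landmark glitch) without blurring genuine rep transitions the way
    a moving average would. Edge frames use a truncated window.
    """
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(statistics.median(series[lo:hi]))
    return smoothed
```

For example, a one-frame spike of 90° in an otherwise steady 10° knee-angle series is removed entirely, while a sustained change survives the filter.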
## Requirements
- Node.js (recommended 18+)
- Python 3.10+ (backend)
- Redis (for Celery broker/result store)
- ffmpeg (in PATH) — required by backend for video processing
Copy `.env.example` to `.env` and fill in the values. Important variables:
- `GEMINI_API_KEY` — Google Gemini API key
- `GEMINI_MODEL` — (optional) model to use, e.g. `gemini-1.5-pro`
- `REDIS_URL` — Redis connection string
- `ELEVENLABS_API_KEY` — ElevenLabs API key (for TTS)
- `ELEVENLABS_VOICE_ID` — ElevenLabs voice identifier to use for synthesis
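A minimal `.env` might look like the following; every value here is a placeholder to replace with your own:

```sh
# .env (example only, do not commit real keys)
GEMINI_API_KEY=your-gemini-api-key
GEMINI_MODEL=gemini-1.5-pro
REDIS_URL=redis://localhost:6379/0
ELEVENLABS_API_KEY=your-elevenlabs-api-key
ELEVENLABS_VOICE_ID=your-voice-id
```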
## Start the frontend (development)

```sh
# from repository root
npm install
npm run dev
```

## Start the backend (development)
```sh
# create and activate a Python venv
python -m venv .venv
.\.venv\Scripts\activate    # Windows; on macOS/Linux: source .venv/bin/activate
pip install -r backend/requirements.txt

# start the FastAPI server
uvicorn backend.main:app --reload

# start a Celery worker in a separate terminal
celery -A backend.worker worker --loglevel=info
```

Notes: the worker expects Redis to be running and `.env` to contain the keys listed above. Video uploads are processed, analyzed by the two-pass Gemini pipeline, and, if a correction is generated, it is synthesized to audio and stored in Redis under `correction_audio:{task_id}`.
- Frontend captures or uploads a short exercise video and posts it to the backend `/analyze` endpoint.
- Backend normalizes frames with MediaPipe, computes joint angles, applies median smoothing, and buffers landmark frames to Redis.
- Pass 1 (Gemini): text-only analysis of the compact angle time-series to detect repetitions and their boundaries.
- Pass 2 (Gemini multimodal): classification, scoring, and an `actionable_correction` text field, optionally with named joints.
- The backend synthesizes `actionable_correction` via ElevenLabs and stores the audio in Redis; the frontend fetches and auto-plays the audio on the Results page.
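The joint-angle computation in the second step reduces to the angle at a joint formed by three landmarks. A dependency-free sketch (`joint_angle` is an illustrative helper, not necessarily the backend's implementation):

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by landmarks a-b-c.

    Each landmark is an (x, y) pair, e.g. hip-knee-ankle for a squat's
    knee angle. Computed from the dot product of the two joint vectors.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # clamp to guard against floating-point drift outside [-1, 1]
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))
```

A fully extended limb gives roughly 180°, a right-angle bend 90°; a per-frame series of these values is what the median filter and Pass 1 operate on.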
Key files:

- `backend/worker.py` — main Celery processing pipeline, Gemini calls, and ElevenLabs TTS helper
- `backend/main.py` — FastAPI endpoints (including `/correction-audio/{task_id}`)
- `src/pages/Demo.tsx` — client-side 480p downscaling and upload flow
- `src/pages/Results.tsx` — polling results, fetching & auto-playing correction audio
- `src/lib/api.ts` — frontend API helpers (includes `fetchCorrectionAudio`)
- Frontend unit tests use `vitest` (see `vitest.config.ts`); run them with `npm run test`.
- Do not commit secrets to the repository. Use environment/secret managers in production.
- Rate-limit / authenticate public endpoints; the LLM and ElevenLabs APIs are billable.
- Add a visible "Play correction" button in the Results UI (currently audio auto-plays).
- Add fallback handling when TTS fails: surface textual correction and a retry button.
- Add CI steps to lint and run tests, and a minimal Docker Compose for local integration testing (Redis + backend + frontend).
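As a starting point for the last item, a minimal compose file might look like this. Service names, build contexts, and environment values are assumptions, not the project's actual configuration:

```yaml
# docker-compose.yml: hypothetical sketch for local integration testing
services:
  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]
  backend:
    build: ./backend
    command: uvicorn backend.main:app --host 0.0.0.0 --port 8000
    environment:
      - REDIS_URL=redis://redis:6379/0
    depends_on: [redis]
    ports: ["8000:8000"]
  worker:
    build: ./backend
    command: celery -A backend.worker worker --loglevel=info
    environment:
      - REDIS_URL=redis://redis:6379/0
    depends_on: [redis]
```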