RukAI - The Multimodal AI Jump Rope Coach

RukAI is an AI agent that uses multimodal inputs and outputs to help users track their jump rope workouts. It is a next-generation, real-time fitness agent that watches, analyzes, and coaches your jump rope form using continuous video and audio. Built for the Gemini Live Agents hackathon track, RukAI moves beyond text-in/text-out interactions by acting as a true physical-world companion.

TRY: https://jumprope-coach.web.app/

The Vision

Traditional fitness apps just count reps. RukAI is built to actually see you. By leveraging device vision and the Gemini Live API, RukAI tracks your biomechanics in real-time. If your form breaks down, the AI instantly interrupts your workout with live, spoken audio corrections to help you fix your technique before you injure yourself.

System Architecture

Here is the data flow for the RukAI application:

```mermaid
graph TD
    subgraph Client [Client-Side: React.js Web App]
        UI[React UI Dashboard & State]
        Cam[Live Webcam Feed]
        MP[MediaPipe Pose Landmarker]
        Audio[Web Audio API]
        WS_C[WebSocket Client]
    end

    subgraph Firebase [Firebase / Google Cloud]
        Auth[Google OAuth 2.0]
        DB[(Firestore NoSQL DB)]
        Hosting[Firebase Hosting]
    end

    subgraph Backend [Backend Server: Node.js]
        WS_S[WebSocket Server]
        ADK[Google ADK]
        GeminiClient[Gemini API Controller]
    end

    subgraph External [Google AI Services]
        Gemini[Gemini Live API]
    end

    Cam -->|30fps Video Frames| MP
    MP -->|Skip Count updates| UI
    MP -->|Form Correction JSON Flags| WS_C
    UI <-->|Read/Write User Stats| DB
    UI <-->|Authenticate| Auth
    Hosting -->|Serves App To| Client
    WS_C <-->|Bi-directional connection| WS_S
    WS_S -->|Passes Form Warnings| GeminiClient
    GeminiClient <-->|Prompts & Audio Streams| Gemini
    GeminiClient -->|Raw Audio Byte Arrays| WS_S
    WS_S -->|Pipes Audio| WS_C
    WS_C -->|Plays Coaching Voice| Audio
    WS_S -->|Triggers Post-Workout Summary| ADK
    ADK -->|Saves Summary| DB
```
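The form-warning path in the diagram (MediaPipe → WebSocket client → WebSocket server → Gemini controller) implies a small JSON envelope for each flag. The sketch below is illustrative only; the field names, flag strings, and confidence threshold are assumptions, not the repository's actual schema:

```javascript
// Hypothetical envelope for a form-correction flag sent from the
// browser to the Node.js WebSocket server. Field names are illustrative.
function buildFormFlag(flagType, confidence, skipCount) {
  return JSON.stringify({
    type: "form_flag",
    flag: flagType,   // e.g. "elbow_flare", "head_bobbing"
    confidence,       // 0..1 heuristic score
    skipCount,        // current rep count for context
    timestamp: Date.now(),
  });
}

// Server side: forward the flag to the Gemini controller only when the
// confidence clears a threshold, so coaching isn't triggered by noise.
function shouldTriggerCoaching(message, threshold = 0.8) {
  const msg = JSON.parse(message);
  return msg.type === "form_flag" && msg.confidence >= threshold;
}
```

Gating on a confidence threshold server-side keeps the interruptible audio loop from firing on every jittery frame while the client stays simple.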

Hackathon Track Alignment: Live Agents

  • Real-time Interaction (Vision/Audio): Processes 30fps webcam feed to analyze physical movement.
  • Interruptible Audio Coaching: Pushes real-time heuristic form flags via WebSocket to the Node.js backend, triggering Gemini Live to speak form corrections out loud.
  • Mandatory Tech: Powered by the Gemini Live API for real-time conversational audio and hosted entirely on Google Cloud / Firebase. Generates post-workout analytics using Google's ADK (Agent Development Kit).
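A heuristic form flag of the kind described above can be computed from MediaPipe pose landmarks with simple geometry. This is a sketch in the spirit of the project's heuristics, not its actual code; the threshold is an assumption, and the landmark indices follow MediaPipe's standard pose model (11/12 shoulders, 13/14 elbows):

```javascript
// Illustrative "elbow flare" heuristic: flag when either elbow drifts
// too far outside its shoulder, measured in normalized landmark
// coordinates as a fraction of shoulder width. Threshold is illustrative.
function detectElbowFlare(landmarks, ratioThreshold = 0.35) {
  const lShoulder = landmarks[11], rShoulder = landmarks[12];
  const lElbow = landmarks[13], rElbow = landmarks[14];
  const shoulderWidth = Math.abs(lShoulder.x - rShoulder.x) || 1e-6;
  // Positive values mean the elbow sits outside the shoulder line;
  // the right side is mirrored because x decreases toward that side.
  const leftFlare = (lElbow.x - lShoulder.x) / shoulderWidth;
  const rightFlare = (rShoulder.x - rElbow.x) / shoulderWidth;
  return Math.max(leftFlare, rightFlare) > ratioThreshold;
}
```

Because the check runs on normalized coordinates, it works at any camera distance, which is what makes a "permissive testing mode" easy to trigger from a desk.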

Key Features

  • Live Audio Feedback Loop: Streams generated coaching audio to the browser via WebSockets for real-time interventions.
  • Developer Testing Mode (Current Vision Engine): The current computer vision engine uses lightweight MediaPipe heuristics with a deliberately "permissive testing mode." This lets judges easily trigger form corrections (like elbow flares or head-bobbing) and experience the Gemini Live audio loop while sitting at a desk, without performing a rigorous jump rope routine.
  • Dynamic Training Calendar: A dot-matrix activity heatmap that automatically cross-references your workout history against custom-generated Firebase regimens (Beginner, Intermediate, Advanced).
  • Smart Streak Tracking: Calculates daily consistency and visualizes it with SVG progress rings.
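The streak tracking described above boils down to walking backwards from today over the set of workout days. A minimal sketch, assuming workout dates arrive from Firestore as "YYYY-MM-DD" strings (the function name and signature are hypothetical):

```javascript
// Count consecutive days with at least one workout, ending at `today`.
// `today` is injected rather than read from the clock for testability.
function currentStreak(dates, today) {
  const days = new Set(dates);
  const DAY_MS = 24 * 60 * 60 * 1000;
  let streak = 0;
  let cursor = new Date(today + "T00:00:00Z").getTime();
  // Walk backwards one calendar day at a time until a day has no workout.
  while (days.has(new Date(cursor).toISOString().slice(0, 10))) {
    streak += 1;
    cursor -= DAY_MS;
  }
  return streak;
}
```

The resulting count is what an SVG progress ring would render as its fill fraction.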

Architecture Components

  1. Frontend (React.js): A split-screen dashboard featuring a cinematic live-camera feed and a sleek metrics panel using Google Sans typography and custom SVG icons.
  2. Computer Vision Engine (MediaPipe): Runs entirely client-side for zero-latency analysis, piping telemetry data directly to the backend.
  3. Backend Engine (Node.js & WebSockets): Maintains a persistent socket connection to stream raw byte-array audio from the Gemini Live API to the browser's AudioContext.
  4. Database (Firestore): A NoSQL architecture managing users, workouts, and seeded regimens.
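Playing the raw byte-array audio mentioned in point 3 through the browser's AudioContext requires converting the stream to Float32 samples first. A sketch under the assumption that the Gemini Live audio arrives as 16-bit little-endian PCM (sample-rate handling is omitted):

```javascript
// Convert a chunk of 16-bit little-endian PCM (a Uint8Array received
// over the WebSocket) into Float32 samples in [-1, 1] suitable for
// copying into a Web Audio API AudioBuffer.
function pcm16ToFloat32(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, /* littleEndian */ true) / 32768;
  }
  return out;
}
```

In the browser, the resulting Float32Array would be written into an AudioBuffer (e.g. via `copyToChannel`) and scheduled on the AudioContext for low-latency playback.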

Tech Stack

  • Frontend: React.js, HTML5 Canvas, Web Audio API
  • Backend: Node.js, WebSockets (ws)
  • AI & Vision: Gemini Live API, ADK, Google MediaPipe Tasks Vision
  • Cloud & DB: Firebase Hosting, Firestore Database, Google Cloud Platform (OAuth 2.0)

Running the Project Locally

Prerequisites

  • Node.js (v16+)
  • A Firebase Project with Firestore enabled
  • A Google Cloud Project with OAuth 2.0 Client IDs configured

Setup Instructions

  1. Clone the repository:
    git clone https://github.com/mikitoxo/RukAI.git
    cd RukAI
    
  2. Install Frontend Dependencies:
    cd frontend
    npm install
    
  3. Install Backend Dependencies:
    cd ../backend
    npm install
    
  4. Seed the Database: Ensure your Firebase serviceAccountKey.json is in the backend folder, then run:
    node seedRegimens.js
    
  5. Run the App:
    • a.) Start the backend socket server (from the backend folder):
      node server.js
    • b.) Start the React frontend (from the frontend folder):
      npm start

Future Scope

Dedicated Computer Vision AI: Replacing the current heuristic testing engine with a custom-trained, lightweight ML model specifically optimized for jump rope biomechanics and complex maneuvers (like crossovers and double-unders).

Native Mobile Port: Wrapping the React application in React Native for offline-first iOS/Android support.
