Frontend Repository: lastvigil-front
A real-time ASL gesture recognition game backend using FastAPI WebSocket, MediaPipe, and machine learning models. The system processes webcam input to detect ASL alphabet gestures and gaze direction for controlling a 2D defense game.
- Real-time ASL Gesture Recognition: Uses MediaPipe hand landmarks and scikit-learn ML models to recognize ASL alphabet gestures
- Gaze Tracking: Calculates gaze direction from face landmarks for targeting in the game
- Session-based Game Logic: Independent game sessions with enemy spawning, collision detection, and wave progression
- WebSocket Communication: Real-time bidirectional communication between client and server
- Feature Extraction: Normalizes MediaPipe landmarks to position/size-invariant 40D feature vectors
- Model Training: Supports training KNN, SVM, and Random Forest models on gesture data
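The feature extraction step listed above can be illustrated with a minimal sketch. MediaPipe Hands reports 21 landmarks per hand; everything else here is an assumption rather than the code in `core/feature_extractor.py`: the function name `normalize_landmarks` is hypothetical, and the scheme (wrist as origin, scale by the farthest landmark, drop the wrist so that 20 points x 2 coordinates = 40 values) is only one plausible way to arrive at a 40D vector.

```python
import math


def normalize_landmarks(landmarks):
    """Turn 21 (x, y) hand landmarks into a 40-value feature vector.

    Assumed scheme (not confirmed by the repository): translate so the wrist
    (landmark 0) is the origin, divide by the largest wrist-to-landmark
    distance, then drop the wrist, leaving 20 points * 2 coordinates = 40.
    """
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    wrist_x, wrist_y = landmarks[0]
    # Position invariance: express every other point relative to the wrist.
    relative = [(x - wrist_x, y - wrist_y) for x, y in landmarks[1:]]
    # Size invariance: scale by the point farthest from the wrist.
    scale = max(math.hypot(dx, dy) for dx, dy in relative)
    if scale == 0:
        raise ValueError("degenerate hand: all landmarks coincide")
    features = []
    for dx, dy in relative:
        features.extend((dx / scale, dy / scale))
    return features
```

With this scheme, moving the hand within the frame or holding it closer to the camera leaves the feature vector unchanged, which is what lets a classifier trained on one hand position generalize to others.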
- Clone the repository:
git clone https://github.com/aaronshin43/lastvigil-back.git
cd lastvigil-back
- Install dependencies:
pip install -r requirements.txt
- Download or train the ASL gesture recognition model:
  - Place the trained model file as models/asl_skill_model.pkl
  - Or run the training script:
python train.py
Start the FastAPI WebSocket server:
uvicorn main:app
The server will run on http://localhost:8000
Collect gesture data for training:
python collect_data.py
Train ML models on collected data:
python train.py
Test real-time gesture recognition:
python test_gesture_recognition.py
Process image datasets for landmark extraction:
python extract_landmarks_from_images.py
- Feature Extractor (core/feature_extractor.py): Normalizes MediaPipe hand landmarks to create position-invariant features
- Game Logic (main.py): Handles session management, enemy spawning, collision detection, and WebSocket communication
- AI Processing: Real-time analysis of webcam frames for gesture and gaze detection
- Model Training (train.py): Trains and evaluates ML models using cross-validation
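As a rough illustration of the model training component, the sketch below compares the three model families with scikit-learn's `cross_val_score`. It is not the contents of `train.py`: the function name `compare_models`, the hyperparameters, and the synthetic data (a stand-in for `data/gestures.csv`, whose column layout is not described here) are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def compare_models(X, y, n_splits=5):
    """Return the mean cross-validated accuracy of each candidate model."""
    # Hyperparameters are illustrative, not the project's tuned values.
    candidates = {
        "knn": KNeighborsClassifier(n_neighbors=3),
        "svm": SVC(kernel="rbf"),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    }
    # Stratified folds keep every gesture class represented in each split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return {
        name: float(cross_val_score(model, X, y, cv=cv).mean())
        for name, model in candidates.items()
    }


# Synthetic stand-in for real gesture data: 3 classes of 40D vectors.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 40))
X = np.vstack([c + 0.1 * rng.normal(size=(30, 40)) for c in centers])
y = np.repeat(["A", "B", "C"], 30)
scores = compare_models(X, y)
print(scores)
```

Comparing mean fold accuracy across the three families is one simple way to decide which model to serialize as the game's gesture classifier.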
/ws: Main game WebSocket endpoint with session ID parameter for real-time game interaction
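One way to keep sessions independent behind a single endpoint is a registry keyed by session ID. The sketch below is an assumption about how that could look, not the structure used in `main.py`; `GameSession`, `SessionRegistry`, and the fields shown are hypothetical names.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class GameSession:
    """Hypothetical per-player state; the real fields live in main.py."""
    session_id: str
    wave: int = 1
    score: int = 0
    enemies: list = field(default_factory=list)


class SessionRegistry:
    """Maps session IDs to isolated GameSession objects."""

    def __init__(self):
        self._sessions = {}

    def get_or_create(self, session_id=None):
        # A missing ID gets a fresh random one, so a new client always gets a session.
        if session_id is None:
            session_id = uuid.uuid4().hex
        if session_id not in self._sessions:
            self._sessions[session_id] = GameSession(session_id)
        return self._sessions[session_id]

    def remove(self, session_id):
        # Intended for disconnects, so finished games do not accumulate.
        self._sessions.pop(session_id, None)
```

A WebSocket handler could call `get_or_create` when a client connects and `remove` when it disconnects, so that one player's enemies and wave counter never leak into another player's game.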
- Client sends Base64-encoded webcam frames via WebSocket
- Server processes frames with MediaPipe for hand/face detection
- Trained ML model classifies the ASL gesture from the hand landmarks
- Gaze direction is calculated from the face landmarks
- Game logic updates based on the AI input
- Server sends full state sync to client
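The first step of the flow above, turning a Base64 text message back into image bytes, can be sketched with the standard library alone. The helper name `decode_frame` is hypothetical, and whether the client sends a bare Base64 string or a full data URL is an assumption, so the sketch accepts both.

```python
import base64
import binascii


def decode_frame(payload):
    """Return raw image bytes from a Base64 string, with or without a data-URL prefix."""
    # A browser canvas typically produces "data:image/jpeg;base64,<data>";
    # strip that header if it is present.
    if payload.startswith("data:"):
        _, _, payload = payload.partition(",")
    try:
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("frame is not valid Base64") from exc
```

The resulting bytes are still a compressed image; on the server they would typically be decoded into a pixel array (for example with OpenCV's `cv2.imdecode`) before being handed to MediaPipe.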
lastvigil-back/
├── main.py # FastAPI WebSocket server and game logic
├── core/
│ └── feature_extractor.py # Landmark normalization and feature extraction
├── models/
│ └── asl_skill_model.pkl # Trained ML model (not included)
├── data/
│ └── gestures.csv # Training data
├── test/
│ └── test_gesture_recognition.py # Real-time testing script
├── collect_data.py # Data collection script
├── extract_landmarks_from_images.py # Image processing script
├── train.py # Model training script
├── requirements.txt # Python dependencies
└── README.md # This file
- fastapi: Web framework for WebSocket server
- mediapipe: Hand and face landmark detection
- opencv-python: Image processing
- scikit-learn: Machine learning models
- numpy: Numerical computations
- pandas: Data manipulation
- joblib: Model serialization
- uvicorn: ASGI server
- Gesture Sequence: Players must match ASL gestures in sequence to cast skills
- Gaze Targeting: Eye gaze determines skill targeting direction
- Wave System: Progressive difficulty with enemy HP/speed increases
- Real-time Processing: 20fps game loop with AI input processing
- Session Management: Independent game sessions for multiple players
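A 20fps loop means one update every 50 ms. The sketch below shows one common way to hold that rate with `asyncio`; it is an illustration under assumptions, not the loop in `main.py`. The names `run_game_loop` and `update` are hypothetical, and `max_ticks` exists only so the example terminates.

```python
import asyncio

TICK_RATE = 20                   # ticks per second, as stated above
TICK_INTERVAL = 1.0 / TICK_RATE  # 0.05 s between updates


async def run_game_loop(update, max_ticks):
    """Call update(dt) at a fixed rate for max_ticks iterations."""
    loop = asyncio.get_running_loop()
    next_tick = loop.time()
    last = next_tick
    for _ in range(max_ticks):
        now = loop.time()
        update(now - last)  # dt: seconds elapsed since the previous tick
        last = now
        # Schedule against absolute deadlines so a slow update does not
        # permanently shift every later tick.
        next_tick += TICK_INTERVAL
        await asyncio.sleep(max(0.0, next_tick - loop.time()))


if __name__ == "__main__":
    deltas = []
    asyncio.run(run_game_loop(deltas.append, max_ticks=5))
    print(f"ran {len(deltas)} ticks")
```

Because the loop awaits between ticks, other coroutines on the same event loop, such as WebSocket receive handlers for other sessions, can run during the idle part of each 50 ms window.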