A real-time AI vision and gesture assistant. The system combines computer vision with Large Language Models (LLMs) to give visually impaired users an interactive, context-aware view of their environment. It detects hand gestures, recognizes faces, describes the surroundings with AI, and delivers the result as audio feedback.
- Real-Time Gesture Pipeline: Detects complex hand gestures using MediaPipe to trigger system events.
- Face & Gaze Tracking: Identifies registered faces and tracks user gaze to understand attention.
- Multimodal AI Integration:
  - Uses Google Gemini to describe scenes or interact with the user (e.g., pointing triggers a compliment).
  - Uses ElevenLabs for natural-sounding Text-to-Speech (TTS) notifications.
- Sci-Fi HUD & Streaming: Draws an interactive HUD using OpenCV and streams the frames to a web client.
- Modern Web Interface ("Blinded"): A fast, responsive frontend built with React, Vite, and TypeScript.
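
The gesture-to-event idea from the feature list can be sketched in plain Python. This is a minimal illustration, not the project's actual gesture logic: it assumes the 21 hand landmarks arrive as normalized `(x, y)` pairs in MediaPipe Hands index order, that the hand is roughly upright, and that a finger counts as extended when its tip is above its PIP joint. The names `classify_gesture`, `dispatch`, and the gesture labels are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]  # normalized (x, y); y grows downward in image space

# MediaPipe Hands landmark indices: (fingertip, PIP joint) per non-thumb finger.
FINGERS = {
    "index": (8, 6),
    "middle": (12, 10),
    "ring": (16, 14),
    "pinky": (20, 18),
}


def extended_fingers(landmarks: List[Point]) -> Dict[str, bool]:
    """Heuristic: a finger is extended when its tip sits above its PIP joint."""
    return {
        name: landmarks[tip][1] < landmarks[pip][1]
        for name, (tip, pip) in FINGERS.items()
    }


def classify_gesture(landmarks: List[Point]) -> str:
    """Hypothetical rule set that only knows 'pointing' and 'open_palm'."""
    state = extended_fingers(landmarks)
    if state["index"] and not (state["middle"] or state["ring"] or state["pinky"]):
        return "pointing"
    if all(state.values()):
        return "open_palm"
    return "unknown"


def dispatch(gesture: str, handlers: Dict[str, Callable[[], str]]) -> str:
    """Run the handler registered for a gesture; return '' if none exists."""
    handler = handlers.get(gesture)
    return handler() if handler else ""
```

In a real pipeline the handler for `"pointing"` would be the place to request a Gemini response and pass the text on to TTS; here handlers simply return a string so the flow stays easy to follow.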
- Python
- OpenCV & MediaPipe (Computer Vision)
- Google Gemini API (Scene Understanding)
- ElevenLabs API (Text-to-Speech)
- React 19
- TypeScript
- Vite
- Python 3.10+
- Node.js & npm (for the frontend)
- API keys for Gemini and ElevenLabs (configure them in `.env` within the `Backend/` directory, based on `.example.env`)
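
Before starting the backend, it can help to confirm that the `.env` file actually contains both keys. The sketch below uses only the standard library and is not part of the project. The variable names `GEMINI_API_KEY` and `ELEVENLABS_API_KEY` are assumptions, so check `.example.env` for the real ones.

```python
from pathlib import Path
from typing import Dict, List, Sequence

# Assumed variable names; replace with the ones listed in .example.env.
REQUIRED_KEYS = ("GEMINI_API_KEY", "ELEVENLABS_API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: Dict[str, str],
                 required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def check_env_file(path: Path) -> List[str]:
    """Return missing keys for a .env file; a missing file means all are missing."""
    if not path.exists():
        return list(REQUIRED_KEYS)
    return missing_keys(parse_env_text(path.read_text(encoding="utf-8")))
```

Calling `check_env_file(Path("Backend/.env"))` from the repository root returns an empty list when both keys are set.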
- Navigate to the backend directory:

      cd Backend

- Create and activate a virtual environment (optional but recommended):

      # Windows
      python -m venv venv
      .\venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Start the vision system:

      python main.py

  Note: Ensure your webcam is connected. Press `q` in the preview window to quit, `e` to toggle ElevenLabs TTS, and `t` to toggle Gemini.
- Navigate to the frontend directory:

      cd Blinded

- Install dependencies:

      npm install

- Start the development server:

      npm run dev

While the OpenCV window is active, you can use the following keybinds:
- `Space`: Capture a face sample during registration
- `n`: Start the face registration process
- `e`: Toggle ElevenLabs Text-to-Speech
- `t`: Toggle Gemini Vision analysis
- `q`: Quit the pipeline
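
The keybinds can be modeled as a small state update driven by the key code. The following is a hedged sketch rather than the code in `main.py`: `PipelineState`, its field names, and the default toggle values are hypothetical. It does rely on one documented behavior, namely that `cv2.waitKey` returns `-1` when no key is pressed.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    tts_enabled: bool = True      # ElevenLabs TTS; default value is an assumption
    gemini_enabled: bool = True   # Gemini Vision; default value is an assumption
    registering: bool = False     # True once face registration has started
    samples_captured: int = 0     # face samples taken during registration
    running: bool = True


def handle_key(key_code: int, state: PipelineState) -> PipelineState:
    """Apply one keypress to the state and return it."""
    if key_code < 0:  # no key pressed during this frame
        return state
    key = chr(key_code & 0xFF)  # keep only the low byte of the key code
    if key == "q":
        state.running = False
    elif key == "e":
        state.tts_enabled = not state.tts_enabled
    elif key == "t":
        state.gemini_enabled = not state.gemini_enabled
    elif key == "n":
        state.registering = True
    elif key == " " and state.registering:
        # Space only captures a sample while registration is active.
        state.samples_captured += 1
    return state
```

In an OpenCV loop, `key_code` would typically come from `cv2.waitKey(1)` once per frame, and the loop would exit when `state.running` becomes `False`.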