VISI

VISI is a real-time AI vision and gesture assistant for visually impaired users. It combines computer vision with Large Language Models (LLMs): the system detects hand gestures, recognizes registered faces, describes the surroundings, and delivers the result as audio feedback.

Features

Real-Time Gesture Pipeline: Detects complex hand gestures using MediaPipe to trigger system events.
Face & Gaze Tracking: Identifies registered faces and tracks user gaze to understand attention.
Multimodal AI Integration:
- Uses Google Gemini to describe scenes or interact with the user (e.g., pointing triggers a compliment).
- Uses ElevenLabs for natural-sounding Text-to-Speech (TTS) notifications.
Sci-Fi HUD & Streaming: Draws an interactive HUD using OpenCV and streams the frames to a web client.
Modern Web Interface ("Blinded"): A fast, responsive frontend built with React, Vite, and TypeScript.
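
As a rough illustration of how a gesture such as pointing could be derived from hand landmarks, here is a minimal sketch. It is not the project's actual detection code. It assumes the 21-point MediaPipe Hands landmark layout (index fingertip at 8, its PIP joint at 6, and so on) and an upright hand in image coordinates, where y grows downward. The names `is_extended` and `is_pointing` and the rule itself are hypothetical.

```python
from typing import Sequence, Tuple

Landmark = Tuple[float, float]  # normalized (x, y); y grows downward in image space

# MediaPipe Hands landmark indices as (fingertip, PIP joint) pairs.
INDEX = (8, 6)
OTHER_FINGERS = ((12, 10), (16, 14), (20, 18))  # middle, ring, pinky


def is_extended(landmarks: Sequence[Landmark], tip: int, pip: int) -> bool:
    """Treat a finger as extended when its tip sits above its PIP joint."""
    return landmarks[tip][1] < landmarks[pip][1]


def is_pointing(landmarks: Sequence[Landmark]) -> bool:
    """Hypothetical rule: index finger extended, the other three not extended."""
    if len(landmarks) != 21:
        return False
    index_up = is_extended(landmarks, *INDEX)
    others_curled = not any(is_extended(landmarks, t, p) for t, p in OTHER_FINGERS)
    return index_up and others_curled


# Synthetic hand: every point at y=0.5, then raise only the index finger.
hand = [(0.5, 0.5)] * 21
hand[6] = (0.5, 0.45)
hand[8] = (0.5, 0.30)
print(is_pointing(hand))
```

A rule this simple breaks for rotated or sideways hands, so a real pipeline would likely compare joint angles or distances to the wrist instead of raw y values.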

🛠 Tech Stack

Backend

Python
OpenCV & MediaPipe (Computer Vision)
Google Gemini API (Scene Understanding)
ElevenLabs API (Text-to-Speech)

Frontend ("Blinded")

React 19
TypeScript
Vite

🚀 Getting Started

Prerequisites

Python 3.10+
Node.js & npm (for the frontend)
API keys for Gemini and ElevenLabs: copy .example.env to .env in the Backend/ directory and fill in the keys
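
A quick way to catch a missing key before starting the pipeline is a small check like the one below. This is only a sketch: the variable names `GEMINI_API_KEY` and `ELEVENLABS_API_KEY` are assumptions, so use whatever names .example.env actually lists. It also assumes the .env values have already been loaded into the process environment (for example by a dotenv loader).

```python
import os
from typing import List, Mapping, Sequence

# Hypothetical variable names; check .example.env for the real ones.
REQUIRED_KEYS = ("GEMINI_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(env: Mapping[str, str],
                 required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(os.environ)
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All API keys are set.")
```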

Running the Backend

Navigate to the backend directory:
```
cd Backend
```
Create and activate a virtual environment (optional but recommended):
```
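# Create the environment (use python3 if python is not on your PATH)
python -m venv venv

# Activate on Windows
.\venv\Scripts\activate

# Activate on macOS/Linux
source venv/bin/activate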
```
Install dependencies:
```
pip install -r requirements.txt
```
Start the vision system:
```
python main.py
```
Note: Ensure your webcam is connected. Press q in the preview window to quit, e to toggle ElevenLabs TTS, and t to toggle Gemini.

Running the Frontend

From the repository root, navigate to the frontend directory:
```
cd Blinded
```
Install dependencies:
```
npm install
```
Start the development server:
```
npm run dev
```

⌨️ Controls (Backend HUD)

While the OpenCV window is focused, the following key bindings are available:

Space: Capture a face sample during registration
n: Start the face registration process
e: Toggle ElevenLabs Text-to-Speech
t: Toggle Gemini Vision analysis
q: Quit the pipeline
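
The key bindings above can be pictured as a small dispatch function over the integer key codes that a call such as `cv2.waitKey` returns. The sketch below is illustrative only: `HudState`, its fields, their default values, and `handle_key` are hypothetical names, not the backend's real structure.

```python
from dataclasses import dataclass


@dataclass
class HudState:
    """Hypothetical runtime flags; the real pipeline may track these differently."""
    tts_enabled: bool = True
    gemini_enabled: bool = True
    registering: bool = False
    samples: int = 0
    running: bool = True


def handle_key(state: HudState, key: int) -> HudState:
    """Apply one key press, given as an integer key code, to the state."""
    if key < 0:  # cv2.waitKey returns -1 when no key was pressed
        return state
    ch = chr(key & 0xFF)
    if ch == "q":
        state.running = False
    elif ch == "e":
        state.tts_enabled = not state.tts_enabled
    elif ch == "t":
        state.gemini_enabled = not state.gemini_enabled
    elif ch == "n":
        state.registering = True
    elif ch == " " and state.registering:
        state.samples += 1  # Space only counts while registration is active
    return state


# Simulated session: toggle TTS, start registration, capture two samples, quit.
state = HudState()
for key in (ord("e"), ord("n"), ord(" "), ord(" "), -1, ord("q")):
    handle_key(state, key)
print(state)
```

Keeping the key handling in a plain function like this makes it testable without a webcam or an open OpenCV window.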
