Ryanqu27/SASEHacks-Visi

VISI

A real-time AI vision and gesture assistant for visually impaired users. The system combines computer vision with Large Language Models (LLMs) to provide interactive, context-aware feedback: it detects hand gestures, recognizes faces, and uses AI to describe the surroundings through audio.

Features

  • Real-Time Gesture Pipeline: Detects complex hand gestures using MediaPipe to trigger system events.
  • Face & Gaze Tracking: Identifies registered faces and tracks the user's gaze to estimate where their attention is directed.
  • Multimodal AI Integration:
    • Uses Google Gemini to describe scenes or interact with the user (e.g., pointing triggers a compliment).
    • Uses ElevenLabs for natural-sounding Text-to-Speech (TTS) notifications.
  • Sci-Fi HUD & Streaming: Draws an interactive HUD using OpenCV and streams the frames to a web client.
  • Modern Web Interface ("Blinded"): A fast, responsive frontend built with React, Vite, and TypeScript.
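The README does not document how gestures are classified. As a minimal sketch, assuming the 21-landmark hand model that MediaPipe Hands returns (index 0 is the wrist, 6 and 8 are the index finger's PIP joint and tip, and so on), a pointing gesture could be detected with a simple tip-versus-joint distance heuristic. The function names and the heuristic itself are hypothetical, not Visi's actual code.

```python
from typing import Sequence, Tuple

Landmark = Tuple[float, float]  # normalized (x, y); y grows downward in image coordinates

# MediaPipe Hands landmark indices (21-point hand model)
WRIST = 0
FINGERS = {
    "index": (8, 6),    # (tip, PIP joint)
    "middle": (12, 10),
    "ring": (16, 14),
    "pinky": (20, 18),
}


def _dist2(a: Landmark, b: Landmark) -> float:
    """Squared Euclidean distance between two landmarks."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def is_extended(landmarks: Sequence[Landmark], finger: str) -> bool:
    """Hypothetical rule: a finger is extended if its tip is farther from the wrist than its PIP joint."""
    tip, pip = FINGERS[finger]
    wrist = landmarks[WRIST]
    return _dist2(landmarks[tip], wrist) > _dist2(landmarks[pip], wrist)


def is_pointing(landmarks: Sequence[Landmark]) -> bool:
    """Hypothetical heuristic: index finger extended, middle/ring/pinky curled."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    return is_extended(landmarks, "index") and not any(
        is_extended(landmarks, f) for f in ("middle", "ring", "pinky")
    )
```

With the legacy mp.solutions.hands API, the input list could be built as [(lm.x, lm.y) for lm in hand_landmarks.landmark]. A True result is the kind of event that might then trigger the Gemini compliment mentioned in the feature list.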

🛠 Tech Stack

Backend

  • Python
  • OpenCV & MediaPipe (Computer Vision)
  • Google Gemini API (Scene Understanding)
  • ElevenLabs API (Text-to-Speech)

Frontend ("Blinded")

  • React 19
  • TypeScript
  • Vite

🚀 Getting Started

Prerequisites

  • Python 3.10+
  • Node.js & npm (for the frontend)
  • API keys for Gemini and ElevenLabs, configured in a .env file in the Backend/ directory (use .example.env as a template)
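A minimal sketch of the key check the backend might perform at startup, assuming .env uses plain KEY=VALUE lines. The variable names GEMINI_API_KEY and ELEVENLABS_API_KEY are hypothetical; use whatever names Backend/.example.env actually defines. In practice a library such as python-dotenv (load_dotenv) usually handles the parsing.

```python
import os
from pathlib import Path
from typing import Dict, List

# Hypothetical variable names; check Backend/.example.env for the real ones.
REQUIRED_KEYS = ("GEMINI_API_KEY", "ELEVENLABS_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. KEY="abc" -> abc
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: Path) -> Dict[str, str]:
    """Read and parse a .env file."""
    return parse_env(path.read_text(encoding="utf-8"))


def missing_keys(env: Dict[str, str]) -> List[str]:
    """Return required keys that are empty or absent in both the file and os.environ."""
    return [k for k in REQUIRED_KEYS if not (env.get(k) or os.environ.get(k))]
```

Failing fast with the list returned by missing_keys gives a clearer error than a rejected API request halfway through a session.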

Running the Backend

  1. Navigate to the backend directory:
    cd Backend
  2. Create and activate a virtual environment (optional but recommended):
    python -m venv venv
    # Windows
    .\venv\Scripts\activate
    # macOS/Linux
    source venv/bin/activate
  3. Install dependencies:
    pip install -r requirements.txt
  4. Start the vision system:
    python main.py
    Note: Ensure your webcam is connected. Press q in the preview window to quit, e to toggle ElevenLabs TTS, and t to toggle Gemini.
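The keybinds in the note above map naturally onto a small state object updated once per frame. This is a sketch assuming the usual OpenCV pattern of polling cv2.waitKey(1) & 0xFF in the main loop; PipelineState and handle_key are hypothetical names, and the default on/off values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    tts_enabled: bool = True      # assumed default for ElevenLabs TTS
    gemini_enabled: bool = True   # assumed default for Gemini analysis
    running: bool = True


def handle_key(state: PipelineState, key: int) -> PipelineState:
    """Apply one keypress (the masked value from cv2.waitKey(1) & 0xFF) to the state."""
    if key == ord("q"):
        state.running = False
    elif key == ord("e"):
        state.tts_enabled = not state.tts_enabled
    elif key == ord("t"):
        state.gemini_enabled = not state.gemini_enabled
    return state
```

Inside the frame loop this would be called as handle_key(state, cv2.waitKey(1) & 0xFF), breaking out once state.running is False. When no key is pressed, cv2.waitKey returns -1, which masks to 255 and matches no binding.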

Running the Frontend

  1. Navigate to the frontend directory:
    cd Blinded
  2. Install dependencies:
    npm install
  3. Start the development server:
    npm run dev
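The README does not say how the backend's HUD frames reach the Blinded client. One common approach, shown here purely as an assumption, is an MJPEG stream served as multipart/x-mixed-replace, which a browser can render directly in an img element. The boundary name and the mjpeg_part helper are hypothetical.

```python
BOUNDARY = b"frame"  # hypothetical multipart boundary name
CONTENT_TYPE = "multipart/x-mixed-replace; boundary=" + BOUNDARY.decode("ascii")


def mjpeg_part(jpeg_bytes: bytes, boundary: bytes = BOUNDARY) -> bytes:
    """Wrap one JPEG-encoded frame as a single part of an MJPEG multipart stream."""
    header = (
        b"--" + boundary + b"\r\n"
        + b"Content-Type: image/jpeg\r\n"
        + b"Content-Length: " + str(len(jpeg_bytes)).encode("ascii") + b"\r\n"
        + b"\r\n"  # blank line separates part headers from the body
    )
    return header + jpeg_bytes + b"\r\n"
```

Each OpenCV frame would first be JPEG-encoded with ok, buf = cv2.imencode(".jpg", frame), then written as mjpeg_part(buf.tobytes()) to a response whose Content-Type header is CONTENT_TYPE.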

⌨️ Controls (Backend HUD)

While the OpenCV window is active, you can use the following keybinds:

  • Space: Capture a face sample during registration
  • n: Start the face registration process
  • e: Toggle ElevenLabs Text-to-Speech
  • t: Toggle Gemini Vision analysis
  • q: Quit the pipeline
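The README does not describe the recognition model behind registration. A minimal sketch, assuming some face-embedding model already turns each captured face crop into a fixed-length vector: n starts a session, Space adds a sample, the mean of the normalized samples becomes the stored template, and later faces match by cosine similarity above a threshold. FaceRegistry, its methods, and the threshold value are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


class FaceRegistry:
    """Hypothetical registry holding one averaged, unit-length embedding per person."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold  # assumed cosine-similarity cutoff
        self.people: Dict[str, List[float]] = {}
        self._samples: List[List[float]] = []

    def start_registration(self) -> None:  # 'n' key
        self._samples = []

    def capture_sample(self, embedding: Vector) -> None:  # Space key
        self._samples.append(_normalize(embedding))

    def finish_registration(self, name: str) -> None:
        if not self._samples:
            raise ValueError("no samples captured")
        # Element-wise mean of the captured samples, re-normalized
        mean = [sum(col) / len(self._samples) for col in zip(*self._samples)]
        self.people[name] = _normalize(mean)
        self._samples = []

    def identify(self, embedding: Vector) -> Optional[str]:
        """Return the best-matching name at or above the threshold, else None."""
        query = _normalize(embedding)
        best_name, best_score = None, self.threshold
        for name, ref in self.people.items():
            score = sum(a * b for a, b in zip(query, ref))
            if score >= best_score:
                best_name, best_score = name, score
        return best_name
```

Averaging several samples captured from slightly different angles makes the template less sensitive to any single poor frame.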

About

Visi is an application that helps visually impaired individuals navigate awkward social interactions, using an AI-powered camera to detect hand gestures and describe their surroundings.
