Releases: MadisonAbilityLab/VRSight

v0.1 - Initial Release

07 Oct 20:02
eea1851

VRSight: An AI-Driven Scene Description System to Improve Virtual Reality Accessibility for Blind People

VRSight is the first post hoc "3D screen reading" system for virtual reality, enabling blind and low vision users to navigate VR environments through spatial audio feedback without requiring per-app developer integration.

What It Does

  • Real-time Scene Understanding: Detects VR objects using a custom YOLO model trained on the custom DISCOVR dataset (30 object classes, 67.3% mAP50)
  • Spatial Audio Feedback: 3D positional audio descriptions using depth estimation and Azure SpeechSynthesizer
  • Immersive Tone-Based Descriptions: Customizes description tone to enhance immersion using a large language model and Azure SpeechSynthesizer
  • Four Interaction Modes:
    1. ContextCompass: AI-powered scene descriptions using a large language model (press 1)
    2. SceneSweep: Left-to-right spatial audio object enumeration (press 2)
    3. AimAssist: Targeted descriptions near hand/controller position (press 3)
    4. SafeGuard: Automatic spatialized auditory warnings when the visual VR guardian is displayed (automatic)
  • Real-time Performance: 30+ FPS processing with <2 ms latency over WebSocket
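
To make the SceneSweep idea concrete, here is a minimal sketch of left-to-right enumeration with a simple stereo pan. It is not taken from the VRSight codebase: the `Detection` class, its fields, and the linear pan mapping are all hypothetical, and the actual system may order and spatialize objects differently.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detection record; field names are assumptions."""
    label: str
    x_center: float  # normalized horizontal box center, 0.0 (left) to 1.0 (right)
    depth_m: float   # assumed depth estimate in meters


def sweep_order(detections):
    """Return detections sorted left to right by horizontal center."""
    return sorted(detections, key=lambda d: d.x_center)


def stereo_pan(x_center):
    """Map a normalized x position to a pan value in [-1.0, 1.0]."""
    clamped = min(max(x_center, 0.0), 1.0)
    return 2.0 * clamped - 1.0


def sweep_announcements(detections):
    """Build (label, pan, depth) tuples in the order they would be spoken."""
    return [(d.label, stereo_pan(d.x_center), d.depth_m)
            for d in sweep_order(detections)]
```

Each tuple could then be handed to a speech or audio layer, with the pan value steering the sound left or right and the depth value scaling its loudness or distance.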

This Release

  • Presents VRSight's codebase as open-source
  • Major code refactoring for maintainability and modularity
  • Minor performance optimizations

Technical Details

Powered by a variety of AI/computer vision models including YOLOv8 object detection, DepthAnythingV2 depth estimation, OpenCV edge detection, multimodal large language models, optical character recognition, and tone-dynamic text-to-speech.
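
For readers unfamiliar with the mAP50 metric quoted for the detection model: mAP50 is mean average precision where a predicted box counts as correct if its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows only that matching criterion; the function names and the `(x1, y1, x2, y2)` box format are assumptions for illustration, not VRSight's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union for two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, truth_box):
    """True when a prediction meets the IoU >= 0.5 criterion used by mAP50."""
    return iou(pred_box, truth_box) >= 0.5
```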

Presented at UIST 2025 (demo + full paper) and CHI 2025 (demo).

Opportunities for Future Work

Seeking contributors! Check the Issues board for things to work on. We intend Release v1.0 to run entirely on on-device models and ship as a binary/executable, possibly with improved interactions.

Additional Links

Dataset: DISCOVR (17K+ annotated VR images) available on HuggingFace
Paper: ACM Digital Library