VRSight: An AI-Driven Scene Description System to Improve Virtual Reality Accessibility for Blind People

VRSight is the first post hoc "3D screen reading" system for virtual reality, enabling blind and low vision users to navigate VR environments through spatial audio feedback without requiring per-app developer integration.

What It Does

Real-time Scene Understanding: Detects VR objects using a custom YOLO model trained on the DISCOVR dataset (30 object classes, 67.3% mAP50)
Spatial Audio Feedback: Delivers 3D positional audio descriptions using depth estimation and Azure SpeechSynthesizer
Immersive Tone-Based Descriptions: Adapts description tone to enhance immersion using a large language model and Azure SpeechSynthesizer
Four Interaction Modes:
1. ContextCompass: AI-powered scene descriptions using a large language model (press 1)
2. SceneSweep: Left-to-right spatial audio object enumeration (press 2)
3. AimAssist: Targeted descriptions near hand/controller position (press 3)
4. SafeGuard: Automatic spatialized auditory warnings when the visual VR guardian is displayed (automatic)
Real-time Performance: 30+ FPS processing with <2ms latency over WebSocket
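
The left-to-right enumeration behind SceneSweep can be illustrated with a small sketch. This is not VRSight's actual implementation: the `Detection` class, the `sweep_order` function, the `min_confidence` threshold, and the linear mapping from horizontal position to a stereo pan value are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detection record: label plus horizontal box extent in pixels."""
    label: str
    x_min: float
    x_max: float
    confidence: float


def sweep_order(detections, frame_width, min_confidence=0.5):
    """Return (label, pan) pairs ordered left to right.

    pan is a stereo position in [-1.0, 1.0], where -1.0 is far left
    and 1.0 is far right. Both the threshold and the linear mapping
    are illustrative assumptions.
    """
    if frame_width <= 0:
        raise ValueError("frame_width must be positive")

    # Drop low-confidence detections, then sort by horizontal box center.
    kept = [d for d in detections if d.confidence >= min_confidence]
    kept.sort(key=lambda d: (d.x_min + d.x_max) / 2)

    ordered = []
    for d in kept:
        center = (d.x_min + d.x_max) / 2
        pan = 2 * center / frame_width - 1      # map [0, width] to [-1, 1]
        pan = max(-1.0, min(1.0, pan))          # clamp boxes that overflow the frame
        ordered.append((d.label, pan))
    return ordered
```

Each returned pair could then be handed to a speech or audio layer that announces the label at the given pan position, one object at a time.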

This Release

Releases VRSight's codebase as open source
Major code refactoring for maintainability and modularity
Minor performance optimizations

Technical Details

Powered by a variety of AI/computer vision models including YOLOv8 object detection, DepthAnythingV2 depth estimation, OpenCV edge detection, multimodal large language models, optical character recognition, and tone-dynamic text-to-speech.
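
One way object detection and depth estimation could be combined is to summarize the depth map inside each detected bounding box. The sketch below is a minimal illustration under stated assumptions, not the project's code: `box_depth` is a hypothetical helper, the depth map is a plain list of rows, and the median is used only because it is less sensitive to background pixels inside the box than the mean. Depth values are treated as opaque numbers, since a depth model may produce relative rather than metric depth.

```python
from statistics import median


def box_depth(depth_map, box):
    """Return the median depth inside a bounding box, or None if the box
    does not overlap the depth map.

    depth_map: list of rows, each a list of depth values (hypothetical layout).
    box: (x_min, y_min, x_max, y_max) in pixel coordinates, max exclusive.
    """
    x_min, y_min, x_max, y_max = box
    height = len(depth_map)
    width = len(depth_map[0]) if height else 0

    # Clip the box to the bounds of the depth map.
    x0, x1 = max(0, x_min), min(width, x_max)
    y0, y1 = max(0, y_min), min(height, y_max)

    values = [depth_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    if not values:
        return None
    return median(values)
```

The resulting per-object depth, together with the box's horizontal position, gives enough information to place a sound source in front of the listener.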

Presented at UIST 2025 (demo + full paper) and CHI 2025 (demo).

Opportunities for Future Work

Seeking contributors! Check the Issues board for things to work on. We intend Release v1.0 to run entirely on on-device models and to ship as a binary/executable, possibly with improved interactions.

Additional Links

Dataset: DISCOVR (17K+ annotated VR images) available on HuggingFace
Paper: ACM Digital Library

Releases: MadisonAbilityLab/VRSight

v0.1 - Initial Release
