Inspiration

Our friends Joe and Mike, who are visually impaired, inspired SightSense. Witnessing their daily struggles with navigating environments and accessing visual information motivated us to create a comprehensive AI-powered assistant to enhance their independence and quality of life.

What it does

SightSense is an AI vision assistant for the visually impaired. It offers object detection and location guidance, scene description, text reading, and real-time assistance. Users interact with the app by voice to get information about their surroundings and to locate specific objects.

How we built it

We developed SightSense using FastAPI for the backend, integrating various AI models for computer vision (YOLO, MediaPipe), natural language processing (SentenceTransformer, GPT-4 Vision), and OCR (EasyOCR). We combined these technologies to create a seamless experience triggered by voice commands.
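As a rough illustration of the scene-description flow, the sketch below turns raw detector output (label/confidence pairs, such as one YOLO pass over a camera frame would yield) into a short sentence suitable for text-to-speech. The function name, input shape, and phrasing are illustrative, not the actual SightSense implementation.

```python
from collections import Counter

def describe_detections(detections, min_confidence=0.5):
    """Turn raw detector output into a short spoken sentence.

    `detections` is a list of (label, confidence) pairs, e.g. the
    labels from one object-detection pass over a camera frame.
    """
    labels = [label for label, conf in detections if conf >= min_confidence]
    if not labels:
        return "I don't see anything I recognize."

    counts = Counter(labels)  # preserves first-seen order (Python 3.7+)
    # Naive pluralization ("2 persons") keeps the sketch short.
    parts = [
        f"a {label}" if n == 1 else f"{n} {label}s"
        for label, n in counts.items()
    ]
    if len(parts) == 1:
        listing = parts[0]
    else:
        listing = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"I can see {listing}."
```

For example, `describe_detections([("person", 0.9), ("person", 0.8), ("cup", 0.7)])` yields "I can see 2 persons and a cup." In the real app a string like this would be handed to the speech-synthesis layer.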

Challenges

  • Learning Swift for iOS app development
  • Integrating multiple AI models efficiently
  • Optimizing for real-time performance on mobile devices
  • Designing an intuitive voice-based user interface

Accomplishments

  • Successfully integrated complex AI models into a mobile app
  • Created a user-friendly interface for visually impaired users
  • Developed a system for guiding hand movements to objects
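The hand-guidance idea above can be sketched as a small geometric check: compare a fingertip position (as MediaPipe hand landmarks provide) with a detected object's centre (as a YOLO bounding box provides) and speak a directional cue. This is a minimal sketch under the assumption of normalized [0, 1] image coordinates with the origin at the top-left; the function and cue wording are illustrative, not the shipped code.

```python
def guidance_direction(hand_xy, target_xy, tolerance=0.05):
    """Return a short spoken cue guiding the hand toward the target.

    Both points are (x, y) in normalized image coordinates: x grows
    rightward, y grows downward. Within `tolerance` on both axes,
    the hand is considered to be on the object.
    """
    dx = target_xy[0] - hand_xy[0]
    dy = target_xy[1] - hand_xy[1]
    horizontal = "right" if dx > tolerance else "left" if dx < -tolerance else ""
    vertical = "down" if dy > tolerance else "up" if dy < -tolerance else ""
    if not horizontal and not vertical:
        return "Your hand is on the object."
    cues = [c for c in (vertical, horizontal) if c]
    return "Move your hand " + " and ".join(cues) + "."
```

For instance, `guidance_direction((0.2, 0.8), (0.6, 0.3))` yields "Move your hand up and right." Re-running this check on every frame gives the user continuously updated guidance as the hand moves.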

What we learned

  • Mobile app development with Swift
  • AI model integration and optimization
  • Accessibility design principles
  • Collaborative problem-solving in a hackathon environment

What's next

  • Improve accuracy and speed of object detection
  • Expand language support for global accessibility
  • Develop Android version of the app
  • Incorporate user feedback for feature enhancements
  • Explore partnerships with organizations for the visually impaired

For the detailed project report, visit the link.
