Inspiration
In this project, we aim to bridge the accessibility gap for individuals with visual impairments by applying AI techniques from computer vision (CV) and natural language processing (NLP). Our goal is to create a lightweight AI agent that enhances independence, improves daily interactions, and fosters inclusivity.
What it does
We provide four modes for visually impaired users:
- detection mode uses a state-of-the-art YOLO model to detect objects in real time. It helps users understand their surroundings, guides them safely through the environment, and warns them when an obstacle is within 1 m.
- purchase mode is powered by the OpenAI API. It lets users identify the product in front of them along with an estimated price, helping them make better shopping decisions.
- music mode integrates the Spotify API, letting users play a song by speaking its name, with support for pause and resume.
- jarvis mode lets users interact with our intelligent assistant, jarvis, backed by the OpenAI API. He chats with users and answers their questions enthusiastically!
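As a sketch, routing a spoken command to one of these four modes might look like the following. The keyword mapping here is illustrative, not our exact implementation:

```python
# Illustrative mapping from keywords in a spoken command to a mode.
# The keyword lists are assumptions for this sketch, not the shipped ones.
MODES = {
    "detection": ("detect", "detection", "surroundings"),
    "purchase": ("purchase", "buy", "price"),
    "music": ("music", "play", "song"),
    "jarvis": ("jarvis", "chat", "question"),
}

def route_command(command):
    """Return the first mode whose keywords appear in the spoken command,
    or None when the command is not recognized."""
    text = command.lower()
    for mode, keywords in MODES.items():
        if any(kw in text for kw in keywords):
            return mode
    return None  # unrecognized; prompt the user to repeat
```

In the real program, `command` would come from the speech-to-text layer rather than typed input.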
How we built it
- Object detection: a YOLO model detects a range of objects, including people, chairs, dining tables, and more!
- Voice interaction: with the TTS engine pyttsx3 and the SpeechRecognition library, users can talk to our program and enjoy a hands-free experience.
- OpenAI integration: we use the latest fine-tuned model from OpenAI to give users accurate and fluid interaction with the assistant.
- Google Lens: we integrate the Google Cloud API, which helps users get an accurate price estimate for the products they are interested in.
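The 1 m obstacle warning in detection mode needs a per-object distance estimate from each YOLO bounding box. A minimal sketch using the pinhole-camera approximation is below; the focal length and per-class object heights are assumed calibration values for illustration, not measured ones:

```python
# Pinhole-camera approximation:
#   distance = real_height * focal_length_px / bbox_height_px
# FOCAL_LENGTH_PX and REAL_HEIGHT_M are assumed calibration values.
FOCAL_LENGTH_PX = 700.0      # assumed webcam focal length, in pixels
REAL_HEIGHT_M = {            # assumed typical real-world heights, in metres
    "person": 1.7,
    "chair": 0.9,
    "dining table": 0.75,
}

def estimate_distance_m(class_name, bbox_height_px):
    """Approximate distance to a detected object in metres, or None if
    the class has no reference height or the box is degenerate."""
    real_height = REAL_HEIGHT_M.get(class_name)
    if real_height is None or bbox_height_px <= 0:
        return None
    return real_height * FOCAL_LENGTH_PX / bbox_height_px

def within_warning_range(class_name, bbox_height_px, threshold_m=1.0):
    """True when the object is estimated to be closer than the threshold."""
    d = estimate_distance_m(class_name, bbox_height_px)
    return d is not None and d < threshold_m
```

In the real loop, `class_name` and `bbox_height_px` would come from each YOLO detection, and a positive `within_warning_range` result would trigger a spoken warning via the TTS engine.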
Challenges we ran into
While we developed many features, we had a challenging time integrating them all into one coherent program.
Accomplishments that we're proud of
We've learned a lot of new technologies, including pyttsx3, YOLO, and spotipy. Furthermore, we are excited to address challenges faced by visually impaired people, creating a better and safer living environment for them.
What we learned
We learned to use several APIs, including OpenAI, Google Cloud, and Spotify, as well as TTS and STT libraries. We also learned how to use a YOLO model together with OpenCV.
What's next for Pharos
We may integrate a local LLM to protect users' privacy.
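As a rough sketch of what that might look like, assuming an Ollama-style local HTTP endpoint (the model name and URL here are our assumptions, not a decided setup):

```python
import json
import urllib.request

def build_local_chat_request(prompt, model="llama3"):
    """Build the JSON payload for an Ollama-style /api/generate call.
    The model name is an assumption for this sketch."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(payload).encode("utf-8")

def ask_local_jarvis(prompt, url="http://localhost:11434/api/generate"):
    """Send the prompt to a locally running model, so nothing leaves
    the user's device. The endpoint is an assumed local server."""
    req = urllib.request.Request(
        url,
        data=build_local_chat_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The rest of jarvis mode would stay the same; only the backend call changes.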