Inspiration:-
Our inspiration stemmed from the desire to make a meaningful impact on the lives of visually impaired individuals. Witnessing the challenges they face in daily interactions with the environment motivated us to create a solution that promotes independence and accessibility.
What it does:-
Our project, Vision Provider, uses smart glasses to capture and interpret the surrounding environment for visually impaired individuals. The glasses provide real-time spoken feedback by combining image capture, speech recognition, and cloud connectivity. This empowers users to engage with the world independently and supports a more inclusive experience.
How we built it:-
The core of our solution is a Raspberry Pi acting as the central device. The Pi Camera captures images, while the speaker and microphone handle audio interaction. The LLaVA model processes the inputs, Firebase Storage stores the captured images, and Google Text-to-Speech converts the model's output into spoken responses. Python ties these components together.
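The flow described above can be sketched roughly as follows. This is a minimal illustration, not our actual code: the `Pipeline` class, its stage names, and the order of the stages are assumptions made for the example, and each stage is a plain function so that the real camera, Firebase, LLaVA, and text-to-speech calls can be plugged in later. The demo uses stand-in functions so it runs without any hardware or network access.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pipeline:
    """Hypothetical wiring of the stages; every name here is illustrative."""
    listen: Callable[[], str]            # microphone + speech recognition -> question text
    capture: Callable[[], bytes]         # Pi Camera -> image bytes
    upload: Callable[[bytes], str]       # image storage (e.g. Firebase Storage) -> image URL
    describe: Callable[[str, str], str]  # vision-language model: (image URL, question) -> answer
    speak: Callable[[str], None]         # text-to-speech -> speaker

    def run_once(self) -> str:
        """Run one question/answer cycle and return the spoken answer."""
        question = self.listen()
        image = self.capture()
        url = self.upload(image)
        answer = self.describe(url, question)
        self.speak(answer)
        return answer


# Stand-in stages so the sketch runs anywhere; none of these touch real services.
spoken: List[str] = []
demo = Pipeline(
    listen=lambda: "What is in front of me?",
    capture=lambda: b"fake-jpeg-bytes",
    upload=lambda image: f"https://example.invalid/{len(image)}.jpg",
    describe=lambda url, question: f"Answer to '{question}' for {url}",
    speak=spoken.append,
)
result = demo.run_once()
```

Keeping each stage behind a simple function makes it possible to swap a hosted model or a different storage backend without touching the rest of the loop.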
Challenges we ran into:-
One major challenge was hosting the model within our budget constraints. We addressed this by exploring cost-free alternatives, such as the Replicate API. Another challenge was fitting the components into the glasses; we ended up using a Raspberry Pi 3 from our school lab because it was readily available.
Accomplishments that we're proud of:-
We are proud to have developed a functional prototype that helps reduce the challenge of limited environmental interaction for blind users. Our smart glasses provide a tangible solution to enhance independence and accessibility.
What we learned:-
Throughout this journey, we learned to navigate challenges related to budget constraints and hardware limitations.
What's next for Vision Provider:-
The future of Vision Provider involves scaling the project by incorporating multilingual capabilities, optimizing the hardware, and enhancing the LLaVA model for improved accuracy. We aim to connect the glasses to mobile devices, enabling additional features like calling and chatting. Real-time text reading and a more refined user interface are also on our roadmap for a more comprehensive and user-friendly solution.
Built With
- google-text-to-speech
- llava
- picam
- raspberry-pi
- recognition
- replicate-api
- speech