Walker

The YOLO model analyzing one of our sample photos for objects.
Example output from scanning a face-cam frame, which is sent to the text-to-speech method.

Inspiration

We first thought about this project after seeing an object detection lesson at TJ’s Machine Learning Club. We also noticed a lack of accessibility tools for people with disabilities. We decided to expand on the object detection concept by incorporating real-time analysis and text-to-speech. Our vision is to have people experience vision. We hope Walker can help everyone walk.

What it does

Walker is a program that takes in an image of the user's surroundings and relays information about it in audio form. This can help people who are blind or have low vision understand their surroundings.

How we built it

We first used OpenCV and YOLO to detect objects within the camera frame. We then used image analysis to estimate the depth and location of each object. Based on this information, we built a sentence and passed it to Google Text-to-Speech (gTTS) to create an audio file that is then played.
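The sentence-building step can be sketched roughly as follows. This is a minimal illustration, not Walker's actual code: the `Detection` class, the function names, the left/ahead/right split into thirds, and the bounding-box area thresholds used as a stand-in for depth are all assumptions made for this example. It assumes the detector has already produced a label and a pixel bounding box for each object. The `speak` helper uses the gTTS package and needs a network connection.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: label plus bounding box in pixels (hypothetical type)."""
    label: str
    x: int  # left edge of the box
    y: int  # top edge of the box
    w: int  # box width
    h: int  # box height


def horizontal_position(det, frame_width):
    """Place the box centre in the left, middle, or right third of the frame."""
    center = det.x + det.w / 2
    if center < frame_width / 3:
        return "on your left"
    if center > 2 * frame_width / 3:
        return "on your right"
    return "ahead of you"


def rough_distance(det, frame_width, frame_height):
    """Guess distance from how much of the frame the box covers.

    The thresholds are illustrative assumptions, not calibrated values.
    """
    fraction = (det.w * det.h) / (frame_width * frame_height)
    if fraction > 0.25:
        return "close by"
    if fraction > 0.05:
        return "a short distance away"
    return "far away"


def describe(detections, frame_width, frame_height):
    """Turn a list of detections into one sentence for text-to-speech."""
    if not detections:
        return "No objects detected."
    parts = [
        f"a {d.label} {rough_distance(d, frame_width, frame_height)} "
        f"{horizontal_position(d, frame_width)}"
        for d in detections
    ]
    return "I see " + ", ".join(parts) + "."


def speak(sentence, path="walker.mp3"):
    """Save the sentence as an MP3 with gTTS (imported lazily; needs internet)."""
    from gtts import gTTS
    gTTS(text=sentence, lang="en").save(path)
    return path
```

For a 640x480 frame with a large person box on the left and a small chair box on the right, `describe` returns "I see a person close by on your left, a chair far away on your right.", which can then be handed to `speak`. Box area is only a crude proxy for depth, since a small nearby object and a large distant one can cover the same share of the frame.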

Challenges we ran into

It was hard to collaborate this year, especially in an online environment. We tried out software such as Google Colab, but ran into technical issues. We solved this by having one person maintain the final code in a virtual environment while the rest of us wrote methods and chunks of code to add to it. This gave us an efficient workflow for building the app.

What we learned and what we are proud about

Although we knew a bit about machine learning, we didn't know much about computer vision. We learned about tools such as OpenCV and YOLO and used them to create the application. We are proud of our program because we found a cool idea and, although some of the packages were new to us, we learned them and applied them to the software.

What's next for Walker

We hope to make Walker a fully functional mobile app so that it is more accessible to the general public. We also want to use a cloud computing platform such as AWS or Azure so the program can run more efficiently instead of running locally. Another idea is to use a 360 camera so Walker can see people or objects behind or above the user, and to use spatial audio so that the audio for a specific object comes from that object's direction.
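As a rough sketch of the spatial audio idea, the horizontal position of an object in a 360 frame could be converted to a direction and then to left and right speaker volumes. Nothing here is implemented in Walker. The function names are hypothetical, the frame is assumed to be an equirectangular panorama whose centre column faces straight ahead, and plain stereo panning is used in place of true spatial audio.

```python
import math


def azimuth_from_x(x, frame_width):
    """Map a pixel column to degrees: 0 is ahead, -90 left, 90 right, -180 behind."""
    return (x / frame_width) * 360.0 - 180.0


def stereo_gains(azimuth_deg):
    """Constant-power pan: return (left_gain, right_gain) for a direction.

    Stereo cannot separate front from back, so an object directly
    behind the user sounds centred, just like one directly ahead.
    """
    pan = math.sin(math.radians(azimuth_deg))  # -1 is full left, 1 is full right
    theta = (pan + 1.0) * math.pi / 4.0        # 0 to pi/2
    return math.cos(theta), math.sin(theta)
```

An object at the centre column of a 1920-pixel-wide panorama maps to 0 degrees and gets equal gain in both ears (about 0.707 each), while one at 90 degrees plays almost entirely in the right ear. Telling front from back, or above from below, would need a proper spatial audio technique rather than simple panning.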

Built With

gtts
numpy
opencv
python
yolo

Submitted to

HackTJ 8.0
- Winner Best Artificial Intelligence/Machine Learning Hack

Created by

I enjoy working on ML/AI, Computer Security, computer vision, and mobile development.

Arnav Jain
Rithvik Reddygari
Hi! I like to work with computer vision, ML/AI, web apps, and other cool stuff!
Vishal Kotha
Hamzah J

Updates

Arnav Jain started this project — Apr 11, 2021 03:54 PM EDT