Inspiration
Our team knows some friends who are visually impaired, so we wanted to make something to keep them safe and help them better understand their environment.
What it does
It takes input images from either the camera or device storage and interprets their content in natural language. It also reads the content descriptions out loud for its users.
How we built it
We used Android Studio to implement the UI and convert the result text to speech. The frontend communicates with the backend through a REST API. The backend is implemented using Google App Engine and the Vision AI API: we created a Python/Flask REST API that runs on top of App Engine. Inside the REST API, we use the Vision AI API to convert the picture into keywords. We then constructed a novel algorithm that converts those keywords into actual sentences.
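To make the keywords-to-sentence step more concrete, here is a minimal Python sketch. It is not our actual algorithm: the function names `join_with_and` and `describe` and the sentence template are assumptions chosen only for illustration, and the input is assumed to be a plain list of keyword strings.

```python
def join_with_and(words):
    """Join words as a natural-language list: 'a', 'a and b', 'a, b, and c'."""
    if not words:
        return ""
    if len(words) == 1:
        return words[0]
    if len(words) == 2:
        return f"{words[0]} and {words[1]}"
    return ", ".join(words[:-1]) + f", and {words[-1]}"


def describe(keywords):
    """Build one sentence from a list of keywords (hypothetical template)."""
    cleaned = []
    for word in keywords:
        word = word.strip().lower()
        # Drop blanks and duplicates while keeping the original order.
        if word and word not in cleaned:
            cleaned.append(word)
    if not cleaned:
        return "I could not recognize anything in this picture."
    return f"This picture seems to show {join_with_and(cleaned)}."
```

A sentence produced this way can be handed directly to the text-to-speech step on the device. A fixed template is the simplest possible choice; it does not capture relationships between objects.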
Challenges we ran into
The backend processing time was very long. To reduce it, we compressed the images before processing them. Converting keywords into sentences also took a lot of effort.
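One common way to shrink an image before processing is to scale it down so its longest side stays under a limit. The sketch below only computes the target dimensions; the function name `scaled_size` and the default limit of 1024 pixels are assumptions for illustration, not the values we used.

```python
def scaled_size(width, height, max_side=1024):
    """Return (new_width, new_height) fitting within max_side, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough; never upscale.
        return width, height
    scale = max_side / longest
    # Round to whole pixels and keep each side at least 1 pixel.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The resulting size can then be passed to whichever image library performs the actual resize. A smaller image means fewer bytes to transfer and less work for the backend, at the cost of some detail.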
Accomplishments that we're proud of
Our backend is completely based on GCP. We wrote an algorithm that selects important keywords from the words that the Google AI functions generate from the images. Our algorithm then creates complete sentences out of these keywords that humans can understand. We built a backend REST API for communication with the frontend. We built a UI that can take pictures using the device camera and store them in storage to preserve the resolution.
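As a rough illustration of keyword selection, the sketch below keeps only the most confident labels. It assumes each label arrives as a (description, score) pair with a score between 0 and 1; the function name `select_keywords`, the 0.7 threshold, and the limit of 5 are hypothetical and do not describe our actual selection rules.

```python
def select_keywords(labels, min_score=0.7, limit=5):
    """Keep the highest-scoring labels at or above min_score, at most `limit`."""
    # Discard labels the detector is not confident about.
    confident = [(desc, score) for desc, score in labels if score >= min_score]
    # Most confident first.
    confident.sort(key=lambda pair: pair[1], reverse=True)
    return [desc for desc, _ in confident[:limit]]
```

For example, given the labels ("Dog", 0.98), ("Grass", 0.65), ("Park", 0.81), and ("Leash", 0.72), this sketch would keep "Dog", "Park", and "Leash" and drop "Grass".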
What we learned
We gained a better understanding of how to send pictures between the front end and back end, and how to compress and scale the pictures. We learned how to create human-readable sentences out of keywords. We also learned how to use GCP features.
What's next for World Reader
Make the UI more user-friendly. Implement real-time video interpretation using the basic infrastructure we already built for pictures. We would also like to improve the quality of the generated sentences so that they include more precise information such as location and motion.
Built With
- android-studio
- flask
- gcp
- google-app-engine
- java
- json
- python
- tts
- visionai
