World Reader


Inspiration

Our team knows some friends who are visually impaired, so we wanted to make something to keep them safe and help them better understand their environment.

What it does

It takes an input image from either the camera or device storage and interprets its content in natural language. It also reads the description aloud to the user.

How we built it

We used Android Studio to implement the UI and to convert the result text to speech. The frontend communicates with the backend through a REST API. The backend is implemented with Google App Engine and the Vision AI API: we created a Python/Flask REST API that runs on top of App Engine. Inside the REST API, we use the Vision AI API to convert the picture into keywords, and we constructed a novel algorithm that converts those keywords into actual sentences.
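As an illustration of the keyword-to-sentence step, here is a minimal sketch. It is not our actual algorithm: the function names (`describe`, `join_with_and`, `with_article`) and the sentence template are hypothetical, and it assumes the keywords arrive as a plain list of strings.

```python
def join_with_and(items):
    """Join phrases as 'a', 'a and b', or 'a, b, and c'."""
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def with_article(noun):
    """Prefix a noun with 'a' or 'an' using a simple first-letter heuristic."""
    article = "an" if noun[0] in "aeiou" else "a"
    return f"{article} {noun}"


def describe(keywords):
    """Turn a list of keyword strings into one short English sentence.

    Hypothetical template for illustration only.
    """
    # Normalize case and drop empty entries.
    cleaned = [k.strip().lower() for k in keywords if k.strip()]
    if not cleaned:
        return "I could not recognize anything in this picture."
    phrases = [with_article(k) for k in cleaned]
    return f"This picture shows {join_with_and(phrases)}."


print(describe(["Dog", "ball", "umbrella"]))
```

A fixed template like this is easy to read aloud with text-to-speech, but it treats every keyword as a countable noun, which is one reason the real conversion took more work.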

Challenges we ran into

The backend processing time was very long. To reduce it, we compressed the images. Converting keywords into sentences also took a lot of effort.
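One common way to shrink an image is to scale it down so that its longer side fits within a limit while the aspect ratio is kept. The sketch below only computes the target dimensions. The function name `scaled_size` and the 1024-pixel default are assumptions for illustration, not the values we used.

```python
def scaled_size(width, height, max_side=1024):
    """Return (width, height) shrunk so the longer side is at most max_side.

    The aspect ratio is preserved and images are never enlarged.
    max_side=1024 is an assumed default, not a measured optimum.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height, and max_side must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough
    # Multiply before dividing to keep rounding error low.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


print(scaled_size(4000, 3000))  # a 4000x3000 photo
print(scaled_size(800, 600))    # a small image is returned unchanged
```

The resulting size can then be passed to whatever image library performs the actual resize, on either the device or the server.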

Accomplishments that we're proud of

Our backend is completely based on GCP. We wrote an algorithm that selects the important keywords from the words that the Google AI functions generate for each image. Our algorithm then builds complete, human-understandable sentences out of these keywords. We built a backend REST API for communication with the frontend. We built a UI that can take pictures with the device camera and save them to storage to preserve their resolution.
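To give a feel for the keyword selection step, here is a minimal sketch that keeps the highest-scoring labels. It is an assumption-laden illustration rather than our real algorithm: it assumes each label comes as a (description, score) pair with a score between 0 and 1, and the name `select_keywords`, the 0.7 threshold, and the limit of 3 are all hypothetical.

```python
def select_keywords(labels, min_score=0.7, limit=3):
    """Pick the most confident, de-duplicated keywords.

    labels: iterable of (description, score) pairs (assumed input shape).
    min_score and limit are illustrative defaults.
    """
    best = {}
    for description, score in labels:
        key = description.strip().lower()
        if key and score >= min_score:
            # Keep the highest score seen for each distinct keyword.
            best[key] = max(score, best.get(key, 0.0))
    # Sort by score (highest first), then alphabetically for stable ties.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


labels = [
    ("Dog", 0.98),
    ("Grass", 0.91),
    ("Snout", 0.55),
    ("dog", 0.80),
    ("Leash", 0.75),
    ("Park", 0.72),
]
print(select_keywords(labels))
```

Filtering by score alone ignores how keywords relate to each other, so a real selector would likely need extra rules on top of this.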

What we learned

We gained a better understanding of how to send pictures between the frontend and the backend, and how to compress and scale them. We learned how to create human-readable sentences out of keywords. We also learned how to use GCP features.

What's next for World Reader

We want to make the UI more user-friendly and to implement real-time video interpretation on top of the infrastructure we already built for pictures. We would also like to improve the quality of the generated sentences so that they include more precise information, such as location and motion.