Inspiration
We were drawn to the Accessibility track because it got us thinking about how much of a user’s experience on the Internet depends on being able to interact with and understand the visual and auditory elements of a webpage. While researching current assistive technologies, we learned about screen readers, which render text and image descriptions as speech output for users with vision impairments. After testing existing extensions, we saw that many relied on an image’s alt text to translate visual media into an auditory format. However, alt text isn’t consistently reliable: it is often missing, or contains metadata about the image (such as a filename) rather than a description of its content. Accordingly, we set out to build a browser extension that generates captions/descriptions for the images on a webpage, which a screen reader can then use in much the same way as alt text.
What it does
When the extension is enabled, it collects the images on the current webpage, generates a caption for each one using our model, and replaces each image’s description on the page with the generated caption in a format that screen readers can parse.
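The content-script side of this can be sketched as two small helpers (our actual heuristics and attribute choices may differ; the filename regex below is illustrative). The first decides whether an image’s existing alt text is useful; the second attaches the generated caption so a screen reader will announce it:

```javascript
// Decide whether an image needs a generated caption: its alt text is
// missing, empty, or looks like a bare filename rather than a description.
function needsCaption(altText) {
  if (!altText || altText.trim() === "") return true;
  // Heuristic: alt text like "IMG_1234.jpg" carries no descriptive content.
  return /^[\w-]+\.(jpe?g|png|gif|webp|svg)$/i.test(altText.trim());
}

// Attach a generated caption so screen readers announce it. Works on any
// object exposing setAttribute (a DOM element in the extension; a plain
// stub in tests).
function applyCaption(img, caption) {
  img.setAttribute("alt", caption);
  img.setAttribute("aria-label", caption);
}
```

In the extension itself, these would run over `document.querySelectorAll("img")`, sending each qualifying image to the captioning model and applying the result.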
How we built it
Because our extension operates on existing webpages, it is built around a JavaScript front end. After exploring options like the OpenAI API and libraries like LAVIS, we decided to use a pretrained image-captioning model hosted on Hugging Face, trained on the Flickr8k dataset of images paired with human-written captions.
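A minimal sketch of how the front end could call a hosted captioning model through the Hugging Face Inference API. The model name below is a stand-in (the writeup doesn’t specify which model we hosted), and the API token is assumed to come from the user’s Hugging Face account:

```javascript
// Build the request for the Hugging Face Inference API.
// NOTE: the model id is a hypothetical placeholder, not necessarily the
// one our extension uses.
function buildCaptionRequest(
  imageBytes,
  apiToken,
  model = "nlpconnect/vit-gpt2-image-captioning"
) {
  return {
    url: `https://api-inference.huggingface.co/models/${model}`,
    options: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}` },
      body: imageBytes, // raw image bytes; the API detects the format
    },
  };
}
```

The extension would then fetch the caption with something like `const { url, options } = buildCaptionRequest(bytes, token); const result = await (await fetch(url, options)).json();`, where image-to-text models typically return an array of objects with a `generated_text` field.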
Challenges we ran into
Toward the end, we spent a lot of time debugging communication between our server and the captioning API.
Accomplishments that we're proud of
We’re proud of delivering a functional project by the end of the hackathon.
What we learned
We all learned new things, from formatting API calls and training AI models to navigating the engineering process from ideation to creation.
What's next for Caption Generator for Screen Reader
We hope to polish the extension and release it for free on the Chrome Web Store!
Built With
- api
- huggingface