Inspiration
We were drawn to the Accessibility track because it got us thinking about how much of a user’s experience on the Internet depends on being able to interact with and understand the visual and auditory elements of a webpage. While researching current assistive technologies, we learned about screen readers, which render text and image descriptions as speech output for users with vision impairments. After testing existing extensions, we saw that many relied on an image’s alt text to translate visual media into an auditory format. However, alt text isn’t consistently reliable: it is often missing, or contains metadata about the image (such as a filename) rather than a description of its content. Accordingly, we set out to build a browser extension that generates captions/descriptions for the images on a webpage, which a screen reader can then use in much the same way as alt text.
What it does
When the extension is enabled, it collects the images on the current webpage, generates a caption for each one using our model, and replaces each image’s description on the page with the generated caption in a format that screen readers can parse.
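The content-script side of this can be sketched as two small helpers (our actual heuristics and attribute choices may differ; the filename regex below is illustrative). The first decides whether an image’s existing alt text is useful; the second attaches the generated caption so a screen reader will announce it:

```javascript
// Decide whether an image needs a generated caption: its alt text is
// missing, empty, or looks like a bare filename rather than a description.
function needsCaption(altText) {
  if (!altText || altText.trim() === "") return true;
  // Heuristic: alt text like "IMG_1234.jpg" carries no descriptive content.
  return /^[\w-]+\.(jpe?g|png|gif|webp|svg)$/i.test(altText.trim());
}

// Attach a generated caption so screen readers announce it. Works on any
// object exposing setAttribute (a DOM element in the extension; a plain
// stub in tests).
function applyCaption(img, caption) {
  img.setAttribute("alt", caption);
  img.setAttribute("aria-label", caption);
}
```

In the extension itself, these would run over `document.querySelectorAll("img")`, sending each qualifying image to the captioning model and applying the result.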
How we built it
Because our extension operates on existing webpages, it is built around a JavaScript front end. After exploring options like the OpenAI API and libraries like LAVIS, we decided to use a pretrained image-captioning model hosted on Hugging Face, trained on the Flickr8k dataset of images paired with human-written captions.
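A minimal sketch of how the front end could call a hosted captioning model through the Hugging Face Inference API. The model name below is a stand-in (the writeup doesn’t specify which model we hosted), and the API token is assumed to come from the user’s Hugging Face account:

```javascript
// Build the request for the Hugging Face Inference API.
// NOTE: the model id is a hypothetical placeholder, not necessarily the
// one our extension uses.
function buildCaptionRequest(
  imageBytes,
  apiToken,
  model = "nlpconnect/vit-gpt2-image-captioning"
) {
  return {
    url: `https://api-inference.huggingface.co/models/${model}`,
    options: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}` },
      body: imageBytes, // raw image bytes; the API detects the format
    },
  };
}
```

The extension would then fetch the caption with something like `const { url, options } = buildCaptionRequest(bytes, token); const result = await (await fetch(url, options)).json();`, where image-to-text models typically return an array of objects with a `generated_text` field.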
Challenges we ran into
Toward the end, we spent a lot of time debugging communication between our server and the captioning API.
Accomplishments that we're proud of
We’re proud of delivering a functional project by the end of the hackathon.
What we learned
We all learned new things, from formatting API calls and training AI models to navigating the engineering process from ideation to creation.
What's next for Caption Generator for Screen Reader
We hope to polish the extension and release it for free on the Chrome Web Store!
Built With
- api
- huggingface