Inspiration
Despite all the incredible advancements in technology, there's still a massive digital gap between people with disabilities and everyone else. It's like, seriously? Take blind users, for instance: many rely on screen readers to browse the web, and a screen reader can only describe an image by reading its alt text. And get this: 61% of homepage accessibility errors come down to missing alt text. Can you believe it? It's like we're taking one step forward and two steps back!
In this day and age, shouldn't we be bridging the gap, not widening it? But nope, here we are, still depending on developers to make websites accessible. And let's be real: not all developers are stepping up to the plate when it comes to writing alt text.
But you know what? We've had enough. We're sick and tired of seeing websites leaving many users in the cold. We're firm believers that everyone—absolutely everyone—should have equal access to the web. So, we're on a mission to shout from the rooftops about accessibility. Because, seriously, technology should be a force for empowerment, not exclusion.
What it does
Imagine browsing the web with ease, knowing every image has a description. That's the power of our Chrome extension. With a single click, it uses Generative AI to generate alt text for all the images on any website you visit. Suddenly, every website becomes more accessible, so no one is left out.
But we didn't stop there. Currently, we're working on the ability for users to hover over an image, click a button and verbally ask for more detailed descriptions. It's like having a personal assistant to provide additional context beyond the alt text. And guess what? Your screen reader will respond with the information you're curious about. Once satisfied, your screen reader seamlessly continues its duties, making your browsing experience smoother and more informative.
Our extension goes beyond basic alt text generation by using the Google Gemini API for image analysis. Gemini's image recognition lets us give users rich, contextually relevant image descriptions.
Alt+Ctrl+View is built as a Chrome extension and fits naturally into the Google Chrome ecosystem. Integrating directly into the browser keeps the experience intuitive for all Chrome users, further enhancing accessibility and usability.
How we built it
The Chrome extension scans the page for all images and sends their URLs to a Google Cloud function in a POST request. The Cloud function stores each image in a Cloud Storage bucket and sends it to Gemini, which generates a description. The description is sent back to the Chrome extension in the response. The extension then injects it into the page's HTML, replacing the alt text of each image with the corresponding description. We tested the image processing and model API calls in Jupyter Notebook instances running on Google Cloud.
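The request flow above can be sketched on the Cloud function side roughly as follows. This is a minimal sketch in plain Python, not the project's actual code: the payload shape (`{"image_urls": [...]}`), the function names, and the `describe_image` callable (standing in for the storage upload plus the Gemini call) are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict, List, Tuple


def parse_image_urls(raw_body: str) -> List[str]:
    """Extract a de-duplicated list of http(s) image URLs from a JSON POST body.

    Assumed (hypothetical) payload shape: {"image_urls": ["https://...", ...]}
    """
    payload = json.loads(raw_body)
    urls = payload.get("image_urls", [])
    if not isinstance(urls, list):
        urls = []
    seen = set()
    result: List[str] = []
    for url in urls:
        # Skip non-strings, non-http(s) sources such as data: URIs, and repeats.
        if (
            isinstance(url, str)
            and url.startswith(("http://", "https://"))
            and url not in seen
        ):
            seen.add(url)
            result.append(url)
    return result


def build_alt_text_response(
    raw_body: str, describe_image: Callable[[str], str]
) -> Tuple[str, int]:
    """Return (JSON body, HTTP status) mapping each image URL to a description.

    `describe_image` is a hypothetical stand-in for "store the image, then ask
    the model to describe it"; it is injected so the flow can run offline.
    """
    try:
        urls = parse_image_urls(raw_body)
    except (json.JSONDecodeError, AttributeError):
        return json.dumps({"error": "invalid request body"}), 400

    descriptions: Dict[str, str] = {}
    for url in urls:
        try:
            descriptions[url] = describe_image(url)
        except Exception:
            # Leave failed images out so the client keeps their existing alt text.
            continue
    return json.dumps({"descriptions": descriptions}), 200


if __name__ == "__main__":
    def fake_describe(url: str) -> str:
        # Placeholder for the real model call.
        return "A placeholder description for " + url

    body, status = build_alt_text_response(
        '{"image_urls": ["https://example.com/cat.png"]}', fake_describe
    )
    print(status, body)
```

Keying the response by image URL is one way to let the extension match each description back to the right image element before it rewrites the `alt` attribute.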
Challenges we ran into
One of the significant challenges we encountered during development was a CORS (Cross-Origin Resource Sharing) bug with Google Cloud. CORS issues can be tricky to debug and resolve because they involve requests made from one origin to a server on another.
This CORS bug hindered our progress and caused unexpected behaviour when interacting with Google Cloud services from our Chrome extension. Debugging and troubleshooting the issue required a thorough investigation into the configuration of our extension and the settings of the Google Cloud services.
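For readers who hit the same kind of issue, the usual shape of a fix is for the HTTP function to answer the browser's preflight `OPTIONS` request and attach `Access-Control-Allow-*` headers to the real response. The sketch below shows that logic in plain Python, independent of any framework. The allowed origin value, the function names, and the `(body, status, headers)` return shape are assumptions for illustration, and this is not necessarily how the bug described here was resolved.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical allow-list. Which origin the browser sends depends on which
# part of the extension makes the request, so a real list may look different.
ALLOWED_ORIGINS = {"chrome-extension://example-extension-id"}


def cors_headers(origin: Optional[str]) -> Dict[str, str]:
    """Return CORS headers for an allowed origin, or no headers otherwise."""
    if origin not in ALLOWED_ORIGINS:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        # Tell caches that the response varies with the request's Origin.
        "Vary": "Origin",
    }


def handle_request(
    method: str, origin: Optional[str], handler: Callable[[], str]
) -> Tuple[str, int, Dict[str, str]]:
    """Answer preflight requests and wrap the real handler's response."""
    headers = cors_headers(origin)

    if method == "OPTIONS":
        if not headers:
            # Unknown origin: refuse the preflight without any CORS headers.
            return "", 403, {}
        headers.update(
            {
                "Access-Control-Allow-Methods": "POST, OPTIONS",
                "Access-Control-Allow-Headers": "Content-Type",
                "Access-Control-Max-Age": "3600",
            }
        )
        return "", 204, headers

    if method != "POST":
        return "method not allowed", 405, headers

    # The browser only exposes this body to the page if the headers allow it.
    return handler(), 200, headers
```

A request from an origin outside the allow-list still gets a response here, but without the `Access-Control-Allow-Origin` header, so the browser blocks the calling script from reading it.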
Accomplishments that we're proud of
We're incredibly proud of the idea we've brought to life. In a world where accessibility is often overlooked, we recognized a crucial need and took action to address it head-on. Our vision for a Chrome extension that leverages Generative AI to generate alt text for images on any website is innovative and impactful.
What we learned
It's easy to overlook seemingly simple problems, assuming they don't require advanced technology. But the truth is, even the most basic issues, like ensuring every image has a description, can profoundly impact accessibility. Our journey taught us the importance of addressing these seemingly minor challenges and leveraging technology to create meaningful solutions.
What's next for Alt+Ctrl+View
First, we'll focus on improving our Generative AI pipeline so it produces more accurate and descriptive alt text for images. Second, user feedback will play a crucial role in our development process: we'll actively ask users about their needs and preferences so the extension evolves to meet their expectations and address the pain points they encounter. Third, we'll work on seamless integration with popular screen readers, making it even easier for users with visual impairments to access detailed image descriptions.
Built With
- chrome
- gemini
- google-cloud
- google-speechrecognition
- html5
- javascript
- python
- vertexai