Inspiration
With over 7 million people in the USA using screen readers, it's surprising how challenging accessibility still is. Many existing screen reader apps provide a subpar experience, often failing to read screens and images accurately. This realization inspired us to build an AI-based screen reader that enhances accessibility by summarizing images, paragraphs, and even chats. Our goal is to make digital content more accessible for busy professionals and for individuals with visual impairments or learning disabilities, enabling them to engage with information more effectively.
What it does
- Capture Screen: Users can capture their screen and receive summarized content.
- Image Descriptions: Users can obtain descriptions of images.
- Voice Activation: Users can activate the extension by saying "capture," "click," or "press," instead of clicking the button.
- Natural Voice Playback: Users can listen to the content read aloud in a natural, human-like voice.
The existing screen reader looks like this: https://youtu.be/pIo2rdYqf34. Ours looks like this: https://youtu.be/kf5yGIl9vkg. As you can see, the voice is much more natural-sounding, and detailed summaries are provided for both images and text.
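The voice-activation flow above boils down to matching trigger words in recognized speech. The helper below is an illustrative, hypothetical sketch of that matching logic (the function and constant names are ours for this example, not the extension's actual code); in the extension, a speech-recognition callback would call it on each transcript:

```python
import re

# Trigger words mirror the voice commands listed above.
TRIGGER_WORDS = {"capture", "click", "press"}

def matches_trigger(transcript: str) -> bool:
    """Return True if any trigger word appears in the recognized speech."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return any(word in TRIGGER_WORDS for word in words)
```

When a transcript like "please capture my screen" matches, the extension kicks off the screen-capture action; anything else is ignored.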
How we built it
We built the Chrome extension using HTML, CSS, JavaScript, Python, and Node.js. On the backend, we use Databricks Open Source, particularly the LangChain LLM library, to summarize text and images, providing concise but detailed information. We also use the Speechify API so the audio output sounds like a human helper. Python and FastAPI power backend processes such as generating image descriptions and content summaries and serving our backend API, while Node.js manages real-time data processing and server interactions. This diverse tech stack allowed us to create a robust, accessible tool that enhances user interaction with digital content.
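As a rough illustration of the summarization flow described above, captured page text is typically split into chunks before being sent to an LLM. The sketch below is hypothetical (the `summarize_chunk` callable stands in for the real LangChain invocation, and all names are ours for this example, not our production code):

```python
from typing import Callable, List

def chunk_text(text: str, max_chars: int = 1000) -> List[str]:
    """Split text on sentence boundaries so each chunk fits the LLM's context."""
    chunks: List[str] = []
    current = ""
    for sentence in text.replace("\n", " ").split(". "):
        candidate = f"{current}. {sentence}" if current else sentence
        if len(candidate) > max_chars and current:
            chunks.append(current)   # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def summarize_page(text: str, summarize_chunk: Callable[[str], str]) -> str:
    """Summarize each chunk (e.g. via a LangChain LLM call) and join the results."""
    return " ".join(summarize_chunk(chunk) for chunk in chunk_text(text))
```

In the real backend, a FastAPI endpoint would receive the captured text, run something like `summarize_page` with the LLM as the summarizer, and hand the result to the Speechify API for playback.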
Challenges we ran into
During the development of the ListenUp screen capture feature, we ran into challenges with screen capture permissions in Manifest V3. The new permissions model requires every permission needed for capturing screens to be declared explicitly, which made the feature tricky to implement while adhering to the updated guidelines.
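For context, a Manifest V3 extension must declare capture-related permissions up front in its `manifest.json`. A minimal illustrative fragment (not our exact manifest) might look like:

```json
{
  "manifest_version": 3,
  "name": "ListenUp",
  "version": "1.0",
  "permissions": ["activeTab", "tabCapture", "scripting"],
  "background": { "service_worker": "background.js" }
}
```

Omitting any of these means the corresponding `chrome.*` capture API calls fail at runtime, which is why getting the declarations right mattered so much.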
Accomplishments that we're proud of
We’re proud of our work on the ListenUp screen capture feature, particularly our effective use of AI tools throughout the process. By leveraging advanced natural language processing, we created an intuitive voice command system that lets users activate features seamlessly. We also used AI to generate image descriptions, so users receive detailed explanations of visual content, which enhances accessibility.
What we learned
- Importance of Accessibility: Understanding the various challenges faced by users with disabilities highlighted the need for accessible technology that enhances their digital experiences.
- Integration of Technologies: Effectively combining APIs with other libraries to create a Chrome extension.


