Aura Vision

- Aura Vision reads text embedded in images using OCR, unlocking crucial information from banners, infographics, and memes.
- Hover over any image to get an instant, AI-powered description. Understand the context of news, articles, and blogs like never before.
- Customize your experience. Choose your preferred language (English, Spanish, or Portuguese) and adjust the voice to your liking.
Inspiration
The World Health Organization estimates 2.2 billion people live with some form of visual impairment. For them, the modern web is filled with "information black holes." But the barriers don't stop there. Users with reading difficulties like dyslexia face walls of text, while language differences exclude billions more. The result is a fractured, inaccessible digital experience.
Our inspiration was to build a single, intelligent tool to tear down all these walls at once. We asked: how can we use AI to create a unified accessibility layer for the entire internet, ensuring anyone can not just access, but truly understand digital content, regardless of ability or language?
What it Does
Aura Vision is a comprehensive, AI-powered accessibility and productivity suite built as a Google Chrome Extension. It transforms how users interact with web content through a seamless, integrated experience:
AI-Powered Reader Mode & Summarization: With a single click, Aura Vision activates its Reader Mode. Using Mozilla's Readability.js, it cleans away ads and clutter. It then sends the article's text to the Gemini API to generate a concise summary, which is translated into the user's preferred language and spoken aloud. This turns a 20-minute read into a 1-minute insight.
Advanced Image Description: By hovering over any image, users get an instant, rich description from the Google Gemini API. The system is smart enough to use Tesseract.js OCR to also read and include any text found within the image, making banners and infographics fully accessible.
Real-Time Text Processing: Selecting any text on a page automatically triggers the AI to detect the source language, translate it if necessary, and read the final text aloud in a clear, natural voice.
Full User Control: A sleek, modern settings panel allows users to customize their experience, including language, voice speed and pitch, and enabling or disabling automatic features.
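The voice speed and pitch controls map directly onto the native Chrome TTS options, which accept a rate between 0.1 and 10 and a pitch between 0 and 2. A minimal sketch of sanitizing the saved settings before speaking (the settings object shape and defaults are illustrative assumptions, not the project's actual code):

```javascript
// Chrome TTS accepts rate in [0.1, 10] and pitch in [0, 2]; clamp the
// user's saved settings before passing them to chrome.tts.speak.
// The settings shape here is an illustrative assumption.
function clampTtsOptions({ rate = 1.0, pitch = 1.0 } = {}) {
  const clamp = (v, lo, hi) => Math.min(hi, Math.max(lo, v));
  return { rate: clamp(rate, 0.1, 10), pitch: clamp(pitch, 0, 2) };
}

// In the extension, this would feed the native TTS call, e.g.:
//   chrome.tts.speak(summary, { lang: "es-ES", ...clampTtsOptions(saved) });
```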
Impact and Social Benefit
Aura Vision directly addresses UN Sustainable Development Goal #10 (Reduced Inequalities) by empowering people with disabilities. Our tool provides a tangible solution to digital exclusion, giving users with visual impairments or reading difficulties the autonomy to consume online information. By summarizing and translating content, it also acts as a powerful learning accelerator for students and professionals, promoting equal access to education, healthcare information, and global knowledge.
Uniqueness
While other tools exist, they are fragmented. Aura Vision's innovation lies in its seamless, all-in-one integration:
1 - vs. Traditional Screen Readers: They are blind to images without alt-text. Aura Vision acts as the "eyes" for these tools.
2 - vs. Reader Mode Apps (like Pocket): They clean up articles but lack the built-in AI summarization and translation capabilities.
3 - vs. Standalone Translation Tools: They require a clunky, manual process of copying and pasting. Aura Vision makes translation an invisible, automatic part of the reading experience.
Our solution combines three separate product categories into one intuitive tool.
How We Built It & Architecture
Aura Vision is built on a modern, robust architecture. A content.js script, augmented with Readability.js and Tesseract.js, handles all on-page interactions. It sends messages to a background.js service worker, which acts as the central orchestrator. For AI tasks, the background script performs secure fetch calls to the Google Gemini 1.5 Flash API. The JSON response is parsed and sent to the native Chrome TTS API. All user settings are managed via the Chrome Storage API and controlled through a custom-built UI in the extension's popup.
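The orchestration described above can be sketched roughly as follows. The message fields and helper names are illustrative assumptions; the endpoint and payload follow the public generateContent REST format for the Gemini 1.5 Flash model:

```javascript
// background.js sketch: receive work from content.js, call the Gemini
// REST API, and return the generated text. Message fields and helper
// names are illustrative assumptions.
function buildGeminiRequest(promptText) {
  return { contents: [{ parts: [{ text: promptText }] }] };
}

async function callGemini(promptText, apiKey) {
  const url = "https://generativelanguage.googleapis.com/v1beta/models/" +
              "gemini-1.5-flash:generateContent?key=" + apiKey;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGeminiRequest(promptText)),
  });
  const data = await res.json();
  // The first candidate's text, or an empty string if the call failed.
  return data.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
}

// Wiring it to messages from the content script:
// chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
//   if (msg.type === "SUMMARIZE") {
//     callGemini("Summarize:\n" + msg.text, KEY).then(sendResponse);
//     return true; // keep the channel open for the async response
//   }
// });
```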
Challenges We Faced
Our primary challenge was a strategic pivot. We initially aimed to use the on-device Gemini Nano API. However, we encountered a persistent platform bug in pre-release Chrome versions (the model was stuck in a "No On-Device Feature Used" state). This forced us to re-architect our solution in real time to use a more reliable and powerful cloud-based API. It taught us invaluable lessons about building resilient, production-ready applications that prioritize the user over the initial technical plan.
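The pivot implies a fallback strategy: probe for an on-device model and fall back to the cloud when it is unavailable. A hedged sketch of that idea; the `globalThis.ai` probe is a hypothetical stand-in for the unstable pre-release built-in AI surface, and `cloudSummarize` is an assumed helper:

```javascript
// Try an on-device summarizer if one is exposed; otherwise use the
// cloud API. The on-device probe below is hypothetical: the pre-release
// built-in AI API surface changed frequently and was unreliable.
async function summarizeWithFallback(text, cloudSummarize) {
  try {
    if (globalThis.ai?.summarizer) {
      const session = await globalThis.ai.summarizer.create();
      return await session.summarize(text);
    }
  } catch (e) {
    // On-device path failed (e.g. the "No On-Device Feature Used"
    // state we hit); fall through to the cloud.
  }
  return cloudSummarize(text);
}
```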
What We Learned
This project was a masterclass in Universal Design. We learned that by solving for a specific accessibility need, we ended up building a powerful productivity tool that benefits everyone. A student can use the summarizer to study faster, and a professional can use the translator to understand international reports. We learned that the best technology doesn't just grant access; it enhances understanding for all.
What's Next for Aura Vision (Our Grand Vision)
Our vision is to evolve Aura Vision from an accessibility tool into a full-fledged "comprehension engine" for the web. Our roadmap includes:
1 - "Converse with the Page": Allowing users to select content and ask the AI direct questions, like "Explain this paragraph more simply."
2 - Aura Vision for Developers: A new mode that audits websites for accessibility issues and uses AI to automatically generate alt text suggestions, helping fix the web at its source.
3 - Augmented Memory: A feature that allows users to save and tag insights, with the AI proactively resurfacing relevant saved notes as they browse new sites.
Built With
- accessibility
- chrome
- gemini-api
- google-chrome-extension-apis
- google-cloud
- html/css
- javascript
- ocr
- text-to-speech
- translation
- tts