Inspiration

Growing up, we watched family members and friends struggle with everyday digital tasks for many reasons, arthritis among them. (While attending this hackathon, we learned that ~58.5 million adults in the United States, or ~23.7% of the adult population, have some form of arthritis!)

What it does

R.U.P./R.T. (Rutgers University Personal Responsive Tool), or Rupert, as we like to call him, transforms any Chrome browser into an intelligent, voice-controlled assistant that understands natural language commands and executes them seamlessly.

Wake Word Activation - Simply say "Hey Rupert" from any webpage, and the extension instantly listens for your command. The system uses advanced speech recognition with continuous background listening across all browser tabs.
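As a rough illustration of the wake word step, the sketch below checks a speech transcript for the phrase "Hey Rupert" and returns whatever was said after it. The function and constant names are hypothetical and not taken from the Rupert source; the transcript is assumed to arrive as a plain string from the speech recognizer.

```typescript
// Hypothetical helper for illustration; not the extension's actual code.
const WAKE_PATTERN = /\bhey rupert\b(.*)$/;

/** Lowercase and strip punctuation so "Hey, Rupert!" matches "hey rupert". */
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

/**
 * Returns the command spoken after the wake phrase, an empty string if the
 * wake phrase was heard with nothing after it, or null if it was not heard.
 */
function extractCommand(transcript: string): string | null {
  const match = normalize(transcript).match(WAKE_PATTERN);
  if (!match) return null;
  return (match[1] ?? "").trim();
}
```

An empty string here would mean "the user said the wake phrase and paused", which is a natural point to start listening for the command itself.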

Natural Language Understanding - Powered by Google Gemini AI, Rupert interprets complex commands like "switch to my YouTube tab," "click the search button," or "scroll down to the comments." No rigid syntax required—just speak naturally.

Intelligent Element Interaction - The extension automatically identifies and numbers clickable elements on any webpage. Say "show numbers" to see interactive overlays, then "click number 5" to activate that element. This works on videos, buttons, links, forms, and more.
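The numbering idea can be sketched without a browser by treating page elements as plain records. Everything below is an assumption made for illustration: the `PageElement` shape, the list of clickable tags, and the accepted phrasing ("click number 5" or "click 5") are hypothetical, and the real extension works on live DOM elements.

```typescript
// Hypothetical shapes; a real content script would inspect live DOM nodes.
interface PageElement {
  tag: string;
  label: string;
  visible: boolean;
}

interface NumberedElement extends PageElement {
  number: number;
}

const CLICKABLE_TAGS = new Set(["a", "button", "input", "select", "textarea", "video"]);

/** Assign 1-based overlay numbers to visible, clickable elements in page order. */
function numberClickable(elements: PageElement[]): NumberedElement[] {
  return elements
    .filter((el) => el.visible && CLICKABLE_TAGS.has(el.tag.toLowerCase()))
    .map((el, i) => ({ ...el, number: i + 1 }));
}

/** Parse "click number 5" (or "click 5") into 5; returns null for anything else. */
function parseClickNumber(command: string): number | null {
  const match = command.trim().toLowerCase().match(/^click (?:number )?(\d+)$/);
  if (!match) return null;
  return Number(match[1]);
}

/** Look up the element a spoken number refers to, if there is one. */
function resolveClick(command: string, numbered: NumberedElement[]): NumberedElement | null {
  const n = parseClickNumber(command);
  if (n === null) return null;
  return numbered.find((el) => el.number === n) ?? null;
}
```

Numbering only the visible elements keeps the spoken numbers small and matches what the user actually sees in the overlay.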

Cross-Tab Navigation - Manage your entire browsing session with voice. Commands like "go to Amazon," "close tab 2," or "create new tab" give you complete control without touching the keyboard.

Smart Text Input - Dictate text directly into forms and search boxes with commands like "type hello world" or "search for dogs." The extension handles proper event triggering for compatibility with modern web frameworks.

Real-Time Visual Feedback - Dynamic overlays show listening status, command processing, and action confirmation. Smooth animations and color-coded indicators provide instant feedback without disrupting the browsing experience.

How we built it

We built Rupert as a Chrome Manifest V3 extension with permissions for tabs, storage, and microphone access. The audio system uses two Web Speech API instances: one for wake word detection, tuned for minimal resource usage, and another for high-accuracy audio capture with echo cancellation and noise suppression. We integrated Google Gemini AI to transform natural speech into structured commands. Content scripts are injected into every webpage to enable interaction. For element detection, we combined multiple selector strategies, ranging from standard HTML elements to site-specific patterns for sites such as YouTube.
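To make "structured commands" concrete, here is a minimal sketch of validating a model's JSON reply before acting on it. The `Command` schema and the action names are hypothetical, since the write-up does not describe the actual format, and no Gemini API call is shown; the function only assumes the reply arrives as a JSON string.

```typescript
// Hypothetical command schema; the format Rupert actually uses is not described.
type Command =
  | { action: "switch_tab"; query: string }
  | { action: "click"; target: string }
  | { action: "scroll"; direction: "up" | "down" }
  | { action: "type"; text: string };

/** Parse a model reply into a Command, or return null if it is malformed. */
function parseCommand(raw: string): Command | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // Not valid JSON at all.
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;

  switch (obj["action"]) {
    case "switch_tab": {
      const query = obj["query"];
      return typeof query === "string" ? { action: "switch_tab", query } : null;
    }
    case "click": {
      const target = obj["target"];
      return typeof target === "string" ? { action: "click", target } : null;
    }
    case "scroll": {
      const direction = obj["direction"];
      if (direction === "up") return { action: "scroll", direction: "up" };
      if (direction === "down") return { action: "scroll", direction: "down" };
      return null;
    }
    case "type": {
      const text = obj["text"];
      return typeof text === "string" ? { action: "type", text } : null;
    }
    default:
      return null; // Unknown or missing action.
  }
}
```

Rejecting anything outside the known schema means a confusing model reply results in no action rather than an unintended click or tab change.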

Challenges we ran into

Users speak naturally, not in a command syntax. Phrases like "click the thing at the top" or "play the video to the left" required contextual understanding beyond simple pattern matching. We sent browser state (open tabs, page titles, DOM structure) to Gemini AI for interpretation, but this increased API call complexity and latency. Balancing response speed against interpretation accuracy became a constant tradeoff, and fixing one problem sometimes created new ones.
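One way to picture the speed versus accuracy tradeoff is a prompt builder that caps how much browser state is sent with each request. This is only a sketch under assumptions: the `BrowserContext` shape, the default limits, and the prompt layout are hypothetical and were not taken from the project.

```typescript
// Hypothetical types and limits, chosen only for illustration.
interface TabInfo {
  title: string;
  active: boolean;
}

interface BrowserContext {
  tabs: TabInfo[];
  elementLabels: string[];
}

/** Shorten long text to at most `max` characters, marking the cut with an ellipsis. */
function truncate(text: string, max: number): string {
  return text.length <= max ? text : text.slice(0, max - 1) + "…";
}

/** Build a compact prompt: all tabs, but only the first `maxElements` element labels. */
function buildPrompt(
  utterance: string,
  ctx: BrowserContext,
  maxElements = 20,
  maxLabelLength = 40,
): string {
  const tabLines = ctx.tabs.map(
    (tab, i) => `${i + 1}. ${tab.active ? "[active] " : ""}${truncate(tab.title, maxLabelLength)}`,
  );
  const elementLines = ctx.elementLabels
    .slice(0, maxElements)
    .map((label, i) => `${i + 1}. ${truncate(label, maxLabelLength)}`);
  return [
    "Open tabs:",
    ...tabLines,
    "Clickable elements:",
    ...elementLines,
    `User said: ${utterance}`,
  ].join("\n");
}
```

Lower limits would shrink the request and likely reduce latency, at the cost of the model seeing less of the page, which is the same tradeoff described above.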

Accomplishments that we're proud of

We are proud that we integrated Gemini and all our frameworks into a usable browser extension, and that we created an accessible technology that helps more people use the internet.

What we learned

We learned how to build a Chrome extension for the first time!

What's next for R.U.P./R.T.

Multi-Language Support - We want to expand beyond English with localized wake words and commands. International accessibility is crucial, and we want Rupert to work seamlessly in Spanish, Mandarin, Hindi, and more.

Accessibility Partnerships - We want to collaborate with organizations that support people with disabilities to refine features based on real-world needs, and to get feedback from the users who benefit most from voice control.
