GitHub - jealmonte/rupert: Our official submission to HackRU 2025, by Joshua Almonte, Kenneth Li, and James Owen

R.U.P./R.T Voice Assistant

Intelligent AI-powered accessibility for everyone

Chrome Extension with Voice Command Integration

Table of Contents

About The Project
- Built With
Technologies

About The Project

R.U.P./R.T is a Chrome extension that adds voice commands and AI-powered automation to web browsing. It provides hands-free control over browser navigation, tab management, and webpage interaction using natural language processing.

Key Features

Wake Word Detection - Activates on the "Hey Rupert" wake phrase in any browser tab
Intelligent Command Processing - Uses Google Gemini AI to interpret natural language commands and convert them to precise browser actions
Cross-Tab Voice Control - Seamlessly operates across multiple browser tabs and windows
Smart Element Interaction - Automatically numbers clickable elements and enables voice-controlled clicking
Advanced Tab Management - Voice commands for switching, closing, and creating tabs
Real-Time Visual Feedback - Dynamic indicators show listening status and command processing

Voice Commands Supported

"Switch to tab 3" - Navigate between open tabs
"Click number 5" - Interact with numbered page elements
"Scroll down" - Page navigation
"Google search dogs" - Automated web searches
"Type hello world" - Text input automation
"Close tab 2" - Tab management

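To make the tab commands above concrete, here is a minimal sketch of how a parsed command such as "Switch to tab 3" could be resolved against the open tabs. The names `planTabAction` and `runTabCommand`, the action names, and the command shape `{ action, tabNumber }` are illustrative assumptions, not taken from the repository. The sketch also assumes spoken tab numbers are 1-based and follow the tab order in the current window.

```javascript
// Hypothetical helper: turn a parsed tab command into a description of the
// chrome.tabs call to make. Tabs are assumed to look like the objects returned
// by chrome.tabs.query, each with an `id` and a zero-based `index`.
function planTabAction(command, tabs) {
  const ordered = [...tabs].sort((a, b) => a.index - b.index);
  const target = ordered[command.tabNumber - 1]; // spoken numbers are 1-based

  switch (command.action) {
    case 'switch_tab':
      return target ? { method: 'update', args: [target.id, { active: true }] } : null;
    case 'close_tab':
      return target ? { method: 'remove', args: [target.id] } : null;
    case 'new_tab':
      return { method: 'create', args: [{}] };
    default:
      return null; // unknown action: let the caller report it
  }
}

// Only meaningful inside an extension, where the `chrome` global exists.
async function runTabCommand(command) {
  const tabs = await chrome.tabs.query({ currentWindow: true });
  const plan = planTabAction(command, tabs);
  if (!plan) return false;
  await chrome.tabs[plan.method](...plan.args);
  return true;
}
```

Separating the planning step from the `chrome.tabs` call keeps the number-to-tab mapping testable outside the browser.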

Built With

The extension is built on standard web technologies and Chrome's extension framework, described under Technologies.


Technologies

Chrome Extensions API

The extension is built on Chrome's Manifest V3 architecture and uses its service worker model for background processing. The manifest.json declares the tabs, activeTab, storage, and scripting permissions. Host permissions allow cross-origin requests to the Google Gemini API, and web accessible resources expose the permission management interface to users.
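As a rough illustration of the manifest described above, the sketch below writes the relevant fields as a JavaScript object that mirrors the structure of a Manifest V3 manifest.json. The version string, the Gemini host pattern, and the `permission.html` file name are assumptions for illustration; the project's actual manifest may differ. The helper `missingPermissions` is likewise hypothetical.

```javascript
// Illustrative manifest, written as a JS object for readability.
// Values marked "assumed" do not come from the source.
const manifestSketch = {
  manifest_version: 3,
  name: 'R.U.P./R.T Voice Assistant',
  version: '1.0.0', // assumed
  background: { service_worker: 'background.js' },
  permissions: ['tabs', 'activeTab', 'storage', 'scripting'],
  host_permissions: ['https://generativelanguage.googleapis.com/*'], // assumed Gemini endpoint
  web_accessible_resources: [
    { resources: ['permission.html'], matches: ['<all_urls>'] }, // assumed file name
  ],
};

// Small sanity check: list any required permission the manifest does not declare.
function missingPermissions(manifest, required) {
  const declared = new Set(manifest.permissions || []);
  return required.filter((name) => !declared.has(name));
}
```

A check like `missingPermissions(manifestSketch, ['tabs', 'scripting'])` can catch a permission that was dropped from the manifest before a runtime API call fails.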

The extension maintains persistent state through Chrome's local storage and coordinates between background service workers and content scripts for seamless operation across browser sessions.

Google Gemini AI

Google Gemini serves as the intelligent core of the extension, transforming natural speech into structured commands. The system sends voice transcripts along with current browser context (open tabs, page information) to Gemini's generative AI model.

The prompt includes the list of open browser tabs, and the model returns a structured JSON response with an action type, a confidence score, and execution parameters. This lets the extension resolve context-dependent commands such as "close the YouTube tab" or "click on the search button".
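The request and response handling described above could look roughly like the following sketch. The prompt wording, the JSON key names (`action`, `confidence`, `params`), the 0.5 confidence threshold, and both function names are assumptions for illustration. The network call to Gemini is deliberately left out.

```javascript
// Hypothetical prompt builder: embeds the transcript and the open tabs in the request text.
function buildPrompt(transcript, tabs) {
  const tabLines = tabs.map((t, i) => `${i + 1}. ${t.title} (${t.url})`).join('\n');
  return [
    'Convert the voice command into JSON with keys "action", "confidence" (0 to 1), and "params".',
    `Open tabs:\n${tabLines}`,
    `Command: ${transcript}`,
  ].join('\n\n');
}

// Parse the model's text reply defensively: the JSON may be wrapped in extra text.
// Returns null when the reply is unusable or the confidence is too low.
function parseCommandResponse(text, minConfidence = 0.5) {
  const match = text.match(/\{[\s\S]*\}/); // first "{" through last "}"
  if (!match) return null;
  let parsed;
  try {
    parsed = JSON.parse(match[0]);
  } catch {
    return null;
  }
  if (typeof parsed.action !== 'string') return null;
  const confidence = Number(parsed.confidence);
  if (!Number.isFinite(confidence) || confidence < minConfidence) return null;
  return { action: parsed.action, confidence, params: parsed.params ?? {} };
}
```

Rejecting low-confidence or malformed replies before any browser action runs means an ambiguous command results in no action rather than a wrong one.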

Web Speech API

The extension implements a dual speech recognition system using the browser's native Web Speech API. The WakeWordDetector class maintains continuous listening for trigger phrases like "Hey Rupert" across all tabs, using optimized recognition parameters for minimal resource usage.

Once activated, a separate SpeechRecognition instance captures the actual voice command with enhanced accuracy settings including echo cancellation and noise suppression. This two-tier approach balances always-on availability with precise command capture.
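The two-tier flow described above can be sketched as follows. This is an assumed structure, not the repository's `WakeWordDetector` implementation: the function names are hypothetical, and the recognizer settings shown are only the standard `continuous` and `interimResults` properties of the Web Speech API.

```javascript
// Normalize a transcript and look for the wake phrase.
// Returns the text spoken after the wake phrase, or null if it was not heard.
function extractAfterWakeWord(transcript, wakePhrase = 'hey rupert') {
  const normalized = transcript
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
  const at = normalized.indexOf(wakePhrase);
  if (at === -1) return null;
  return normalized.slice(at + wakePhrase.length).trim();
}

// Browser-only wiring: a continuous recognizer listens for the wake phrase,
// then a one-shot recognizer captures the command.
function startWakeWordLoop(onCommand) {
  const Recognition = globalThis.SpeechRecognition || globalThis.webkitSpeechRecognition;
  if (!Recognition) return null; // Web Speech API unavailable

  const wake = new Recognition();
  wake.continuous = true;
  wake.interimResults = true;
  let heardWakeWord = false;

  wake.onresult = (event) => {
    const latest = event.results[event.results.length - 1][0].transcript;
    if (extractAfterWakeWord(latest) !== null) {
      heardWakeWord = true;
      wake.stop(); // the command recognizer starts once this one has ended
    }
  };

  wake.onend = () => {
    if (!heardWakeWord) {
      wake.start(); // the browser ended the session; resume listening
      return;
    }
    heardWakeWord = false;
    const command = new Recognition();
    command.continuous = false;
    command.interimResults = false;
    command.onresult = (e) => onCommand(e.results[0][0].transcript);
    command.onend = () => wake.start(); // return to wake word listening
    command.start();
  };

  wake.start();
  return wake;
}
```

Starting the command recognizer from the wake recognizer's `onend` handler avoids running two recognition sessions at the same time.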

Service Workers

The background.js file implements the extension's service worker. It manages extension state, coordinates cross-tab communication, handles AI command processing, and maintains persistent wake word detection.

The service worker communicates with content scripts through message passing, responds to tab lifecycle events, and uses keep-alive mechanisms to prevent Chrome from terminating it during active use.
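One way to structure the message passing described above is a small router registered with `chrome.runtime.onMessage`. The sketch below is an assumption about the shape of such a router, not the code in background.js; the message types `GET_TABS` and `PING` and the `{ type, payload }` message format are hypothetical. Periodic pings from an active content script are one common keep-alive approach, shown here only as an example.

```javascript
// Build a listener compatible with chrome.runtime.onMessage from a map of handlers.
function createMessageRouter(handlers) {
  return function onMessage(message, sender, sendResponse) {
    const type = message && message.type;
    const handler = Object.prototype.hasOwnProperty.call(handlers, type)
      ? handlers[type]
      : undefined;
    if (!handler) {
      sendResponse({ ok: false, error: `Unknown message type: ${type}` });
      return false; // response already sent synchronously
    }
    Promise.resolve()
      .then(() => handler(message.payload, sender))
      .then((result) => sendResponse({ ok: true, result }))
      .catch((err) => sendResponse({ ok: false, error: String((err && err.message) || err) }));
    return true; // keep the message channel open for the async response
  };
}

// Registration only happens inside an extension, where `chrome` is defined.
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener(
    createMessageRouter({
      GET_TABS: () => chrome.tabs.query({}), // hypothetical message type
      PING: () => 'pong', // hypothetical keep-alive ping from a content script
    })
  );
}
```

Returning `true` from the listener tells Chrome that `sendResponse` will be called asynchronously.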

Content Scripts

Content scripts provide the interactive layer between the extension and web pages. The system includes intelligent element detection that identifies clickable elements using multiple selector strategies, from standard HTML elements to site-specific patterns for major platforms like YouTube, GitHub, and social media sites.

Visual feedback systems display numbered badges on interactive elements and show real-time listening indicators with smooth CSS animations. The content scripts handle natural text input simulation with proper event firing to ensure compatibility with modern JavaScript frameworks like React and Vue.
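The element detection and numbering described above might be approached as in the sketch below. The selector list, including the site-specific example, and the function names are illustrative assumptions; rendering the badges is omitted. Numbers are assigned in the order the strategies find elements, which is a simplification.

```javascript
// Assumed selector strategies, from generic to site-specific.
const CLICKABLE_SELECTORS = [
  'a[href], button, input, select, textarea, summary',
  '[role="button"], [role="link"], [onclick], [tabindex]:not([tabindex="-1"])',
  'ytd-thumbnail', // hypothetical example of a site-specific pattern
];

// Collect unique elements matching any strategy and assign 1-based numbers.
// `root` is anything with querySelectorAll, such as `document`.
function numberClickableElements(root, selectors = CLICKABLE_SELECTORS) {
  const seen = new Set();
  const numbered = [];
  for (const selector of selectors) {
    for (const el of root.querySelectorAll(selector)) {
      if (seen.has(el)) continue; // matched by an earlier strategy
      seen.add(el);
      numbered.push({ number: numbered.length + 1, element: el });
    }
  }
  return numbered;
}

// "Click number 5": look up the element by its number and click it.
function clickByNumber(numbered, n) {
  const entry = numbered.find((item) => item.number === n);
  if (!entry) return false;
  entry.element.click();
  return true;
}
```

In a content script this would be called as `numberClickableElements(document)` and refreshed whenever the page content changes.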

Chrome Storage

State management utilizes Chrome's local storage API for persistent configuration and extension settings. The ConfigManager class provides environment-aware configuration loading, supporting both development and production deployments.

Settings include wake word sensitivity, visual feedback preferences, language selection, and notification controls. The storage system implements automatic fallbacks and error recovery to ensure extension reliability across browser restarts and updates.
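The fallback behavior described above can be illustrated with a defaults-merging helper. The setting key names, default values, and the `settings` storage key below are assumptions chosen to match the listed settings; they are not taken from config.js or the `ConfigManager` class.

```javascript
// Assumed default settings; the key names are illustrative.
const DEFAULT_SETTINGS = Object.freeze({
  wakeWordSensitivity: 0.5,
  visualFeedback: true,
  language: 'en-US',
  notifications: true,
});

// Merge stored values over the defaults, ignoring unknown keys and wrong types.
function withDefaults(stored) {
  const result = { ...DEFAULT_SETTINGS };
  if (stored && typeof stored === 'object') {
    for (const key of Object.keys(DEFAULT_SETTINGS)) {
      if (typeof stored[key] === typeof DEFAULT_SETTINGS[key]) result[key] = stored[key];
    }
  }
  return result;
}

// Extension-only: read from chrome.storage.local, falling back to defaults on error.
async function loadSettings() {
  try {
    const { settings } = await chrome.storage.local.get('settings');
    return withDefaults(settings);
  } catch (err) {
    return withDefaults(null);
  }
}
```

Validating each stored value against the type of its default means a corrupted or outdated entry degrades to the default instead of breaking the extension after an update.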

DOM Manipulation

DOM manipulation enables webpage interaction through programmatic element identification and interaction simulation. The system filters candidate elements to keep only those that are actually clickable, skipping hidden or disabled components.
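A filter of the kind described above could be sketched as a predicate over an element, its computed style, and its bounding box. The specific checks and the function names are assumptions for illustration; the extension's actual filtering rules are not shown in this document.

```javascript
// Decide whether an element should be treated as clickable.
// `style` is the element's computed style and `rect` its bounding box.
function isInteractable(el, style, rect) {
  if (el.disabled || el.getAttribute('aria-disabled') === 'true') return false;
  if (style.display === 'none' || style.visibility === 'hidden') return false;
  if (parseFloat(style.opacity) === 0 || style.pointerEvents === 'none') return false;
  return rect.width > 0 && rect.height > 0; // zero-size elements cannot be clicked
}

// Browser usage: gather the inputs from the live DOM.
function isElementInteractable(el) {
  return isInteractable(el, getComputedStyle(el), el.getBoundingClientRect());
}
```

Passing the style and rectangle as arguments keeps the decision logic free of DOM calls, so it can be checked with plain objects.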

Text input simulation includes character-by-character typing with realistic delays, comprehensive event firing (keydown, keyup, input, change), and framework compatibility layers. The visual feedback system injects custom CSS animations and overlays without interfering with existing page functionality.
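The typing simulation described above can be sketched as follows. The function names and delay values are assumptions, and the event order shown (keydown, input, keyup per character, then a single change) is one reasonable choice rather than the extension's confirmed behavior. For React-controlled inputs, a commonly used workaround is to assign the value through the native `value` setter on `HTMLInputElement.prototype`; that step is omitted here.

```javascript
// Create a bubbling event; KeyboardEvent is used for key events when available.
function makeEvent(type, key) {
  const init = { bubbles: true, cancelable: true };
  if (key !== undefined && typeof KeyboardEvent !== 'undefined') {
    return new KeyboardEvent(type, { ...init, key });
  }
  return new Event(type, init);
}

// Type one character: keydown, value update, input, keyup.
function typeCharacter(el, ch) {
  el.dispatchEvent(makeEvent('keydown', ch));
  el.value += ch;
  el.dispatchEvent(makeEvent('input'));
  el.dispatchEvent(makeEvent('keyup', ch));
}

// Type a whole string with a small random delay between characters,
// then fire `change` once at the end.
async function typeText(el, text, { minDelayMs = 30, maxDelayMs = 90 } = {}) {
  for (const ch of text) {
    typeCharacter(el, ch);
    const delay = minDelayMs + Math.random() * (maxDelayMs - minDelayMs);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  el.dispatchEvent(makeEvent('change'));
}
```

A call such as `typeText(document.activeElement, 'hello world')` would type into the focused field, assuming that field is an input or textarea.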


Folders and files

.envtemp
.gitignore
README.md
background.js
command-overlay.css
command-overlay.html
command-overlay.js
config.js
content.js
enhanced_command_processor.js
enhanced_content_script.js
enhanced_speech_recognition_config.js
gemini_voice_assistant.js
icon.png
icon_128.png
icon_16.png
icon_32.png
icon_48.png
manifest.json
popup.html
popup.js
styles.css
wake-word-detector.js
