Watch Demo Video | Hackathon Submission
Ultron is a sleek, fast, and powerful AI assistant that lives in your browser. Talk to it, give it commands, and it gets things done: opening websites, answering questions, reading files, and more. Built on Google Gemini, it's smart, responsive, and completely yours to run locally.
Prerequisites: You need Node.js installed and a free Gemini API key.
1. Clone the repository
git clone <your-repo-url>
cd Live-agent-Ultron
2. Add your API key
Open the .env file in the root folder and paste your key:
GEMINI_API_KEY=your_key_here
3. Install dependencies
npm install
4. Launch the app
npm run dev
5. Open in your browser
Ctrl + Click the link in the terminal, or go to http://localhost:3000.
That's it. You're in.
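A missing or placeholder key is an easy mistake in step 2. The sketch below shows one way a server could fail fast at startup when that happens. `requireApiKey` is a hypothetical helper, not something taken from this repository, and it assumes the .env values have already been loaded into the environment.

```javascript
// Hypothetical startup check; the real project may validate its config differently.
function requireApiKey(env = process.env) {
  const key = (env.GEMINI_API_KEY || "").trim();
  // Treat the placeholder from the setup instructions as "not set".
  if (key === "" || key === "your_key_here") {
    throw new Error(
      "GEMINI_API_KEY is missing. Add it to the .env file in the root folder."
    );
  }
  return key;
}
```

Calling it once before the server starts listening turns a confusing runtime failure into a clear startup message.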
Once the app is running, follow these steps to verify its core capabilities:
- Action: Type "Hello, who are you?" in the chat box and press Enter.
- Expectation: Ultron should reply word-by-word (streaming) and explain its role as your assistant.
- Action: Type "Open the place where I watch videos" and press Enter.
- Expectation: A new tab should open to YouTube.com. Ultron should confirm the action in the chat.
- Action: Go to the Commands section (top right sidebar).
- Action: Add Keyword: hamburger, URL: instagram.com, Label: Instagram.
- Action: Go back to chat and type "hamburger".
- Expectation: Instagram should open immediately, bypassing the AI intent mapping.
- Action: Click the Camera icon.
- Action: Hold an object (like a phone or a pen) in front of your webcam and click the blue shutter button.
- Expectation: Ultron will analyze the "Vision" snapshot and describe exactly what it sees.
- Action: Click the Paperclip icon and upload a small .txt or .js file.
- Action: Send a message like "Summarize this file".
- Expectation: Ultron will read the file content and provide a summary.
- Action: Click the Microphone icon (allow browser permissions).
- Action: Speak a message, wait for it to transcribe, and click send.
- Action: Ensure your volume is up to hear Ultron's Voice Output.
- Action: Type "My name is [name] and I am [age] years old" and press Enter.
- Action: Then later type "What is my name and how old am I?" and press Enter.
- Expectation: Ultron will go through the chat history and give your name and age.
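For Ultron to recall your name and age, earlier messages have to be sent back to the model along with each new question. The sketch below shows one way stored messages could be converted into the `{ role, parts }` history format used by the Gemini chat API. The stored message shape (`sender`, `text`) and the `maxTurns` limit are assumptions for illustration, not details taken from this repository.

```javascript
// ASSUMPTION: stored messages look like { sender: "user" | "ultron", text }.
function toGeminiHistory(messages, maxTurns = 20) {
  const recent = messages
    .filter((m) => typeof m.text === "string" && m.text.trim() !== "")
    .slice(-maxTurns) // keep only the most recent turns to bound prompt size
    .map((m) => ({
      role: m.sender === "user" ? "user" : "model",
      parts: [{ text: m.text }],
    }));
  // Chat history is expected to begin with a user turn, so drop leading model turns.
  const firstUser = recent.findIndex((m) => m.role === "user");
  return firstUser === -1 ? [] : recent.slice(firstUser);
}
```

Capping the number of turns is a design choice: it keeps requests small, at the cost of forgetting details mentioned very early in a long conversation.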
Talk to Ultron just like you'd talk to a person. It understands natural language, answers questions, helps you write, explains concepts, and much more, all powered by Google Gemini.
Tell Ultron to open a website and it'll do it instantly. Say things like:
- "Open YouTube"
- "Open the place where I watch reels"
- "Please open my Instagram"
Ultron understands what you mean, not just what you say.
Create your own personal shortcuts. Map any word to any website. For example, set "Hamburger" to open Instagram. Once saved, just type or say the keyword and Ultron opens it immediately, no AI call needed.
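The "no AI call needed" behavior can be pictured as a lookup that runs before any request goes to Gemini. This is a minimal sketch under assumptions: the entry shape (`keyword`, `url`, `label`) mirrors the fields in the Commands form, while `matchCustomKeyword` and `normalizeUrl` are hypothetical names.

```javascript
// Add a scheme when the user saved a bare domain such as "instagram.com".
function normalizeUrl(url) {
  return /^https?:\/\//i.test(url) ? url : `https://${url}`;
}

// Return the saved shortcut when the whole message equals a keyword, else null.
function matchCustomKeyword(message, entries) {
  const needle = message.trim().toLowerCase();
  const hit = entries.find((e) => e.keyword.trim().toLowerCase() === needle);
  return hit ? { url: normalizeUrl(hit.url), label: hit.label } : null;
}

const entries = [
  { keyword: "Hamburger", url: "instagram.com", label: "Instagram" },
];
console.log(matchCustomKeyword("hamburger", entries)); // matches the shortcut
console.log(matchCustomKeyword("Open YouTube", entries)); // null: falls through to the AI
```

Matching the whole message rather than a substring avoids surprises when a keyword happens to appear inside an ordinary sentence.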
Attach a file to your message and Ultron will read and analyze it for you. Ask questions about its contents, get summaries, or extract specific information, all without leaving the chat.
Ultron can now see! Click the camera icon to take a real-time snapshot. Point your camera at anything, and Ultron will analyze the image and describe it to you using its multimodal vision capabilities.
Click the microphone button and speak your message. Ultron transcribes your speech into the input box so you can review it before sending, or fire it off hands-free.
Ultron reads its responses aloud using your browser's built-in text-to-speech. Hear the reply while you multitask. Only the actual response is read, never any control commands.
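Keeping control commands out of the spoken reply amounts to separating them from the text before it reaches the speech engine. The sketch below assumes a made-up command format, `[[OPEN:https://example.com]]`; the format Ultron actually uses is not documented here. `speechSynthesis` and `SpeechSynthesisUtterance` are the standard browser Web Speech API.

```javascript
// ASSUMPTION: control commands look like [[OPEN:https://example.com]].
const CONTROL_PATTERN = /\[\[[A-Z_]+:[^\]]*\]\]/g;

// Split a model reply into the text to speak and the embedded commands.
function splitReply(reply) {
  const commands = reply.match(CONTROL_PATTERN) || [];
  const spoken = reply.replace(CONTROL_PATTERN, "").replace(/\s+/g, " ").trim();
  return { spoken, commands };
}

// Speak text in the browser; returns false where speech synthesis is unavailable.
function speak(text) {
  if (typeof speechSynthesis === "undefined") return false;
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  return true;
}

const reply = "Opening YouTube now. [[OPEN:https://youtube.com]]";
speak(splitReply(reply).spoken);
```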
Ultron types its response in real time, word by word, just like a person typing, so you never stare at a blank screen waiting. A "Thinking..." animation appears the moment you send a message so you always know it's working.
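The word-by-word effect can be illustrated with an async generator that yields one word at a time. This sketch simulates the stream from a finished string; the real app presumably forwards chunks as they arrive from the model. `toWordChunks`, `streamWords`, and the 30 ms delay are illustrative assumptions.

```javascript
// Split text into word-sized chunks, keeping trailing whitespace attached
// so that joining the chunks reproduces the text.
function toWordChunks(text) {
  return text.match(/\S+\s*/g) || [];
}

// Yield one chunk at a time with a small delay, imitating a streamed reply.
async function* streamWords(text, delayMs = 30) {
  for (const chunk of toWordChunks(text)) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    yield chunk;
  }
}

// Usage inside an async function, appending each chunk to the chat bubble:
//   for await (const chunk of streamWords(reply)) bubble.textContent += chunk;
```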
Hover over any message, yours or Ultron's, and a clipboard icon appears in the corner. Click it to copy the text instantly. The icon turns green to confirm the copy.
Everything works without lifting your hands from the keyboard:
- Enter sends your message in the chat
- Enter in the Keyword field moves you to the URL field
- Enter in the URL field saves the custom keyword
- Enter on the login page signs you in
Each user gets their own private chat history, saved automatically. Come back later and your conversation is right where you left it. Clear it any time with a single click.
Ultron supports multiple users on the same machine. Each person's history, custom keywords, and settings are completely separate from everyone else's. Just enter your email to switch accounts.
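One way to keep each person's data separate is to derive a storage file from the email address. The sketch below is an assumption about how that could look, not the repository's actual layout: the `data/history` directory and the `historyPathFor` helper are hypothetical.

```javascript
const path = require("node:path");
const crypto = require("node:crypto");

// ASSUMPTION: one JSON file per user under a data directory.
function historyPathFor(email, baseDir = "data/history") {
  const normalized = email.trim().toLowerCase();
  // Hash the email so user input never becomes part of a file path directly.
  const id = crypto
    .createHash("sha256")
    .update(normalized)
    .digest("hex")
    .slice(0, 16);
  return path.join(baseDir, `${id}.json`);
}

console.log(historyPathFor("you@example.com"));
```

Hashing also means an input such as `../../etc/passwd` cannot escape the data directory, because only hexadecimal characters ever reach the file name.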
Built with a clean decoupled microservices layout:
- Frontend serves only static HTML, CSS, and JS, and is completely stateless.
- Backend exposes only API endpoints, with rate limiting, input validation, and safe directory handling.
No API keys ever reach the browser. Your Gemini key stays on the server only.
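To make rate limiting and input validation concrete, here is a minimal in-memory sketch. It is not the project's implementation: the fixed-window strategy, the limits, and the names `createRateLimiter` and `validateMessage` are assumptions, as is the `message` field on the request body.

```javascript
// Fixed-window limiter: at most `limit` requests per `windowMs` for each client key.
function createRateLimiter({ limit = 30, windowMs = 60_000 } = {}) {
  const windows = new Map(); // key -> { start, count }
  return function allow(key, now = Date.now()) {
    const w = windows.get(key);
    if (!w || now - w.start >= windowMs) {
      windows.set(key, { start: now, count: 1 }); // start a fresh window
      return true;
    }
    w.count += 1;
    return w.count <= limit;
  };
}

// Accept only a non-empty string message within a maximum length.
function validateMessage(body, maxLength = 4000) {
  const text =
    body && typeof body.message === "string" ? body.message.trim() : "";
  if (text === "") {
    return { ok: false, error: "message must be a non-empty string" };
  }
  if (text.length > maxLength) {
    return { ok: false, error: `message exceeds ${maxLength} characters` };
  }
  return { ok: true, message: text };
}
```

In an Express route, `allow(req.ip)` and `validateMessage(req.body)` would run before any call to the AI service, so rejected requests never consume Gemini quota.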
Below is a visual representation of how Ultron connects its multimodal components, backend services, and the Gemini API.
graph TD
subgraph "Client Layer (Web Browser)"
UI["UI (HTML/CSS/JS)"]
Camera["MediaDevices API (Vision)"]
Audio["Web Speech API (STT/TTS)"]
end
subgraph "Server Layer (Google Cloud Run)"
API["Express Router"]
ChatCtrl["Chat Controller"]
AISvc["AI Service (Gemini Wrapper)"]
HistSvc["History Service"]
CmdSvc["Commands Service"]
end
subgraph "External AI Layer"
Gemini["Google Gemini 2.0 Flash"]
end
subgraph "Data Storage Layer (Persistence)"
History["Chat History (JSON)"]
Keywords["Custom Keywords (JSON)"]
Uploads["File Uploads (/uploads)"]
end
%% Interactions
UI <--> API
Camera --> UI
Audio <--> UI
API <--> ChatCtrl
ChatCtrl <--> AISvc
ChatCtrl <--> HistSvc
ChatCtrl <--> CmdSvc
AISvc <--> Gemini
HistSvc <--> History
CmdSvc <--> Keywords
ChatCtrl --> Uploads
Ultron ships with an automated deployment script that handles the entire pipeline:
- Building the Docker image using Google Cloud Build.
- Pushing the image to GCR (Google Container Registry).
- Deploying the container to Google Cloud Run with the correct port and permission settings.
Run the automated deployment:
# Make the script executable
chmod +x deploy.sh
# Run the deployment
./deploy.sh
View the automation script here: deploy.sh
Live-agent-Ultron/
├── frontend/       # Static UI (HTML, CSS, JS modules)
├── backend/        # API server (Node.js + Gemini)
├── .env            # Your API key goes here
├── Dockerfile      # For Cloud Run deployment
├── package.json    # Production-ready scripts
└── start.js        # Concurrent launcher (for local development)
- Go to Google AI Studio
- Sign in with your Google account
- Click "Create API Key"
- Copy the key and paste it into your .env file
The free tier is generous enough for personal use.
Screenshots: Login Page | Chat Interface
- AI: Gemini 2.0 Flash (@google/generative-ai)
- Live URL: https://ultron-466211792342.us-central1.run.app/
- Technology Stack: Node.js, Express, Google Gemini 2.0 Flash, Vanilla JS, CSS3.
- Platform: Google Cloud Run
- Configuration: Standardized Docker environment with dynamic port mapping.
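Dynamic port mapping matters because Cloud Run tells the container which port to listen on through the `PORT` environment variable. The sketch below shows one way to honor it while keeping a local default. `resolvePort` is a hypothetical helper, and the 3000 fallback is taken from the local setup URL above; the project's real server may choose its port differently.

```javascript
// Use the platform-provided PORT when it is a valid port number, else the fallback.
function resolvePort(env = process.env, fallback = 3000) {
  const port = Number.parseInt(env.PORT ?? "", 10);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
}

// With Express this would be used as: app.listen(resolvePort());
console.log(resolvePort({ PORT: "8080" }));
console.log(resolvePort({}));
```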
Built with ❤️ using Node.js, Google Gemini, and Vanilla JS.

