Skip to content

Uncle-Noon/Live-agent-Ultron

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

89 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

⚑ Ultron β€” Your AI Personal Assistant

πŸ“Ί Watch Demo Video | πŸ† Hackathon Submission


Ultron is a sleek, fast, and powerful AI assistant that lives in your browser. Talk to it, give it commands, and it gets things done β€” opening websites, answering questions, reading files, and more. Built on Google Gemini, it's smart, responsive, and completely yours to run locally.


πŸš€ Quick Start

Prerequisites: You need Node.js installed and a free Gemini API key.

1. Clone the repository

git clone <your-repo-url>
cd Live-agent-Ultron

2. Add your API key

Open the .env file in the root folder and paste your key:

GEMINI_API_KEY=your_key_here

3. Install dependencies

npm install

4. Launch the app

npm run dev

5. Open in your browser

Ctrl + Click the link in the terminal, or go to http://localhost:3000.

That's it. You're in.


πŸ§ͺ Reproducible Testing

Once the app is running, follow these steps to verify its core capabilities:

1. Basic AI Chat

  • Action: Type "Hello, who are you?" in the chat box and press Enter.
  • Expectation: Ultron should reply word-by-word (streaming) and explain its role as your assistant.

2. Intelligent Command Execution

  • Action: Type "Open the place where I watch videos" and press Enter.
  • Expectation: A new tab should open to YouTube.com. Ultron should confirm the action in the chat.

3. Custom Keyword Creation

  • Action: Go to the Commands section (top right sidebar).
  • Action: Add Keyword: hamburger, URL: instagram.com, Label: Instagram.
  • Action: Go back to chat and type "hamburger".
  • Expectation: Instagram should open instantly without a delay, bypassing the AI intent mapping.

4. Live Vision (Multimodal)

  • Action: Click the Camera icon.
  • Action: Hold an object (like a phone or a pen) in front of your webcam and click the blue shutter button.
  • Expectation: Ultron will analyze the "Vision" snapshot and describe exactly what it sees.

5. File Analysis

  • Action: Click the Paperclip icon and upload a small .txt or .js file.
  • Action: Send a message like "Summarize this file".
  • Expectation: Ultron will read the file content and provide a summary.

6. Voice Interaction

  • Action: Click the Microphone icon (allow browser permissions).
  • Action: Speak a message, wait for it to transcribe, and click send.
  • Action: Ensure your volume is up to hear Ultron's Voice Output.

7. Chat Memory

  • Action: Type "My name is [name] and I am [age] years old" and press Enter.
  • Action: Then later type "What is my name and how old am I?" and press Enter.
  • Expectation: Ultron will go through the chat history and give your name and age.

✨ Features

πŸ€– AI Chat

Talk to Ultron just like you'd talk to a person. It understands natural language, answers questions, helps you write, explains concepts, and much more β€” all powered by Google Gemini.

🌐 Open Websites by Voice or Text

Tell Ultron to open a website and it'll do it instantly. Say things like:

  • "Open YouTube"
  • "Open the place where I watch reels"
  • "Please open my Instagram"

Ultron understands what you mean, not just what you say.

πŸ”– Custom Keywords

Create your own personal shortcuts. Map any word to any website β€” for example, set "Hamburger" to open Instagram. Once saved, just type or say the keyword and Ultron opens it immediately, no AI call needed.

πŸ—‚οΈ File Upload & Analysis

Attach a file to your message and Ultron will read and analyze it for you. Ask questions about its contents, get summaries, or extract specific information β€” all without leaving the chat.

πŸ“· Live Vision ("See" Mode)

Ultron can now see! Click the camera icon to take a real-time snapshot. Point your camera at anything, and Ultron will analyze the image and describe it to you using its multimodal vision capabilities.

πŸŽ™οΈ Voice Input (Speech-to-Text)

Click the microphone button and speak your message. Ultron transcribes your speech into the input box so you can review it before sending, or fire it off hands-free.

πŸ”Š Voice Output (Text-to-Speech)

Ultron reads its responses aloud using your browser's built-in text-to-speech. Hear the reply while you multitask β€” only the actual response is read, never any control commands.

πŸ’¬ Streaming Responses

Ultron types its response in real time, word by word, just like a person typing β€” so you never stare at a blank screen waiting. A "Thinking..." animation appears the moment you send a message so you always know it's working.

πŸ“‹ Copy Any Message

Hover over any message β€” yours or Ultron's β€” and a clipboard icon appears in the corner. Click it to copy the text instantly. The icon turns green to confirm the copy.

⌨️ Keyboard-First Design

Everything works without lifting your hands from the keyboard:

  • Enter sends your message in the chat
  • Enter in the Keyword field moves you to the URL field
  • Enter in the URL field saves the custom keyword
  • Enter on the login page signs you in

πŸ•“ Per-User Chat History

Each user gets their own private chat history, saved automatically. Come back later and your conversation is right where you left it. Clear it any time with a single click.

πŸ‘€ Multi-User Login

Ultron supports multiple users on the same machine. Each person's history, custom keywords, and settings are completely separate from everyone else's. Just enter your email to switch accounts.

πŸ›‘οΈ Security & Architecture

Built with a clean decoupled microservices layout:

  • Frontend serves only static HTML, CSS, and JS β€” completely stateless.
  • Backend exposes only API endpoints β€” rate-limited, input-validated, and with safe directory handling.

No API keys ever reach the browser. Your Gemini key stays on the server only.


πŸ—οΈ System Architecture

Below is a visual representation of how Ultron connects its multimodal components, backend services, and the Gemini API.

graph TD
    subgraph "Client Layer (Web Browser)"
        UI["UI (HTML/CSS/JS)"]
        Camera["MediaDevices API (Vision)"]
        Audio["Web Speech API (STT/TTS)"]
    end

    subgraph "Server Layer (Google Cloud Run)"
        API["Express Router"]
        ChatCtrl["Chat Controller"]
        AISvc["AI Service (Gemini Wrapper)"]
        HistSvc["History Service"]
        CmdSvc["Commands Service"]
    end

    subgraph "External AI Layer"
        Gemini["Google Gemini 2.0 Flash"]
    end

    subgraph "Data Storage Layer (Persistence)"
        History["Chat History (JSON)"]
        Keywords["Custom Keywords (JSON)"]
        Uploads["File Uploads (/uploads)"]
    end

    %% Interactions
    UI <--> API
    Camera --> UI
    Audio <--> UI
    
    API <--> ChatCtrl
    ChatCtrl <--> AISvc
    ChatCtrl <--> HistSvc
    ChatCtrl <--> CmdSvc
    
    AISvc <--> Gemini
    
    HistSvc <--> History
    CmdSvc <--> Keywords
    ChatCtrl --> Uploads
Loading

πŸš€ Automated Deployment

Ultron is optimized for modern CI/CD workflows. We have included an automated deployment script that handles the entire pipeline:

  1. Building the Docker image using Google Cloud Build.
  2. Pushing the image to GCR (Google Container Registry).
  3. Deploying the container to Google Cloud Run with the correct port and permission settings.

Run the automated deployment:

# Make the script executable
chmod +x deploy.sh

# Run the deployment
./deploy.sh

View the automation script here: deploy.sh


πŸ“ Project Structure

Live-agent-Ultron/
β”œβ”€β”€ frontend/          # Static UI (HTML, CSS, JS modules)
β”œβ”€β”€ backend/           # API server (Node.js + Gemini)
β”œβ”€β”€ .env               # Your API key goes here
β”œβ”€β”€ Dockerfile         # For Cloud Run deployment
β”œβ”€β”€ package.json       # production-ready scripts
└── start.js           # Concurrent launcher (for local development)

πŸ”‘ Getting a Gemini API Key

  1. Go to Google AI Studio
  2. Sign in with your Google account
  3. Click "Create API Key"
  4. Copy the key and paste it into your .env file

The free tier is generous enough for personal use.

πŸ“± App Interface

Login Page Chat Interface
Login Page Chat Interface

πŸ› οΈ Technical Stack

☁️ Deployment

  • Platform: Google Cloud Run
  • Configuration: Standardized Docker environment with dynamic port mapping.

Built with ❀️ using Node.js, Google Gemini, and Vanilla JS.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors