Jarvis Voice Agent

Jarvis is a voice-first assistant for macOS. It listens for a wake word, sends the transcribed request to a Gemini model hosted on Google Cloud, and executes the resulting actions on your local machine through native macOS APIs.

Unlike a typical chatbot, Jarvis acts as an OS-level copilot: it can play music in your local Spotify app, schedule meetings in Apple Calendar, adjust system settings, manage files, and control your browser.

Architecture

The project is split into a Thin Desktop Client and a Cloud Agent Brain to meet hackathon requirements for cloud execution while preserving local machine control.

1. Cloud Run Backend (The "Brain")

Hosted on: Google Cloud Run.
LLM Engine: Vertex AI (gemini-2.5-flash).
Functionality: Maintains session memory, handles multi-turn conversations, parses user transcripts, and decides which native tools the desktop client should execute. Tools are exposed to the model as Vertex AI FunctionDeclarations.
Security: Requires an auth token (AURA_BACKEND_AUTH_TOKEN) on its endpoints and applies structured schema validation.
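
To make the tool-selection step more concrete, here is a minimal sketch of what a tool declaration and a schema check on a model-proposed call might look like. The `ToolDeclaration` shape loosely follows the name/description/parameters layout used for function calling, but the tool name `set_system_volume`, its `level` parameter, and the `validateToolCall` helper are hypothetical and not taken from the Jarvis codebase.

```typescript
// Hypothetical, simplified shape of a tool declaration. The real backend
// may rely on the Vertex AI SDK's own types instead.
type ParamType = "string" | "number" | "boolean";

interface ToolDeclaration {
  name: string;
  description: string;
  parameters: {
    properties: Record<string, { type: ParamType; description: string }>;
    required: string[];
  };
}

// Illustrative tool; the name and parameter are assumptions.
const setSystemVolume: ToolDeclaration = {
  name: "set_system_volume",
  description: "Set the macOS master output volume (0-100).",
  parameters: {
    properties: {
      level: { type: "number", description: "Target volume, 0 to 100." },
    },
    required: ["level"],
  },
};

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Returns a list of problems; an empty list means the call matches the schema.
function validateToolCall(decl: ToolDeclaration, call: ToolCall): string[] {
  const errors: string[] = [];
  if (call.name !== decl.name) {
    errors.push(`unknown tool: ${call.name}`);
    return errors;
  }
  for (const key of decl.parameters.required) {
    if (!(key in call.args)) errors.push(`missing argument: ${key}`);
  }
  for (const key of Object.keys(call.args)) {
    const spec = decl.parameters.properties[key];
    if (spec === undefined) {
      errors.push(`unexpected argument: ${key}`);
    } else if (typeof call.args[key] !== spec.type) {
      errors.push(`argument ${key} should be ${spec.type}`);
    }
  }
  return errors;
}
```

Checking the call before it is forwarded to the desktop client keeps malformed model output from ever reaching a native tool.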

2. Desktop Client (The "Actuator")

Running on: macOS (Node.js/TypeScript).
Wake Word: Picovoice Porcupine (listening for "Jarvis").
Speech-to-Text (STT): Native Apple Speech (dictation) for fast, local transcription.
Text-to-Speech (TTS): ElevenLabs API for high-quality, expressive voice responses.
Execution UI: Custom macOS native frosted-glass Swift overlays (jarvis-overlay and jarvis-context-panel).

Native Capabilities (Tools)

Jarvis bridges the LLM with your computer using native macOS automation (JavaScript for Automation/JXA, AppleScript, and OS APIs):

Spotify Control: Searches for tracks and plays them in the native macOS Spotify app, without requiring the Spotify API.
Apple Calendar: Creates events directly in your system Calendar via JXA.
System Volume: Adjusts the OS master volume via AppleScript.
Filesystem Management: Creates folders, renames files, moves items, trashes files, and searches using mdfind (Spotlight).
Browser Automation: Opens URLs, searches the web, and interacts with the active browser.
Accessibility & UI: Focuses apps, clicks macOS menu bar items, types text into the active window, and presses keyboard shortcuts.
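
As an illustration of how two of these tools could map onto macOS command-line utilities, the sketch below builds argument lists for `osascript` (AppleScript) and `mdfind` (Spotlight). The helper names `volumeCommand` and `spotlightCommand` are hypothetical; the actual Jarvis tool implementations may differ.

```typescript
// A command is a program plus an argument list, suitable for passing to
// Node's child_process.execFile (no shell involved, so no quoting issues).
interface Command {
  program: string;
  args: string[];
}

// Build an osascript call that sets the master output volume.
// `set volume output volume N` is standard AppleScript; N ranges 0-100.
function volumeCommand(level: number): Command {
  if (!isFinite(level)) throw new Error("volume level must be a finite number");
  const clamped = Math.max(0, Math.min(100, Math.round(level)));
  return {
    program: "osascript",
    args: ["-e", `set volume output volume ${clamped}`],
  };
}

// Build a Spotlight search by file name, optionally limited to one directory.
function spotlightCommand(query: string, onlyIn?: string): Command {
  const args: string[] = [];
  if (onlyIn !== undefined) args.push("-onlyin", onlyIn);
  args.push("-name", query);
  return { program: "mdfind", args };
}
```

Returning an argument array rather than a shell string means a transcript-derived value such as a file name is never interpreted by a shell.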

Getting Started

Prerequisites

macOS (Intel or Apple Silicon).
Node.js (v20+ recommended).
A Google Cloud Project with Vertex AI and Cloud Run enabled.
An ElevenLabs API Key.

1. Backend Setup (Google Cloud)

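Deploy the backend directory to Google Cloud Run. The --allow-unauthenticated flag makes the service publicly reachable at the Cloud Run level, so request authentication relies on the AURA_BACKEND_AUTH_TOKEN set below: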

gcloud run deploy jarvis-backend --source backend --region us-central1 --allow-unauthenticated

Set the required backend environment variables in Cloud Run:
- GOOGLE_CLOUD_PROJECT
- AURA_GEMINI_MODEL (e.g., gemini-2.5-flash)
- AURA_BACKEND_AUTH_TOKEN (a secure random token used for client auth; the desktop .env must contain the same value)
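
The token configured here is what the desktop client presents on each request. A minimal sketch of how such a request might be assembled follows. The `/v1/turn` path, the `Bearer` scheme, and the `sessionId`/`transcript` body fields are all assumptions for illustration; this README does not document the real endpoint or payload.

```typescript
// Plain description of an HTTP request, suitable for passing to fetch(url, init).
interface AgentRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical request builder: endpoint path, auth scheme, and body
// fields are assumptions, not the documented Jarvis API.
function buildAgentRequest(
  backendUrl: string,
  authToken: string,
  sessionId: string,
  transcript: string
): AgentRequest {
  return {
    // Drop trailing slashes so the path is joined cleanly.
    url: `${backendUrl.replace(/\/+$/, "")}/v1/turn`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${authToken}`,
      },
      // Only text leaves the machine: a session id and the transcript.
      body: JSON.stringify({ sessionId, transcript }),
    },
  };
}
```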

2. Desktop Setup (Local macOS)

Clone the repository and install dependencies:
```
npm install
```

Configure the desktop .env file (copy .env.example if the repository provides one, otherwise create .env in the repository root):

ELEVENLABS_API_KEY=your_elevenlabs_key
ELEVENLABS_VOICE_ID=your_voice_id

AURA_BACKEND_URL=https://your-cloud-run-url.run.app
AURA_BACKEND_AUTH_TOKEN=the_secure_token_from_backend

# Note: GEMINI_API_KEY is not needed locally because the brain is in the cloud.
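
A startup check along the following lines can catch a missing or misspelled variable before the voice agent boots. This is a sketch only: the `DesktopConfig` type and `loadDesktopConfig` helper are hypothetical, and only the four variable names come from the configuration above.

```typescript
interface DesktopConfig {
  elevenLabsApiKey: string;
  elevenLabsVoiceId: string;
  backendUrl: string;
  backendAuthToken: string;
}

const REQUIRED_KEYS: string[] = [
  "ELEVENLABS_API_KEY",
  "ELEVENLABS_VOICE_ID",
  "AURA_BACKEND_URL",
  "AURA_BACKEND_AUTH_TOKEN",
];

// Validate an environment map and return a typed config object.
// Throws one error listing every missing (or empty) variable.
function loadDesktopConfig(env: Record<string, string | undefined>): DesktopConfig {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const get = (key: string): string => env[key] as string;
  return {
    elevenLabsApiKey: get("ELEVENLABS_API_KEY"),
    elevenLabsVoiceId: get("ELEVENLABS_VOICE_ID"),
    // Normalize the URL so later path joins do not produce "//".
    backendUrl: get("AURA_BACKEND_URL").replace(/\/+$/, ""),
    backendAuthToken: get("AURA_BACKEND_AUTH_TOKEN"),
  };
}
```

In a Node.js process this would typically be called as `loadDesktopConfig(process.env)` once the .env file has been loaded.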

(Optional) Recompile the Swift UI overlays if you make UI changes:

swiftc desktop/src/swift/JarvisOverlay.swift -o desktop/assets/jarvis-overlay

🎙️ Running Jarvis

Start the voice agent:

npm -w desktop run voice

Wait for the 🌟 Jarvis Voice Agent boot sequence to finish.
Say the wake word: "Jarvis".
You will hear an activation chime. Speak your command (e.g., "Set a meeting for tomorrow at 2 PM" or "Play some jazz on Spotify").
Once you stop speaking, Jarvis transcribes the command, sends the transcript to the Cloud Run backend, executes the chosen action locally, and responds with a spoken confirmation and a visual overlay.
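
The steps above can be read as a small state machine. The sketch below models one voice turn as a pure transition table; the state and event names are hypothetical and chosen only to mirror the sequence described here, not the client's actual internals.

```typescript
type State = "idle" | "listening" | "thinking" | "executing" | "speaking";

type TurnEvent =
  | "wake_word"        // Porcupine detected "Jarvis"
  | "transcript_ready" // local speech-to-text finished
  | "tool_requested"   // backend asked for a native tool
  | "reply_ready"      // backend answered without a tool
  | "tool_done"        // local action completed
  | "speech_done";     // TTS playback finished

// Allowed transitions; any event not listed for a state is ignored.
const transitions: Record<State, Partial<Record<TurnEvent, State>>> = {
  idle: { wake_word: "listening" },
  listening: { transcript_ready: "thinking" },
  thinking: { tool_requested: "executing", reply_ready: "speaking" },
  executing: { tool_done: "speaking" },
  speaking: { speech_done: "idle" },
};

function nextState(state: State, event: TurnEvent): State {
  const target = transitions[state][event];
  return target === undefined ? state : target;
}

// Fold a sequence of events into the resulting state.
function replay(events: TurnEvent[], start: State = "idle"): State {
  let state = start;
  for (const event of events) state = nextState(state, event);
  return state;
}
```

Ignoring unlisted events means, for example, that a second wake word spoken while a request is still being processed does not interrupt the current turn.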

Safety & Privacy

Push-to-Talk / Wake Word: Jarvis only records audio when explicitly summoned. Audio recording stops automatically when you stop speaking.
Cloud Isolation: The Cloud Run backend receives only text transcripts, not raw audio or sensitive local files.
Destructive Actions: Actions like trashing files or moving data require explicit confirmation.
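
A confirmation requirement like this can be sketched as a small gate placed in front of tool execution. The tool names in `DESTRUCTIVE_TOOLS`, the accepted replies, and the `confirmGate` helper are hypothetical; this README only states that trashing and moving require confirmation.

```typescript
// Hypothetical tool identifiers for actions that need confirmation.
const DESTRUCTIVE_TOOLS: string[] = ["trash_file", "move_item"];

type Decision =
  | { kind: "run" }
  | { kind: "ask"; prompt: string }
  | { kind: "cancel" };

// Decide whether a tool may run now, needs a confirmation question,
// or should be cancelled based on the user's spoken reply.
function confirmGate(toolName: string, userReply?: string): Decision {
  if (DESTRUCTIVE_TOOLS.indexOf(toolName) === -1) return { kind: "run" };
  if (userReply === undefined) {
    return { kind: "ask", prompt: `Are you sure you want to run ${toolName}?` };
  }
  const normalized = userReply.trim().toLowerCase();
  const affirmative = ["yes", "confirm", "do it"];
  // Anything that is not a clear yes cancels the action.
  return affirmative.indexOf(normalized) !== -1 ? { kind: "run" } : { kind: "cancel" };
}
```

Treating every unrecognized reply as a cancel keeps a misheard transcript from triggering a destructive action.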

Built for the Google Cloud Vertex AI Hackathon.
