A multi-modal desktop assistant that acts on requests rather than only answering them. It combines live conversation, computer vision, local OS management, and physical 3D hardware control through the Google Gemini Native Audio API.
- Clone the repository:

      git clone https://github.com/ks9128/aria.git
      cd aria

- Environment Setup: Create a .env file in the root folder with your API key:

      GEMINI_API_KEY=your_google_api_key_here

- Start the Backend (Python 3.11):

      pip install -r requirements.txt
      python backend/server.py

- Start the Frontend (Node/React):

      npm install
      npm run dev
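Before starting the backend, it can help to confirm that the key from the Environment Setup step is actually readable. The script below is a minimal sketch and not part of the repository: `load_env_file` and `check_api_key` are hypothetical helper names, and the parser only handles plain `KEY=VALUE` lines. How the backend itself loads the .env file is not stated here, so treat this purely as a standalone sanity check.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_api_key(path=".env"):
    """Return True if GEMINI_API_KEY is set and is not the README placeholder."""
    # Prefer a key already exported in the shell, then fall back to the file.
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY", "")
    return bool(key) and key != "your_google_api_key_here"


if __name__ == "__main__":
    if check_api_key():
        print("GEMINI_API_KEY found")
    else:
        print("GEMINI_API_KEY missing or still a placeholder")
```

Saved under any name in the root folder and run with Python 3.11, it reports whether the key is present without printing the key itself.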
Built with 🤖 by Khalid Saifullah