Inspiration

Many people, especially older adults, struggle to use computers because of arthritis, tremors, and small on-screen targets. They rely on family for basic tasks and often feel frustrated or embarrassed. We set out to give this group a hands-free way to understand and complete everyday actions on a computer.

What it does

Aria is an AI agent that translates voice commands into actions on your computer. Users speak naturally to their machine and Aria carries out the task: opening apps, writing and sending emails, searching for and playing videos, printing, organizing files, adjusting text size, and reading pages aloud. Users talk to Aria the way they would ask a family member for help, and Aria explains the plan and then completes the task.

How we built it

We built the front end in Electron.js with large targets, clear focus states, an in-flow transcript, and an audio bar that never covers content. The back end runs on Python FastAPI, which maps speech to intents and executes multi-step task graphs with confirmations and undo. We use the Gemini API for intent understanding and summaries, ElevenLabs for natural text-to-speech, and Cloudflare for secure, low-latency routing. This stack lets us keep the interface simple while coordinating reliable actions behind the scenes.
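To make the "multi-step tasks with confirmations and undo" idea concrete, here is a minimal sketch of one way such a runner could look. It is not Aria's actual code: the names `Step`, `TaskRunner`, and `make_step` are hypothetical, the task is modeled as a simple linear chain rather than a full graph, and the actions only write to a log instead of touching the desktop.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One action in a task, paired with the action that reverses it."""
    description: str
    run: Callable[[], None]
    undo: Callable[[], None]
    needs_confirmation: bool = False


@dataclass
class TaskRunner:
    """Runs steps in order and remembers completed ones so they can be undone."""
    confirm: Callable[[str], bool]  # asks the user; returns True for "yes"
    completed: List[Step] = field(default_factory=list)

    def execute(self, steps: List[Step]) -> bool:
        for step in steps:
            # Risky steps are gated on an explicit spoken "yes".
            if step.needs_confirmation and not self.confirm(step.description):
                self.undo_all()
                return False
            step.run()
            self.completed.append(step)
        return True

    def undo_all(self) -> None:
        # Reverse completed steps, most recent first.
        while self.completed:
            self.completed.pop().undo()


# Stand-in actions (assumption): each step just records what it did.
log: List[str] = []


def make_step(name: str, needs_confirmation: bool = False) -> Step:
    return Step(
        description=name,
        run=lambda: log.append(f"do {name}"),
        undo=lambda: log.append(f"undo {name}"),
        needs_confirmation=needs_confirmation,
    )


email_task = [
    make_step("open mail app"),
    make_step("draft email"),
    make_step("send email", needs_confirmation=True),
]

# Simulate a user who answers "no" when asked to confirm sending.
runner = TaskRunner(confirm=lambda description: False)
finished = runner.execute(email_task)
```

In this sketch, declining the confirmation stops the task before the email is sent and rolls back the earlier steps in reverse order, so the machine ends up where it started.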

Challenges we ran into

Integrating the user interface, speech recognition, text-to-speech, and the backend action system was challenging. Each piece worked on its own, but combining them exposed issues such as dependency errors that we had to work through.

Accomplishments that we're proud of

We demonstrated fully hands-free flows such as searching for files, finding plane tickets, and opening websites in a browser. We also delivered a fully functional, smooth UI.

What's next for Aria

Next, we plan to make Aria faster with quicker models, streaming speech recognition and text-to-speech, and light on-device processing, so older adults are not left waiting. We will add simple customization so users and caregivers can choose the voice, speaking speed, and how much explanation Aria provides. We will expand beyond desktop to mobile and tablets, and improve accessibility defaults with larger presets, clearer focus cues, and easy confirmation phrases. We will also strengthen the consented handoff to a trusted person and personalize recognition for seniors’ accents, pacing, and common vocabulary such as doctor names and medications.
