Inspiration

55 million people worldwide live with dementia. Millions more are visually impaired or elderly and living alone. Their families face an impossible choice: constant worry from a distance, or giving up their own lives to provide round-the-clock care. Existing solutions are either passive (a medical alert button that requires the patient to press it) or clinical (expensive facility care that strips away independence and dignity).

We set out to build something different: an AI companion that doesn't wait for the patient to ask for help, but proactively watches over them with warmth and empathy, while giving families a real-time window into their loved one's day.

What it does

ARIA: Assistive Real-time Intelligent Aid

ARIA is a wearable AI assistant built on the Arduino Uno Q that acts as a second brain for people who need one most.

For the patient, ARIA is a warm, always-present voice that:

  • Sees: recognizes family members by face and reminds the patient who they are, describes surroundings ("You left your glasses on the kitchen countertop"), and reads medicine labels and signs aloud
  • Listens: responds to natural voice queries via wake word ("Hey Arduino, what time is it?"), understands context from ongoing conversation
  • Protects: detects falls using accelerometer data, immediately asks "Are you okay?", listens for the response, and escalates to the family only if the patient says "help" or stays silent

For the family, ARIA provides a companion mobile app that:

  • Shows real-time patient status: what their loved one is doing right now, with an ambient state visualization
  • Displays a live activity timeline: every event (conversations, face sightings, scene captures, alerts) in chronological order with full conversation transcripts
  • Offers live camera access: see what the device camera sees on demand, with an AI-generated scene description
  • Enables two-way communication: type a message or tap a preset, and it's spoken aloud to the patient through the device
  • Triggers instant emergency alerts: falls and distress escalate to a full-screen modal with options to call emergency services or dispatch help
  • Manages the patient profile: medications, loved ones (with photos for face recognition), notes, and conditions all sync bidirectionally between app and device

How we built it

Device (Arduino Uno Q — Python)

The core is an agentic LLM architecture: a central CareAgent receives events from every subsystem (falls, faces, voice, timers) and drives an OpenAI function-calling loop to decide which tools to invoke. The agent has access to 10+ tools, including speak_to_user, describe_scene, find_object, read_text, identify_person, send_family_alert, navigate_room, set_reminder, and get_current_datetime.
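A function-calling loop of this shape can be sketched in a few lines of Python. The tool names come from the list above; the registry decorator, the stub tool bodies, and the simulated `tool_calls` payload are our illustration, not ARIA's exact code:

```python
import json
from datetime import datetime

# Tool registry: each entry is exposed to the LLM as a function definition,
# and the agent dispatches whatever calls the model decides to make.
TOOLS = {}

def tool(fn):
    TOOLS[fn.__name__] = fn
    return fn

@tool
def speak_to_user(text: str) -> str:
    return f"spoke: {text}"          # stand-in for TTS playback

@tool
def get_current_datetime() -> str:
    return datetime.now().isoformat(timespec="seconds")

def dispatch(tool_calls):
    """Run the tool calls the LLM chose; results are fed back as messages."""
    results = []
    for call in tool_calls:
        args = call["arguments"]
        if isinstance(args, str):    # OpenAI sends arguments as a JSON string
            args = json.loads(args)
        results.append({"name": call["name"], "result": TOOLS[call["name"]](**args)})
    return results

# Simulated model decision for a spoken reminder:
calls = [{"name": "speak_to_user",
          "arguments": json.dumps({"text": "It is time for your medication."})}]
print(dispatch(calls)[0]["result"])  # → spoke: It is time for your medication.
```

In the real loop, `tool_calls` comes from the chat-completions response, and the loop repeats until the model replies with plain text instead of another call.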

  • Triple-LLM fallback: OpenAI GPT-4o-mini (primary) → Together AI Llama-3.3-70B (when rate-limited) → on-device SmolVLM 500M via llama.cpp (offline). The router automatically detects rate limits and backs off with timestamps.
  • 3-tier memory system: Working memory (last 10 messages verbatim), episodic memory (LLM-compressed summaries of older conversations), and a semantic profile (patient info, family, home layout, medications).
  • Computer vision: OpenCV YuNet for face detection + SFace for face recognition, enabling multi-profile family member identification. Vision-language models (Together AI Qwen3-VL or local SmolVLM) for scene description, object finding, OCR, and safety analysis.
  • Audio: LMNT for high-quality text-to-speech, OpenAI Whisper for speech-to-text, Arduino Speaker peripheral for playback, KeywordSpotting brick for always-on wake word detection.
  • Fall detection: Two-phase accelerometer analysis (free-fall magnitude collapse → impact spike within 500ms window), with a post-fall dialogue system that records the patient's response and evaluates it for distress keywords.
  • Proactive care: Time-aware scheduler for medication reminders, meal suggestions, orientation prompts, and inactivity alerts with speech throttling (2-minute cooldown, 3-minute global proactive cooldown) to avoid overwhelming the patient.
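The triple-fallback routing described above boils down to a small state machine: try providers in priority order and skip any that rate-limited within the backoff window. A sketch with stub providers (the class and stubs are ours; only the 25-second backoff figure comes from the project):

```python
import time

BACKOFF_S = 25  # automatic backoff after a rate limit, per the design above

class RateLimitError(Exception):
    pass

class LLMRouter:
    """Try providers in order; skip any still inside its backoff window."""

    def __init__(self, providers):
        self.providers = providers   # list of (name, callable) in priority order
        self.limited_until = {}      # provider name -> unix timestamp

    def ask(self, prompt):
        now = time.time()
        for name, call in self.providers:
            if self.limited_until.get(name, 0) > now:
                continue             # still backing off this provider
            try:
                return name, call(prompt)
            except RateLimitError:
                self.limited_until[name] = now + BACKOFF_S
        raise RuntimeError("all providers unavailable")

def primary(prompt):                 # stand-in for GPT-4o-mini
    raise RateLimitError
def secondary(prompt):               # stand-in for Llama-3.3-70B
    return "ok"

router = LLMRouter([("openai", primary), ("together", secondary)])
print(router.ask("hello"))           # → ('together', 'ok')
```

The on-device SmolVLM sits last in the list, so the same loop degrades all the way to fully offline operation.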
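The two-phase fall check itself is easy to state over a stream of accelerometer magnitudes: a collapse toward 0 g followed by a spike inside the 500 ms window. The thresholds below are assumed values for illustration, not ARIA's calibrated ones:

```python
FREE_FALL_G = 0.35   # phase 1: magnitude collapse threshold (assumed)
IMPACT_G = 2.5       # phase 2: impact spike threshold (assumed)
WINDOW_S = 0.5       # impact must follow the collapse within 500 ms

def detect_fall(samples):
    """samples: iterable of (t_seconds, accel_magnitude_in_g)."""
    free_fall_t = None
    for t, mag in samples:
        if mag < FREE_FALL_G:
            free_fall_t = t                    # (re)enter free fall
        elif free_fall_t is not None and mag > IMPACT_G:
            if t - free_fall_t <= WINDOW_S:
                return True                    # collapse then impact: a fall
            free_fall_t = None                 # spike came too late; reset
    return False

drop = [(0.0, 1.0), (0.1, 0.2), (0.2, 0.15), (0.4, 3.1), (0.5, 1.0)]
print(detect_fall(drop))  # → True
```

On a positive detection the device opens the post-fall dialogue rather than alerting immediately, which is what keeps false positives from spamming the family.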

Family App (React Native / Expo)

  • Built with Expo 55, expo-router (file-based routing), react-native-reanimated for smooth animations, and NativeWind for styling.
  • Polls the device REST API every 3-10 seconds for live status, timeline events, and alerts.
  • Groups conversational events (wake word → query → response) into tappable conversation cards with chat-bubble UI.
  • Camera captures from the device are stored as JPEGs and served via base64 API, displayed in timeline entries and the live camera modal.
  • Emergency system with context-aware severity filtering — technical errors (LLM rate limits, vision failures) are suppressed; only real patient events surface.
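The grouping step is worth spelling out, since it is what turns a raw event feed into readable cards. Here is the same fold in Python (the app does this in TypeScript; the event schema here is illustrative):

```python
def group_conversations(events):
    """Fold a flat timeline into cards: a wake-word event opens a
    conversation card, query/response events join it, anything else
    closes it and passes through unchanged."""
    cards, current = [], None
    for ev in events:
        if ev["type"] == "wake_word":
            current = {"type": "conversation", "turns": []}
            cards.append(current)
        elif ev["type"] in ("query", "response") and current is not None:
            current["turns"].append((ev["type"], ev["text"]))
        else:
            current = None
            cards.append(ev)
    return cards

timeline = [
    {"type": "wake_word"},
    {"type": "query", "text": "what time is it?"},
    {"type": "response", "text": "It's three in the afternoon."},
    {"type": "face_sighting", "name": "Maria"},
]
cards = group_conversations(timeline)
print(len(cards))  # → 2: one conversation card, one face sighting
```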

Integration: The device runs a FastAPI-based WebUI exposing 16 REST endpoints. The app connects directly on native (Expo Go) or through a Node.js proxy during web development.

Challenges we ran into

  • Audio on embedded Linux: The Arduino Uno Q's ALSA configuration was non-trivial: pygame.mixer failed, and aplay had device conflicts. We eventually discovered the arduino.app_peripherals.speaker.Speaker class, which streams raw PCM samples directly, but we had to convert LMNT's WAV output to the right sample format first.
  • LLM rate limits at the worst times: OpenAI's free-tier rate limits hit during critical moments, including fall detection. We built the triple-fallback system with an automatic 25-second backoff and seamless failover to Together AI (or, with no internet, a local LLM), ensuring the device never goes silent during an emergency.
  • Memory vs. context window: Dementia patients need conversational continuity, but small LLMs can't hold infinite context. The 3-tier memory system with automatic episodic compression was our solution: older conversations are summarized into 2-3 sentences and prepended to the system prompt.
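The compression step of that 3-tier design can be sketched directly: cap working memory at the last 10 messages and fold the overflow into an episodic summary. The `summarize` stub stands in for the real LLM compression call:

```python
WORKING_LIMIT = 10   # messages kept verbatim, per the design above

def summarize(messages):
    # Stand-in: the real system asks the LLM for a 2-3 sentence summary.
    return f"[summary of {len(messages)} earlier messages]"

def compress(history, episodic):
    """Keep the newest WORKING_LIMIT messages verbatim; summarize the rest
    into the episodic list that gets prepended to the system prompt."""
    if len(history) > WORKING_LIMIT:
        overflow, history = history[:-WORKING_LIMIT], history[-WORKING_LIMIT:]
        episodic.append(summarize(overflow))
    return history, episodic

history, episodic = compress([f"msg {i}" for i in range(14)], [])
print(len(history), episodic)  # → 10 ['[summary of 4 earlier messages]']
```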

Accomplishments that we're proud of

  • The fall detection dialogue: The device doesn't just detect a fall and fire an alert. It asks "Are you okay?", records 5 seconds of audio, transcribes it, checks for distress keywords ("help", "hurt", "can't"), and makes a judgment call. No other consumer device does this.
  • The agent actually works as an agent: The LLM genuinely decides which tools to use. Ask "what's in front of me?" and it calls describe_scene. Ask "where's my mug?" and it calls find_object. Say "remind me in 10 minutes" and it calls set_reminder. It's not scripted — it's emergent behavior from good system prompting and function definitions.
  • Triple-LLM resilience: The device seamlessly fails over from cloud to edge. During our demo, OpenAI rate-limited mid-conversation and the system transparently switched to Together AI without the patient noticing any interruption.

What we learned

  • Edge AI is about trade-offs, not limitations: Running a 500M parameter VLM on an Arduino teaches you that the right model at the right time matters more than the biggest model all the time. Our local LLM handles basic queries fine; we only need the cloud for complex reasoning.
  • Dementia care is about emotional truth, not factual accuracy: Speaking with healthcare professionals and reading about validation therapy fundamentally changed how we designed the AI's personality.
  • Hardware integration is 80% of the work: The AI/ML parts were straightforward compared to getting two microphones, a speaker, a camera, and an accelerometer to work simultaneously on embedded Linux without stepping on each other.

What's next for ARIA

  • Emotion detection: Using vocal tone analysis and facial expression recognition to detect agitation, sadness, or confusion before the patient verbalizes it
  • Multi-device mesh: Multiple ARIA devices throughout the home, tracking which room the patient is in and providing location-aware guidance
  • Caregiver dashboard: A web portal for professional caregivers managing multiple patients, with trend analytics and behavioral pattern detection
  • Medication verification: Using the camera to visually confirm the patient has taken their pills (pill detection + hand tracking)
  • Multilingual support: Many elderly patients are more comfortable in their native language; we want to support seamless language switching mid-conversation
