SteadyHands.AI — Devpost Submission

Inspiration

SteadyHands.AI was built to help senior citizens who are less comfortable with technology. Many struggle with everyday web tasks like finding forms, filling them out, or navigating government sites. We wanted an AI assistant that could understand pages, suggest actions, and perform tasks in the browser on the user's behalf, simplifying the entire browsing experience inside a new Electron-based browser app.


What it does

SteadyHands.AI is a desktop accessibility assistant that provides:

  1. Side-by-Side Interface: It provides a split-screen desktop app where users browse the web normally on one side, while an AI assistant sits on the other to offer simplified page summaries and clear next steps.

  2. Shared Control: Users can navigate the web themselves, or hand over the reins and let the AI handle the clicking, typing, and page navigation.

  3. Goal-Oriented Browsing: Users can state what they want to accomplish in plain English (like "Find the senior tax return PDF"), and the AI will autonomously plan and execute the necessary steps to get it done.

  4. Built-in Safety: The system includes automatic guardrails that pause and ask for user permission before completing sensitive actions, such as checking out or making a payment.

  5. Voice Accessibility: Users can speak their requests out loud instead of typing, and the assistant can read its findings and answers back to them.
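
The built-in safety guardrail described above can be sketched as a simple pre-action check. This is an illustrative sketch, not the project's actual code; the names (`AgentAction`, `isSensitiveAction`) and the keyword heuristics are assumptions.

```typescript
// Hypothetical sketch of the "Built-in Safety" pre-action check.
// The agent calls this before executing a step; a true result means
// the system pauses and asks the user for explicit permission.

interface AgentAction {
  type: "click" | "type" | "navigate";
  targetText: string; // visible label of the element the agent is about to act on
}

// Keyword heuristics for actions that should pause for user approval.
const SENSITIVE_PATTERNS = [/pay/i, /checkout/i, /purchase/i, /confirm order/i, /card number/i];

function isSensitiveAction(action: AgentAction): boolean {
  return SENSITIVE_PATTERNS.some((p) => p.test(action.targetText));
}
```

A keyword filter like this would typically run before a second, LLM-based safety pass, so obvious cases never reach the model at all.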


How we built it

Tech stack:

  • Desktop: Electron 37 + Vite (electron-vite)
  • Frontend: React 19 + TypeScript + Tailwind CSS 4
  • Agent: LangGraph (LangChain) for the Observe → Plan → Act → Verify loop
  • LLM: Qwen3-30B-A3B served via vLLM, handling planning, summarization, intent inference, and safety checks
  • Text-to-Speech: ElevenLabs
  • Speech-to-Text: OpenAI Whisper Large V3 Turbo
  • MCP: Model Context Protocol for external tools

Architecture:

  • Agent process: The UI requests observe, act, goBack, and askUser operations via IPC.
  • LangGraph orchestration: A state machine with planNode, executeNode, and checkControls handles the agent loop. It includes stuck detection, go-back logic, and context compaction.
  • Content extraction: An injected script extracts interactive elements (links, buttons, inputs).
  • Action execution: The executor performs click, type, select, scroll, and navigate actions, including semantic matching by tag, role, text, and aria-label.
  • LLM responsibilities: Intent inference, page summarization, semantic interpretation, planAction, safetySupervisor, and resolveUserChoiceToIndex.
  • Model Deployment: Qwen3-30B-A3B served with vLLM on an AMD Instinct MI300X GPU (192 GB)
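
The Observe → Plan → Act → Verify loop orchestrated by LangGraph can be sketched as a plain state machine. This is a minimal illustration, not the project's actual graph; all names (`Observation`, `Step`, `runAgentLoop`) are assumptions, and the real implementation adds stuck detection and go-back logic on top.

```typescript
// Minimal sketch of the Observe → Plan → Act → Verify loop as a plain loop,
// with the three phases injected as functions so it can be tested in isolation.

interface Observation { url: string; elements: string[]; }
interface Step { action: string; done: boolean; }

async function runAgentLoop(
  observe: () => Promise<Observation>,
  plan: (obs: Observation) => Promise<Step>,
  act: (step: Step) => Promise<void>,
  maxSteps = 10,
): Promise<number> {
  let steps = 0;
  while (steps < maxSteps) {
    const obs = await observe();   // Observe: extract interactive elements from the page
    const step = await plan(obs);  // Plan: LLM decides the next action
    if (step.done) break;          // Verify: planner reports the goal is reached
    await act(step);               // Act: click / type / navigate
    steps++;
  }
  return steps;
}
```

Capping the loop with `maxSteps` is the simplest guardrail against runaway agents; fingerprint-based stuck detection (described under Challenges) refines this further.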

Challenges Overcome

  1. Stuck Loops — The agent occasionally looped on the same page. We implemented fingerprint-based stuck detection and "go-back" logic to break these cycles.

  2. UI & Window Management — To provide a unified interface for senior users, we moved from a standard web app to Electron. This allowed us to control the browser DOM directly within a single application window.

  3. Multi-Agent Orchestration — Simple reasoning was insufficient for complex tasks. We implemented a tri-agent solution where separate agents Observe, Think, and Act in coordination.

  4. Learning Curve (LangGraph) — With no prior experience, we underwent an accelerated learning phase to integrate LangGraph as our primary orchestration framework.

  5. Cross-Platform Deployment — We avoided local-only browser automation stacks like Playwright and Stagehand to ensure the solution worked consistently across every platform.

  6. High-Throughput LLMs — The Observe → Think → Act cycle issues a high volume of LLM requests. We self-hosted vLLM on AMD Droplets to avoid the low rate limits of standard Gemini or GPT APIs.
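
The fingerprint-based stuck detection from challenge 1 can be sketched as follows. The details here are assumptions (the real implementation may fingerprint differently): a page "fingerprint" hashes the URL plus the visible interactive elements, and seeing the same fingerprint too many times in a row triggers the go-back logic.

```typescript
// Sketch of fingerprint-based stuck detection. A fingerprint is a stable hash
// of the URL and the page's interactive element texts; repeated identical
// fingerprints indicate the agent is looping on the same page.

import { createHash } from "node:crypto";

function fingerprint(url: string, elementTexts: string[]): string {
  return createHash("sha256").update(url + "|" + elementTexts.join("|")).digest("hex");
}

class StuckDetector {
  private last = "";
  private repeats = 0;
  constructor(private readonly threshold = 3) {}

  // Record the latest fingerprint; returns true when the agent should
  // break the cycle (e.g. trigger the go-back logic).
  record(fp: string): boolean {
    this.repeats = fp === this.last ? this.repeats + 1 : 0;
    this.last = fp;
    return this.repeats >= this.threshold;
  }
}
```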


Accomplishments that we're proud of

  • Accessibility-focused UI — For senior citizens, the UI is intuitive and easy to navigate.

  • Stable structured outputs — vLLM's JSON schema mode on our AMD deployment keeps every model response reliably parseable.

  • Human-in-the-loop — Optional approval for risky actions (payments, checkouts) and semantic choice resolution so users can answer in natural language.

  • MCP integration — Optional Model Context Protocol support for external tools like Exa search, with room to add calendar, email, and more (planned).

  • Semantic fallback — When the DOM changes and element IDs break, the action executor falls back to matching by tag, role, text, and aria-label so actions still succeed.
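
The semantic fallback above can be sketched as a scoring match over element descriptors. This is an illustrative sketch under assumed names (`ElementDescriptor`, `findFallback`) and assumed weights; the idea is simply that when a stored element index breaks, the executor scores live candidates by tag, role, text, and aria-label and picks the best match.

```typescript
// Sketch of semantic fallback matching: score candidates against the target
// element's attributes and return the highest-scoring match, if any.

interface ElementDescriptor {
  tag: string;
  role?: string;
  text?: string;
  ariaLabel?: string;
}

function matchScore(target: ElementDescriptor, candidate: ElementDescriptor): number {
  let score = 0;
  if (target.tag === candidate.tag) score += 1;                          // weak signal
  if (target.role && target.role === candidate.role) score += 2;        // stronger
  if (target.text && target.text === candidate.text) score += 3;        // strongest
  if (target.ariaLabel && target.ariaLabel === candidate.ariaLabel) score += 3;
  return score;
}

function findFallback(
  target: ElementDescriptor,
  candidates: ElementDescriptor[],
): ElementDescriptor | undefined {
  let best: ElementDescriptor | undefined;
  let bestScore = 0;
  for (const c of candidates) {
    const s = matchScore(target, c);
    if (s > bestScore) { bestScore = s; best = c; }
  }
  return best; // undefined when nothing matches at all
}
```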


What we learned

  • Designing with Empathy: Prioritizing our senior users meant abandoning standard web apps for a unified desktop experience. We learned that the architecture must adapt to the user's comfort, not the other way around.

  • Divide and Conquer with AI: Relying on a single AI to process everything is inefficient. We learned that delegating tasks into specialized roles (observing, thinking, acting) creates a significantly more reliable and capable system.

  • Building Autonomous Guardrails: AI agents can easily get stuck in repetitive loops. We realized that autonomous systems require built-in self-awareness and fail-safes to recognize mistakes and correct their own course.

  • The Necessity of Adaptability: Building a cross-platform, agent-driven tool required stepping out of our comfort zone. We learned how to quickly adopt new orchestration concepts to manage complex project logic from scratch.

  • The Real Cost of Scaling: Continuous AI operations demand massive request volumes. We learned that standard, rate-limited APIs are unsustainable for heavy workloads, making custom, high-performance infrastructure a strict requirement.


What's next for SteadyHands.AI

  • Broader MCP tool support — Calendar, email, and other external integrations.

  • Improved voice UX — Wake word, continuous listening, and better transcription.

  • Mobile or web companion — Shared sessions so caregivers or family can assist remotely.

  • Stronger safety — More granular controls for risky actions and user consent flows.

  • Performance — Faster planner model, context compaction, and reduced latency.
