Inspiration
Telehealth portals have become essential for modern healthcare, but they are often confusing and difficult to navigate, especially for elderly patients. Many seniors struggle with logging in, finding appointments, completing check-in steps, and joining video visits. These barriers can prevent patients from attending important medical appointments.
SilverVisit AI was inspired by the idea that AI should not just answer questions; it should actively help people navigate technology. We wanted to build an intelligent assistant that can understand a user's goal and visually guide them through complex healthcare portals step by step.
Our vision was simple:
If a senior can say “Help me join my appointment,” the AI should take care of the rest.
What It Does
SilverVisit AI is a multimodal UI navigation agent that helps users safely navigate telehealth portals.
Instead of having to learn complicated interfaces, users simply speak or type a goal such as:
“Help me attend my 3 PM appointment.”
The AI then:
- Observes the screen using visual context
- Understands the user’s goal
- Plans the next safe UI action
- Executes one grounded step at a time
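The four steps above form a loop that repeats until the goal is met. Below is a minimal sketch of that loop. All names are hypothetical (`Observation`, `UiAction`, `Planner`, `Executor`, `runAgent`), and the screenshot is reduced to a URL plus visible text. Each turn observes the screen, asks the planner for exactly one action, and executes it. The loop stops only when the planner reports that the goal is done, or when a step budget runs out.

```typescript
// Hypothetical types: a real observation would carry a screenshot, not just text.
interface Observation {
  url: string;
  visibleText: string[];
}

type UiAction =
  | { kind: "click"; target: string }
  | { kind: "type"; target: string; text: string }
  | { kind: "scroll"; direction: "up" | "down" }
  | { kind: "done" };

// The planner sees the user's goal and the current screen, and returns ONE action.
type Planner = (goal: string, obs: Observation) => UiAction;

interface Executor {
  observe(): Observation;
  execute(action: UiAction): void;
}

// Observe -> understand/plan -> execute, one grounded step per iteration.
function runAgent(goal: string, planner: Planner, env: Executor, maxSteps = 20): UiAction[] {
  const history: UiAction[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const obs = env.observe();          // observe the current screen
    const action = planner(goal, obs);  // interpret the goal and plan a single next step
    history.push(action);
    if (action.kind === "done") break;  // stop only when the planner says the goal is met
    env.execute(action);                // execute exactly one action, then re-observe
  }
  return history;
}
```

The step budget (`maxSteps`) is an assumed safety limit. It keeps a confused planner from acting forever.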
Examples of tasks it performs:
- Logging into telehealth portals
- Opening upcoming appointments
- Completing e-check-in steps
- Handling device setup
- Guiding users to the waiting room
- Helping them join their doctor’s appointment
This transforms complex healthcare interfaces into simple, goal-based interactions.
How We Built It
SilverVisit AI combines multimodal reasoning, UI grounding, and cloud-based AI services.
The system has three main components:
1. Browser Extension (UI Navigator)
A Chrome extension observes the telehealth portal and executes safe UI actions such as:
- clicking buttons
- typing credentials
- scrolling pages
- navigating between sections
The extension communicates with the backend planner to determine the next step.
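The sketch below shows one way the extension side could dispatch the four action types listed above. The action shape and the `PageDriver` interface are assumptions: the interface stands in for the DOM, so the dispatcher can run outside a browser. The exhaustive `switch` rejects any action kind the extension does not explicitly support. The returned log line deliberately never echoes typed text, because that text may be a password.

```typescript
// Hypothetical action vocabulary: click, type, scroll, navigate.
type ExtensionAction =
  | { kind: "click"; elementId: string }
  | { kind: "type"; elementId: string; text: string }
  | { kind: "scroll"; deltaY: number }
  | { kind: "navigate"; section: string };

// Hypothetical abstraction over the page; a content script would implement it with DOM calls.
interface PageDriver {
  click(elementId: string): void;
  typeInto(elementId: string, text: string): void;
  scrollBy(deltaY: number): void;
  goToSection(section: string): void;
}

// Executes a single action and returns a log line that is safe to send to the backend.
function dispatchAction(page: PageDriver, action: ExtensionAction): string {
  switch (action.kind) {
    case "click":
      page.click(action.elementId);
      return `clicked ${action.elementId}`;
    case "type":
      page.typeInto(action.elementId, action.text);
      // Log only the length: credentials must never appear in logs.
      return `typed ${action.text.length} chars into ${action.elementId}`;
    case "scroll":
      page.scrollBy(action.deltaY);
      return `scrolled ${action.deltaY}px`;
    case "navigate":
      page.goToSection(action.section);
      return `navigated to ${action.section}`;
    default: {
      // Compile-time exhaustiveness check: an unhandled new kind becomes a type error.
      const unreachable: never = action;
      throw new Error(`unsupported action: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

In a real extension, `PageDriver` could wrap standard DOM calls such as `HTMLElement.click()` and `window.scrollBy()` inside a content script. That wiring is not shown here.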
2. Gemini-Powered Planning Backend
The backend runs on Google Cloud Run and uses Gemini models via the Google GenAI SDK to reason about user goals and UI state.
The planner:
- receives screenshots and user intent
- interprets the current page
- determines the safest next action
- ensures only one grounded step per turn
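Limiting the planner to one grounded action per turn keeps the system predictable and safe: every action is re-checked against the current screen before the next one is planned.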
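One way to enforce "one grounded step per turn" on the backend is to ask the model for a single JSON object, then validate its reply before forwarding it to the extension. The sketch below assumes that reply format; the field names (`action`, `elementId`, `text`, `direction`) are hypothetical. A JSON array is treated as a multi-step plan and rejected, and so is any malformed or unknown action.

```typescript
// Hypothetical shape of a single planner step.
type PlannedAction =
  | { action: "click"; elementId: string }
  | { action: "type"; elementId: string; text: string }
  | { action: "scroll"; direction: "up" | "down" }
  | { action: "done" };

type ParseResult =
  | { ok: true; step: PlannedAction }
  | { ok: false; reason: string };

// Parses the model's raw text reply and accepts exactly one well-formed action.
function parsePlannerReply(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "reply is not valid JSON" };
  }
  // A list of actions is a multi-step plan, which violates one-step-per-turn.
  if (Array.isArray(data) || typeof data !== "object" || data === null) {
    return { ok: false, reason: "expected exactly one action object" };
  }
  const obj = data as Record<string, unknown>;
  const rawId = obj.elementId;
  const rawText = obj.text;
  const elementId = typeof rawId === "string" ? rawId : null;
  const text = typeof rawText === "string" ? rawText : null;
  const direction = obj.direction === "up" ? "up" : obj.direction === "down" ? "down" : null;

  switch (obj.action) {
    case "click":
      if (elementId === null) return { ok: false, reason: "click needs elementId" };
      return { ok: true, step: { action: "click", elementId } };
    case "type":
      if (elementId === null || text === null) return { ok: false, reason: "type needs elementId and text" };
      return { ok: true, step: { action: "type", elementId, text } };
    case "scroll":
      if (direction === null) return { ok: false, reason: "scroll needs direction up or down" };
      return { ok: true, step: { action: "scroll", direction } };
    case "done":
      return { ok: true, step: { action: "done" } };
    default:
      return { ok: false, reason: `unknown action: ${String(obj.action)}` };
  }
}
```

A rejected reply can be sent back to the model with its `reason`, rather than being acted on.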
3. Telehealth Sandbox Environment
To test realistic scenarios, we built a deterministic telehealth sandbox portal that simulates:
- login pages
- appointment dashboards
- check-in workflows
- device setup
- waiting rooms
- video visit joining
This allows the agent to demonstrate real navigation workflows end-to-end.
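A deterministic sandbox can be modeled as a fixed transition table over the screens listed above. The sketch below assumes hypothetical screen and event names. Because the same screen and event always produce the same next screen, a recorded agent run can be replayed and checked exactly. An event that is invalid on the current screen leaves the screen unchanged, so the mistake is visible in the test but does no harm.

```typescript
// Hypothetical screens and events for the sandbox flow.
type SandboxScreen = "login" | "dashboard" | "checkin" | "deviceSetup" | "waitingRoom" | "videoVisit";
type SandboxEvent = "submitLogin" | "openAppointment" | "finishCheckin" | "confirmDevices" | "providerJoins";

// Deterministic transition table: same screen + same event -> same next screen.
const transitions: Record<SandboxScreen, Partial<Record<SandboxEvent, SandboxScreen>>> = {
  login: { submitLogin: "dashboard" },
  dashboard: { openAppointment: "checkin" },
  checkin: { finishCheckin: "deviceSetup" },
  deviceSetup: { confirmDevices: "waitingRoom" },
  waitingRoom: { providerJoins: "videoVisit" },
  videoVisit: {},
};

// Events that are invalid on the current screen leave it unchanged.
function step(screen: SandboxScreen, event: SandboxEvent): SandboxScreen {
  return transitions[screen][event] ?? screen;
}

// Replays a recorded sequence of events from a starting screen.
function replay(events: SandboxEvent[], start: SandboxScreen = "login"): SandboxScreen {
  return events.reduce(step, start);
}
```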
Google Cloud Architecture
SilverVisit AI leverages multiple Google Cloud technologies:
- Gemini models via the Google GenAI SDK
- Google Cloud Run to host the backend AI planner
- Vertex AI configuration for Gemini models
- Firestore-backed session and event storage
- Live agent streaming via Gemini Live
This architecture enables a scalable AI system capable of reasoning about real interfaces.
Challenges We Ran Into
Building a safe UI navigation agent presented several challenges:
Goal persistence
Ensuring that the AI kept the user's original goal across multiple UI steps was difficult: intermediate steps such as login or check-in must not end the task prematurely.
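A simple way to make the goal persist is to store it once per session and separate *milestones* from *completion*. The sketch below assumes a hypothetical `Session` record in which the goal is read-only and a designated `goalScreen` marks success. Reaching the dashboard or finishing check-in is recorded as progress. Only the goal screen ends the task.

```typescript
// Hypothetical session record: the goal is fixed at creation and never rewritten by sub-tasks.
interface Session {
  readonly goal: string;
  readonly goalScreen: string; // screen that means the goal is actually met (assumed)
  milestones: string[];        // intermediate screens reached so far
  completed: boolean;
}

function createSession(goal: string, goalScreen: string): Session {
  return { goal, goalScreen, milestones: [], completed: false };
}

// Records the screen just reached; intermediate screens never mark the task complete.
function recordScreen(session: Session, screen: string): Session {
  if (session.completed) return session;
  if (screen === session.goalScreen) {
    return { ...session, completed: true };
  }
  const milestones = session.milestones.includes(screen)
    ? session.milestones
    : [...session.milestones, screen];
  return { ...session, milestones };
}
```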
Preventing navigation loops
The agent sometimes restarted flows instead of continuing within the current screen. We implemented sub-flow awareness so that, once a flow is under way, the planner prioritizes downstream actions over restarting from the beginning.
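Sub-flow awareness can be approximated by giving each screen a position in an ordered flow and filtering candidate actions by where they lead. The sketch below assumes a hypothetical `FLOW` order, and assumes each candidate action is already tagged with the step it opens. Actions that lead back to an earlier step are discarded, because those are the restarts that caused loops. Among the remaining actions, the one closest to the current step wins, so the agent finishes the current screen before advancing.

```typescript
// Hypothetical ordered flow; a step's index is how far "downstream" it is.
const FLOW = ["login", "dashboard", "checkin", "deviceSetup", "waitingRoom", "videoVisit"] as const;
type FlowStep = (typeof FLOW)[number];

interface Candidate {
  label: string;     // visible control text
  leadsTo: FlowStep; // step this action is expected to open (assumed known)
}

// Drops actions that would restart an earlier step, then prefers continuing the current one.
function pickNextAction(current: FlowStep, candidates: Candidate[]): Candidate | undefined {
  const here = FLOW.indexOf(current);
  let best: Candidate | undefined;
  let bestIndex = Infinity;
  for (const c of candidates) {
    const there = FLOW.indexOf(c.leadsTo);
    if (there < here) continue; // going backwards is the navigation loop we want to avoid
    if (there < bestIndex) {
      best = c;
      bestIndex = there;
    }
  }
  return best; // undefined means no safe forward action is visible
}
```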
Safe UI grounding
To prevent unsafe automation, the system only executes actions when elements are verified as visible and valid.
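The visibility and validity check can be a plain function that runs before any action executes. The sketch below assumes a hypothetical `ElementSnapshot` that the extension collects for each interactive element on the page. It refuses actions whose target is missing, hidden, off-screen, or disabled. It also refuses actions whose target is the wrong kind of element for that action, such as typing into a button.

```typescript
// Hypothetical per-element snapshot taken from the current page before acting.
interface ElementSnapshot {
  id: string;
  role: "button" | "link" | "textbox" | "other";
  visible: boolean;    // e.g. non-zero size and not hidden by CSS
  enabled: boolean;
  inViewport: boolean;
}

interface ProposedAction {
  kind: "click" | "type";
  elementId: string;
}

// Returns null if the action may run, otherwise the reason it was refused.
function validateAction(action: ProposedAction, elements: ElementSnapshot[]): string | null {
  const el = elements.find((e) => e.id === action.elementId);
  if (!el) return "target not found on current screen";
  if (!el.visible || !el.inViewport) return "target is not visible";
  if (!el.enabled) return "target is disabled";
  if (action.kind === "type" && el.role !== "textbox") return "cannot type into a non-text element";
  if (action.kind === "click" && el.role !== "button" && el.role !== "link") return "target is not clickable";
  return null;
}
```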
Real-time agent interaction
Integrating Gemini Live required careful handling of event streams and session management.
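Two common pitfalls with streamed sessions are acting on events from a stale, reconnected session and acting on a half-received turn. The sketch below uses a hypothetical, simplified event shape, not the actual Gemini Live message format. Each buffer is bound to one session ID. It concatenates text chunks until a turn-complete event arrives, and it discards any partial turn when the session closes.

```typescript
// Hypothetical, simplified event shape; real Gemini Live messages are structured differently.
type LiveEvent =
  | { sessionId: string; type: "chunk"; text: string }
  | { sessionId: string; type: "turnComplete" }
  | { sessionId: string; type: "closed" };

class LiveSessionBuffer {
  readonly sessionId: string;
  readonly turns: string[] = [];
  private buffer: string[] = [];
  private open = true;

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  // Ignores events from other sessions and anything that arrives after close.
  handle(event: LiveEvent): void {
    if (event.sessionId !== this.sessionId || !this.open) return;
    switch (event.type) {
      case "chunk":
        this.buffer.push(event.text);
        break;
      case "turnComplete":
        this.turns.push(this.buffer.join("")); // only whole turns are exposed to the planner
        this.buffer = [];
        break;
      case "closed":
        this.open = false;
        this.buffer = []; // drop a partial turn rather than act on half a message
        break;
    }
  }
}
```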
What We Learned
This project taught us that multimodal AI agents require strong guardrails.
Reliable UI navigation requires:
- deterministic grounding
- goal persistence
- safe action validation
- careful state management
By combining Gemini’s reasoning capabilities with structured UI actions, we created an AI system that can interact with real software interfaces.
What's Next
SilverVisit AI demonstrates the potential of AI-powered accessibility assistants.
Future directions include:
- supporting real telehealth portals
- integrating voice-first interaction for seniors
- expanding to other complex digital services like banking and government portals
Our long-term vision is a world where technology adapts to people, not the other way around.
Built With
- chromeextensionapi
- gemini2.5flash
- geminiliveapi
- googlecloudfirestore
- googlecloudrun
- googlegenaisdk
- node.js
- react
- restapi
- typescript
- vertexai
- vite
- websockets