Inspiration

Telehealth portals have become essential for modern healthcare, but they are often confusing and difficult to navigate, especially for elderly patients. Many seniors struggle with logging in, finding appointments, completing check-in steps, and joining video visits. These barriers can prevent patients from attending important medical appointments.

SilverVisit AI was inspired by the idea that AI should not just answer questions; it should actively help people navigate technology. We wanted to build an intelligent assistant that can understand a user's goal and visually guide them through complex healthcare portals step by step.

Our vision was simple:
If a senior can say “Help me join my appointment,” the AI should take care of the rest.


What It Does

SilverVisit AI is a multimodal UI navigation agent that helps users safely navigate telehealth portals.

Instead of forcing users to learn complicated interfaces, the user simply speaks or types a goal such as:

“Help me attend my 3 PM appointment.”

The AI then:

  1. Observes the screen using visual context
  2. Understands the user’s goal
  3. Plans the next safe UI action
  4. Executes one grounded step at a time

Examples of tasks it performs:

  • Logging into telehealth portals
  • Opening upcoming appointments
  • Completing eCheck-in steps
  • Handling device setup
  • Guiding users to the waiting room
  • Helping them join their doctor’s appointment

This transforms complex healthcare interfaces into simple goal-based interactions.


How We Built It

SilverVisit AI combines multimodal reasoning, UI grounding, and cloud-based AI services.

The system has three main components:

1. Browser Extension (UI Navigator)

A Chrome extension observes the telehealth portal and executes safe UI actions such as:

  • clicking buttons
  • typing credentials
  • scrolling pages
  • navigating between sections

The extension communicates with the backend planner to determine the next step.


2. Gemini-Powered Planning Backend

The backend runs on Google Cloud Run and uses Gemini models via the Google GenAI SDK to reason about user goals and UI state.

The planner:

  • receives screenshots and user intent
  • interprets the current page
  • determines the safest next action
  • ensures only one grounded step per turn

Limiting each turn to a single grounded step keeps the system predictable and safe.
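The one-step-per-turn rule can be illustrated with a minimal sketch. It is not the project's actual planner: the `Action` type, the `plan_turn` function, and the `propose` callback (standing in for the Gemini call) are hypothetical names chosen for illustration. The allowed action kinds mirror the ones the extension is described as executing.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Action kinds the extension is described as supporting.
ALLOWED_KINDS = {"click", "type", "scroll", "navigate"}


@dataclass(frozen=True)
class Action:
    kind: str       # expected to be one of ALLOWED_KINDS
    target: str     # hypothetical element identifier
    text: str = ""  # only meaningful for "type" actions


def plan_turn(
    goal: str,
    page_summary: str,
    propose: Callable[[str, str], list],
) -> Optional[Action]:
    """Return at most one allowed action for this turn, or None."""
    for action in propose(goal, page_summary):
        if action.kind in ALLOWED_KINDS:
            return action  # stop at the first acceptable step
    return None


def fake_model(goal: str, page_summary: str) -> list:
    # Stand-in for the model call; a real planner would query Gemini here.
    return [
        Action("delete", "account"),    # rejected: kind is not allowed
        Action("click", "join-visit"),  # first acceptable step
        Action("click", "logout"),      # never reached this turn
    ]


chosen = plan_turn("Join my 3 PM appointment", "waiting room", fake_model)
```

Even when the model proposes several steps, only one leaves the planner, so the extension re-observes the page before anything else happens.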


3. Telehealth Sandbox Environment

To test realistic scenarios, we built a deterministic telehealth sandbox portal that simulates:

  • login pages
  • appointment dashboards
  • check-in workflows
  • device setup
  • waiting rooms
  • video visit joining

This allows the agent to demonstrate real navigation workflows end-to-end.


Google Cloud Architecture

SilverVisit AI leverages multiple Google Cloud technologies:

  • Gemini models via the Google GenAI SDK
  • Google Cloud Run to host the backend AI planner
  • Vertex AI configuration for Gemini models
  • Firestore-backed session and event storage
  • Live agent streaming via Gemini Live

This architecture enables a scalable AI system capable of reasoning about real interfaces.


Challenges We Ran Into

Building a safe UI navigation agent presented several challenges:

Goal persistence

Ensuring the AI maintains the user's original goal across multiple UI steps was complex. Intermediate steps like login or check-in should not prematurely end the task.
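One way to picture goal persistence is a session object that stores the original goal once and only reports completion at a terminal screen. This is a sketch under assumptions: the `Session` class, its fields, and the screen names are hypothetical and not taken from the project's code.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    goal: str             # the user's original request, never overwritten
    terminal_screen: str  # hypothetical name of the screen that ends the task
    visited: list = field(default_factory=list)

    def record(self, screen: str) -> None:
        """Remember each screen the agent reaches."""
        self.visited.append(screen)

    @property
    def done(self) -> bool:
        # Intermediate screens such as login or check-in never end the task.
        return bool(self.visited) and self.visited[-1] == self.terminal_screen


session = Session(goal="Help me join my appointment", terminal_screen="video_visit")
session.record("login")
session.record("check_in")
done_after_check_in = session.done
session.record("video_visit")
done_after_join = session.done
```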

Preventing navigation loops

The agent sometimes restarted flows instead of continuing within the current screen. We implemented sub-flow awareness to prioritize downstream actions.
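The idea behind sub-flow awareness can be sketched as a filter that discards candidate destinations earlier in the flow than the current screen. The stage names below follow the sandbox screens listed earlier, but the `FLOW` ordering and the `pick_downstream` helper are assumptions made for illustration, not the project's implementation.

```python
from typing import Optional

# Assumed ordering of the telehealth flow, based on the sandbox screens.
FLOW = [
    "login",
    "appointments",
    "check_in",
    "device_setup",
    "waiting_room",
    "video_visit",
]


def pick_downstream(current: str, candidates: list) -> Optional[str]:
    """Pick the nearest candidate stage that is not behind the current one."""
    here = FLOW.index(current)
    forward = [c for c in candidates if c in FLOW and FLOW.index(c) >= here]
    # Prefer finishing the current stage, then the closest later stage.
    return min(forward, key=FLOW.index) if forward else None


stay = pick_downstream("check_in", ["login", "device_setup", "check_in"])
advance = pick_downstream("check_in", ["login", "waiting_room"])
blocked = pick_downstream("waiting_room", ["login"])
```

Returning `None` when only upstream options exist gives the caller a chance to re-observe the page instead of restarting the flow.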

Safe UI grounding

To prevent unsafe automation, the system only executes actions when elements are verified as visible and valid.
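A grounding check of this kind might look like the following sketch. The `Element` fields and the `is_grounded` function are hypothetical; in particular, requiring exactly one matching element and treating "valid" as "enabled" are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    element_id: str  # hypothetical identifier reported by the extension
    visible: bool
    enabled: bool


def is_grounded(target_id: str, observed: list) -> bool:
    """Allow an action only if its target is unambiguous, visible, and enabled."""
    matches = [e for e in observed if e.element_id == target_id]
    return len(matches) == 1 and matches[0].visible and matches[0].enabled


page = [
    Element("join-visit", visible=True, enabled=True),
    Element("hidden-admin", visible=False, enabled=True),
    Element("submit", visible=True, enabled=False),
]

can_join = is_grounded("join-visit", page)
can_admin = is_grounded("hidden-admin", page)
can_submit = is_grounded("submit", page)
can_missing = is_grounded("not-on-page", page)
```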

Real-time agent interaction

Integrating Gemini Live required careful handling of event streams and session management.


What We Learned

This project taught us that multimodal AI agents require strong guardrails.

Reliable UI navigation requires:

  • deterministic grounding
  • goal persistence
  • safe action validation
  • careful state management

By combining Gemini’s reasoning capabilities with structured UI actions, we created an AI system that can interact with real software interfaces.


What's Next

SilverVisit AI demonstrates the potential of AI-powered accessibility assistants.

Future directions include:

  • supporting real telehealth portals
  • integrating voice-first interaction for seniors
  • expanding to other complex digital services like banking and government portals

Our long-term vision is a world where technology adapts to people, not the other way around.
