Feature: Web UI Gateway — Local Browser-Based Interface with Streaming, Artifacts & Rich Rendering #501

@teknium1

Overview

Hermes Agent currently supports CLI, Telegram, Discord, WhatsApp, Slack, and Home Assistant as interaction modes. The one glaring gap is a local web-based UI — a browser interface that users can run alongside or instead of the CLI. Every major competitor has one: Claude has Artifacts and the web app, ChatGPT has Canvas and Projects, Open WebUI provides a self-hosted chat interface, Aider has a browser mode, and Open Interpreter has a web UI.

A web UI would serve multiple purposes: (1) a richer visual experience than the terminal, (2) a platform for features that don't map well to chat (artifacts, canvas, file trees, diffs, dashboards), and (3) a gateway for users who don't use messaging platforms but want something more capable than a terminal.

This research was informed by studying 30+ AI interface projects including AG-UI Protocol, CopilotKit, Claude Artifacts, ChatGPT Canvas, tldraw Computer, Open WebUI, Jan.ai, LibreChat, and Magentic-UI.


Research Findings

The Web UI Landscape

The AI agent interface space has matured significantly. Key patterns observed:

AG-UI Protocol (github.com/ag-ui-protocol/ag-ui): An open standard for agent-to-frontend communication with 16 event types for real-time bidirectional state sync. Completes the "agentic protocol stack" alongside MCP (tools) and A2A (agent-to-agent). Already integrated with LangGraph, CrewAI, Google ADK, and others.

Claude Artifacts: Dual-pane architecture with conversation on the left and a workspace on the right. Live rendering of HTML, CSS, React, SVG. Version slider for iterating. Component isolation (multiple distinct assets per chat). This is the gold standard for "rich output beyond text."

ChatGPT Canvas: Inline editing — select text/code, prompt changes for just the selection. Direct manipulation (type directly into Canvas). Coding shortcuts and writing tools. Back button for version history.

Open WebUI: Self-hosted web interface supporting Ollama and OpenAI-compatible APIs. Multi-model, RAG, plugins, Python execution, image generation. Proves the concept of self-hosted chat UI.

CopilotKit: Open-source SDK for building in-app AI copilots. Uses AG-UI protocol. Hooks like useCopilotAction, generative UI playground. AI embedded IN the app, not a separate window.

Magentic-UI (Microsoft): Co-planning (collaboratively create plans), co-tasking (interrupt/guide execution), action guards, "Tell me When" monitoring, plan learning. The most thoughtful human-in-the-loop web UI.

Key Design Decisions in Successful Web UIs

  1. Streaming is mandatory — Users expect to see tokens appear in real-time, not wait for a complete response
  2. Dual-pane > single chat — Separating conversation from workspace/output dramatically improves usability for code, documents, and rich content
  3. Native rendering — Code blocks with syntax highlighting, markdown tables, math rendering, image display, even live HTML previews
  4. Session management — Sidebar with conversation history, search, folders/projects
  5. Tool visibility — Show tool calls happening in real-time with collapsible details (not just a spinner)
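
To illustrate point 5, here is a minimal sketch of how a frontend-facing layer might fold a stream of tool progress events into per-call state that a UI can render as a collapsible row. The event shape (`type`, `call_id`, `name`, `text`, `error`) and the `ToolCallView` name are assumptions made for this example, not the format Hermes currently emits.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCallView:
    """UI state for one tool call: a summary line plus collapsible detail lines."""
    name: str
    status: str = "running"  # "running" | "done" | "error"
    details: list[str] = field(default_factory=list)


def apply_event(calls: dict[str, ToolCallView], event: dict) -> None:
    """Fold one progress event into the per-call view state (mutates `calls`)."""
    call_id = event["call_id"]
    kind = event["type"]
    if kind == "tool_start":
        calls[call_id] = ToolCallView(name=event["name"])
    elif kind == "tool_progress":
        calls[call_id].details.append(event["text"])
    elif kind == "tool_end":
        calls[call_id].status = "error" if event.get("error") else "done"


# Hypothetical event stream for a single tool call.
events = [
    {"type": "tool_start", "call_id": "c1", "name": "web_search"},
    {"type": "tool_progress", "call_id": "c1", "text": "querying..."},
    {"type": "tool_end", "call_id": "c1"},
]
calls: dict[str, ToolCallView] = {}
for ev in events:
    apply_event(calls, ev)
```

Keeping the detail lines separate from the status lets the UI show a one-line summary by default and expand the details on demand.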

Current State in Hermes Agent

Hermes has a sophisticated gateway system (gateway/platforms/base.py) with platform adapters. Adding a web UI would follow the same pattern — a new adapter in gateway/platforms/web.py that implements the PlatformAdapter base class.

Relevant existing components:

  • gateway/run.py — Multi-platform gateway runner, would host the web server alongside other adapters
  • gateway/platforms/base.py — Base adapter class with send_message, send_image, send_audio, etc.
  • gateway/session.py — Session management (SQLite + JSON transcripts)
  • Tool progress system — Already emits structured progress events
  • Image/audio/file handling — Already has MEDIA tag extraction, image URL detection

The web UI would naturally integrate with all of these. The gateway architecture was clearly designed to be extensible.
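
As a rough sketch of what the adapter could look like: the `PlatformAdapter` class below is a stand-in written for this example, since the real signatures in gateway/platforms/base.py are not reproduced here. The `chat_id` parameter, the `connect` method, and the JSON frame shapes are all assumptions.

```python
import asyncio
import json
from abc import ABC, abstractmethod


class PlatformAdapter(ABC):
    """Stand-in for the base class in gateway/platforms/base.py (signatures assumed)."""

    @abstractmethod
    async def send_message(self, chat_id: str, text: str) -> None: ...

    @abstractmethod
    async def send_image(self, chat_id: str, url: str) -> None: ...


class WebAdapter(PlatformAdapter):
    """Hypothetical web adapter: each browser session gets a queue of JSON frames."""

    def __init__(self) -> None:
        self._outboxes: dict = {}

    def connect(self, chat_id: str) -> asyncio.Queue:
        # A WebSocket handler would drain this queue and write each frame to the socket.
        return self._outboxes.setdefault(chat_id, asyncio.Queue())

    async def _emit(self, chat_id: str, payload: dict) -> None:
        await self.connect(chat_id).put(json.dumps(payload))

    async def send_message(self, chat_id: str, text: str) -> None:
        await self._emit(chat_id, {"type": "message", "text": text})

    async def send_image(self, chat_id: str, url: str) -> None:
        await self._emit(chat_id, {"type": "image", "url": url})


async def demo() -> list:
    adapter = WebAdapter()
    outbox = adapter.connect("session-1")
    await adapter.send_message("session-1", "hello")
    await adapter.send_image("session-1", "https://example.com/cat.png")
    return [json.loads(outbox.get_nowait()) for _ in range(2)]


frames = asyncio.run(demo())
```

Routing everything through a per-session queue keeps the adapter independent of the web framework; the socket handler only needs to forward frames.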


Implementation Plan

Skill vs. Tool Classification

This should be a core codebase change — a new gateway platform adapter (gateway/platforms/web.py) plus a bundled frontend. It requires real-time WebSocket communication, binary data handling (images, audio, files), and deep integration with the session and tool progress systems. This cannot be expressed as a skill.

What We'd Need

  • WebSocket server (FastAPI or aiohttp, both already available in the ecosystem)
  • Lightweight frontend (React/Preact or vanilla JS with a bundler)
  • New platform adapter implementing PlatformAdapter interface
  • Static file serving for the frontend
  • Session persistence integration

Phased Rollout

Phase 1: Minimal Chat Web UI

  • FastAPI/aiohttp WebSocket server as a new gateway platform adapter
  • Simple single-page chat interface with markdown rendering
  • Streaming token display via WebSocket events
  • Basic session management (new/reset/resume)
  • Syntax-highlighted code blocks
  • Image display (inline from URLs)
  • Tool progress indicators (collapsible)
  • Launch via hermes gateway --web or auto-start on a configurable port
  • Mobile-responsive design
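
The streaming item above could be sketched roughly as follows. The frame types (`start`, `token`, `end`) and the `stream_reply` helper are hypothetical; `send` stands for whatever the chosen framework provides for writing a text frame (for example `websocket.send_text` in FastAPI/Starlette or `ws.send_str` in aiohttp).

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable

Send = Callable[[str], Awaitable[None]]


async def stream_reply(tokens: AsyncIterator[str], send: Send) -> str:
    """Forward each token as its own JSON frame, then a final frame with the full text."""
    parts = []
    await send(json.dumps({"type": "start"}))
    async for token in tokens:
        parts.append(token)
        await send(json.dumps({"type": "token", "text": token}))
    full = "".join(parts)
    await send(json.dumps({"type": "end", "text": full}))
    return full


async def fake_model() -> AsyncIterator[str]:
    """Placeholder token source standing in for the agent's streaming output."""
    for token in ["Hel", "lo", "!"]:
        await asyncio.sleep(0)
        yield token


async def demo():
    sent = []

    async def send(frame: str) -> None:
        sent.append(frame)  # a real handler would write to the WebSocket here

    full = await stream_reply(fake_model(), send)
    return full, [json.loads(f)["type"] for f in sent]


full_text, frame_types = asyncio.run(demo())
```

Sending the full text again in the `end` frame lets a client that missed tokens (for example after a reconnect) still render the complete reply.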

Phase 2: Rich Workspace Features

  • Dual-pane layout: chat (left) + workspace/artifacts (right)
  • Artifact rendering: live HTML/React preview, SVG display, document viewer
  • File tree browser (show agent's working directory)
  • Diff viewer for file changes (before/after with syntax highlighting)
  • Session sidebar with history, search, and resume
  • Slash command palette (Cmd+K style)
  • Drag-and-drop file upload
  • Clipboard paste for images
  • Settings panel (model selection, personality, tool toggles)
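
For the diff viewer, the backend side could be as small as the standard library's `difflib`. This is a sketch under the assumption that the adapter has the before and after contents of a file as strings; `file_diff` is a hypothetical helper, and syntax highlighting would be left to the frontend.

```python
import difflib


def file_diff(before: str, after: str, path: str) -> list:
    """Return a unified diff of two file versions as a list of lines."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
            lineterm="",  # inputs have no trailing newlines, so headers should not either
        )
    )


before = "def greet():\n    print('hi')\n"
after = "def greet():\n    print('hello')\n"
diff = file_diff(before, after, "greet.py")
```

The resulting lines use the familiar `---`/`+++`/`@@` layout, which most frontend diff components can parse directly.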

Phase 3: Advanced Features

  • Split-view for parallel subagent monitoring (Mission Control style, inspired by Cursor)
  • Canvas mode: spatial arrangement of conversation elements
  • Code execution preview (sandboxed iframe for HTML/JS artifacts)
  • Collaborative sessions (multiple users viewing same agent session)
  • Export conversations (markdown, PDF, JSON)
  • Keyboard shortcuts throughout
  • AG-UI protocol compatibility for third-party frontend integration
  • PWA support for mobile installation
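
One possible shape for the sandboxed code execution preview is sketched below. It wraps agent-generated HTML in an iframe whose `sandbox` attribute allows scripts but omits `allow-same-origin`, so the artifact runs in an opaque origin and cannot read the host page's cookies or storage. The `sandboxed_iframe` helper is hypothetical, and a real implementation would likely add a Content-Security-Policy as well.

```python
import html


def sandboxed_iframe(artifact_html: str, title: str = "artifact") -> str:
    """Embed untrusted HTML via srcdoc in an iframe that may run scripts only."""
    return (
        f'<iframe sandbox="allow-scripts" title="{html.escape(title, quote=True)}" '
        f'srcdoc="{html.escape(artifact_html, quote=True)}"></iframe>'
    )


snippet = sandboxed_iframe('<script>document.body.textContent = "hi"</script>')
```

Escaping the artifact for the `srcdoc` attribute keeps quotes or angle brackets in generated HTML from breaking out of the attribute.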

Pros & Cons

Pros

  • Fills the biggest UX gap — every competitor has a web interface
  • Enables rich features impossible in chat (artifacts, canvas, diffs, file trees)
  • No app installation required — works in any browser
  • Mobile-friendly (unlike CLI)
  • Can serve as the "showcase" interface for demos and onboarding
  • Reuses existing gateway architecture
  • Opens the door to AG-UI protocol support, making Hermes usable from any compatible frontend

Cons / Risks

  • Significant frontend development effort (HTML/CSS/JS is outside the current Python-focused codebase)
  • Maintenance burden of a second "rich" interface alongside CLI
  • Security considerations: WebSocket auth, CORS, local-only vs exposed
  • Feature parity pressure: users will expect the web UI to support everything the CLI does
  • Bundle size and dependency management for frontend assets

Open Questions

  • Should the frontend be React-based (richer but heavier) or vanilla JS (lighter, fewer deps)?
  • Should we adopt AG-UI protocol from the start, or build a custom WebSocket protocol first and add AG-UI compatibility later?
  • Should the web UI be opt-in (hermes gateway --web) or always-on when the gateway runs?
  • How do we handle auth for the web UI? Token-based? Local-only binding?
  • Should artifacts be stored persistently (like Claude) or ephemeral per session?
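
On the auth question, the two options are not mutually exclusive. The sketch below combines them: accept connections only from loopback addresses and also require a per-launch token. All function names are hypothetical, and this is one possible policy rather than a recommendation.

```python
import hmac
import ipaddress
import secrets


def new_session_token() -> str:
    """Generate a random token, e.g. printed once at startup and embedded in the URL."""
    return secrets.token_urlsafe(32)


def is_loopback(host: str) -> bool:
    """True for 127.0.0.0/8, ::1, or the literal name 'localhost'."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return host == "localhost"


def authorize(client_host: str, presented: str, expected: str) -> bool:
    """Allow a connection only from loopback and only with the right token."""
    # compare_digest avoids leaking the match position through timing.
    token_ok = hmac.compare_digest(presented.encode(), expected.encode())
    return is_loopback(client_host) and token_ok


token = new_session_token()
local_ok = authorize("127.0.0.1", token, token)
remote_ok = authorize("192.168.1.10", token, token)
wrong_token_ok = authorize("::1", "wrong", token)
```

Requiring the token even on loopback guards against other local processes or web pages that try to open a socket to the port.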
