Overview
Hermes Agent currently supports CLI, Telegram, Discord, WhatsApp, Slack, and Home Assistant as interaction modes. The one glaring gap is a local web-based UI, a browser interface that users can run alongside or instead of the CLI. Most comparable tools already have one. Claude has Artifacts and the web app, ChatGPT has Canvas and Projects, Open WebUI provides a self-hosted chat interface, Aider has a browser mode, and Open Interpreter has a web UI.
A web UI would serve multiple purposes: (1) a richer visual experience than the terminal, (2) a platform for features that don't map well to chat (artifacts, canvas, file trees, diffs, dashboards), and (3) a gateway for users who don't use messaging platforms but want something more capable than a terminal.
This research was informed by studying 30+ AI interface projects including AG-UI Protocol, CopilotKit, Claude Artifacts, ChatGPT Canvas, tldraw Computer, Open WebUI, Jan.ai, LibreChat, and Magentic-UI.
Research Findings
The Web UI Landscape
The AI agent interface space has matured significantly. Key patterns observed:
AG-UI Protocol (github.com/ag-ui-protocol/ag-ui): An open standard for agent-to-frontend communication with 16 event types for real-time bidirectional state sync. Completes the "agentic protocol stack" alongside MCP (tools) and A2A (agent-to-agent). Already integrated with LangGraph, CrewAI, Google ADK, and others.
Claude Artifacts: Dual-pane architecture with conversation on the left and a workspace on the right. Live rendering of HTML, CSS, React, SVG. Version slider for iterating. Component isolation (multiple distinct assets per chat). This is the gold standard for "rich output beyond text."
ChatGPT Canvas: Inline editing — select text/code, prompt changes for just the selection. Direct manipulation (type directly into Canvas). Coding shortcuts and writing tools. Back button for version history.
Open WebUI: Self-hosted web interface supporting Ollama and OpenAI-compatible APIs. Multi-model, RAG, plugins, Python execution, image generation. Proves the concept of self-hosted chat UI.
CopilotKit: Open-source SDK for building in-app AI copilots. Uses AG-UI protocol. Hooks like useCopilotAction, generative UI playground. AI embedded IN the app, not a separate window.
Magentic-UI (Microsoft): Co-planning (collaboratively create plans), co-tasking (interrupt/guide execution), action guards, "Tell me When" monitoring, plan learning. The most thoughtful human-in-the-loop web UI.
Key Design Decisions in Successful Web UIs
- Streaming is mandatory — Users expect to see tokens appear in real-time, not wait for a complete response
- Dual-pane > single chat — Separating conversation from workspace/output dramatically improves usability for code, documents, and rich content
- Native rendering — Code blocks with syntax highlighting, markdown tables, math rendering, image display, even live HTML previews
- Session management — Sidebar with conversation history, search, folders/projects
- Tool visibility — Show tool calls happening in real-time with collapsible details (not just a spinner)
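The streaming and tool-visibility points above imply that the browser receives a sequence of small typed events rather than one finished reply. A minimal sketch of such an event envelope follows. The `UIEvent` class, the event type names (`token`, `tool_start`, `tool_end`), and every field are assumptions made for illustration; they are not an existing Hermes or AG-UI schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class UIEvent:
    """One message pushed from the agent to the browser (hypothetical schema)."""

    type: str                      # assumed names: "token", "tool_start", "tool_end"
    session_id: str
    data: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize to the text frame a WebSocket would carry.
        return json.dumps(asdict(self))


def token(session_id: str, text: str) -> UIEvent:
    """A partial chunk of the assistant's reply."""
    return UIEvent("token", session_id, {"text": text})


def tool_start(session_id: str, call_id: str, name: str, args: Dict[str, Any]) -> UIEvent:
    """Announce a tool call so the UI can open a collapsible entry for it."""
    return UIEvent("tool_start", session_id, {"call_id": call_id, "name": name, "args": args})


def tool_end(session_id: str, call_id: str, ok: bool, summary: str) -> UIEvent:
    """Close the entry opened by tool_start, matched on call_id."""
    return UIEvent("tool_end", session_id, {"call_id": call_id, "ok": ok, "summary": summary})
```

Pairing `tool_start` and `tool_end` on a shared `call_id` is one way to let the frontend show each call as its own collapsible row instead of a single spinner.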
Current State in Hermes Agent
Hermes has a sophisticated gateway system (gateway/platforms/base.py) with platform adapters. Adding a web UI would follow the same pattern — a new adapter in gateway/platforms/web.py that implements the PlatformAdapter base class.
Relevant existing components:
- gateway/run.py: Multi-platform gateway runner; would host the web server alongside other adapters
- gateway/platforms/base.py: Base adapter class with send_message, send_image, send_audio, etc.
- gateway/session.py: Session management (SQLite + JSON transcripts)
- Tool progress system: Already emits structured progress events
- Image/audio/file handling: Already has MEDIA tag extraction, image URL detection
The web UI would naturally integrate with all of these. The gateway architecture was clearly designed to be extensible.
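To make the adapter pattern concrete, here is a rough sketch of what a web adapter could look like. The `PlatformAdapter` class below is a stand-in written for this example; the real base class lives in gateway/platforms/base.py and its method signatures are not shown in this proposal, so the parameters (`chat_id`, `text`, `url`) and the per-session outbox queue are assumptions.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class PlatformAdapter(ABC):
    """Stand-in for the real base class; signatures here are assumed."""

    @abstractmethod
    async def send_message(self, chat_id: str, text: str) -> None: ...

    @abstractmethod
    async def send_image(self, chat_id: str, url: str) -> None: ...


class WebAdapter(PlatformAdapter):
    """Hypothetical adapter: queues outgoing items per browser session."""

    def __init__(self) -> None:
        self._outboxes: Dict[str, asyncio.Queue] = {}

    def outbox(self, chat_id: str) -> asyncio.Queue:
        # One queue per session; a WebSocket handler would drain it.
        return self._outboxes.setdefault(chat_id, asyncio.Queue())

    async def send_message(self, chat_id: str, text: str) -> None:
        await self.outbox(chat_id).put({"type": "message", "text": text})

    async def send_image(self, chat_id: str, url: str) -> None:
        await self.outbox(chat_id).put({"type": "image", "url": url})


async def demo() -> List[Dict[str, Any]]:
    adapter = WebAdapter()
    await adapter.send_message("tab-1", "hello")
    await adapter.send_image("tab-1", "https://example.com/cat.png")
    queue = adapter.outbox("tab-1")
    return [queue.get_nowait(), queue.get_nowait()]
```

Keeping the adapter unaware of the transport (it only fills a queue) would let the same class sit behind either FastAPI or aiohttp, whichever is chosen.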
Implementation Plan
Skill vs. Tool Classification
This should be a core codebase change — a new gateway platform adapter (gateway/platforms/web.py) plus a bundled frontend. It requires real-time WebSocket communication, binary data handling (images, audio, files), and deep integration with the session and tool progress systems. This cannot be expressed as a skill.
What We'd Need
- WebSocket server (FastAPI or aiohttp, both already available in the ecosystem)
- Lightweight frontend (React/Preact or vanilla JS with a bundler)
- New platform adapter implementing PlatformAdapter interface
- Static file serving for the frontend
- Session persistence integration
Phased Rollout
Phase 1: Minimal Chat Web UI
- FastAPI/aiohttp WebSocket server as a new gateway platform adapter
- Simple single-page chat interface with markdown rendering
- Streaming token display via WebSocket events
- Basic session management (new/reset/resume)
- Syntax-highlighted code blocks
- Image display (inline from URLs)
- Tool progress indicators (collapsible)
- Launch via hermes gateway --web or auto-start on a configurable port
- Mobile-responsive design
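The streaming item in Phase 1 could work roughly as follows: each token from the model is forwarded as its own frame, followed by a final marker. This sketch uses only the standard library. `fake_model_stream` stands in for the agent's real token stream, `send` stands in for a WebSocket send call (for example aiohttp's `WebSocketResponse.send_str`), and the frame shape is an assumption.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable, Dict, List


async def fake_model_stream(reply: str) -> AsyncIterator[str]:
    """Stand-in for the agent's token stream (hypothetical)."""
    for word in reply.split(" "):
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield word + " "


def frame(kind: str, session_id: str, data: Dict[str, str]) -> str:
    # Assumed frame shape: type, session id, and a payload object.
    return json.dumps({"type": kind, "session_id": session_id, "data": data})


async def pump(
    session_id: str,
    tokens: AsyncIterator[str],
    send: Callable[[str], Awaitable[None]],
) -> None:
    """Forward each token as soon as it arrives, then signal completion."""
    async for tok in tokens:
        await send(frame("token", session_id, {"text": tok}))
    await send(frame("done", session_id, {}))


async def demo() -> List[dict]:
    sent: List[str] = []

    async def send(text: str) -> None:  # a real server would write to the socket
        sent.append(text)

    await pump("s1", fake_model_stream("streaming works"), send)
    return [json.loads(s) for s in sent]
```

Because `pump` only depends on an async iterator and a `send` callable, it can be exercised in tests without starting a server.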
Phase 2: Rich Workspace Features
- Dual-pane layout: chat (left) + workspace/artifacts (right)
- Artifact rendering: live HTML/React preview, SVG display, document viewer
- File tree browser (show agent's working directory)
- Diff viewer for file changes (before/after with syntax highlighting)
- Session sidebar with history, search, and resume
- Slash command palette (Cmd+K style)
- Drag-and-drop file upload
- Clipboard paste for images
- Settings panel (model selection, personality, tool toggles)
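For the diff viewer, the backend could compute a unified diff and ship it to the frontend for rendering. The sketch below uses Python's standard `difflib`; the `diff_payload` function and the payload keys (`path`, `added`, `removed`, `diff`) are hypothetical names chosen for this example.

```python
import difflib
from typing import Dict, List, Union


def diff_payload(path: str, before: str, after: str) -> Dict[str, Union[str, int, List[str]]]:
    """Build a JSON-friendly before/after diff for one file."""
    lines = list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
            lineterm="",  # inputs have no trailing newlines, so emit none
        )
    )
    # The first two lines are the ---/+++ file headers; skip them when counting.
    body = lines[2:]
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    return {"path": path, "added": added, "removed": removed, "diff": lines}
```

Sending the line counts alongside the raw diff would let the UI show a compact "+1 / -1" badge before the user expands the full view.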
Phase 3: Advanced Features
- Split-view for parallel subagent monitoring (Mission Control style, inspired by Cursor)
- Canvas mode: spatial arrangement of conversation elements
- Code execution preview (sandboxed iframe for HTML/JS artifacts)
- Collaborative sessions (multiple users viewing same agent session)
- Export conversations (markdown, PDF, JSON)
- Keyboard shortcuts throughout
- AG-UI protocol compatibility for third-party frontend integration
- PWA support for mobile installation
Pros & Cons
Pros
- Fills the biggest UX gap — every competitor has a web interface
- Enables rich features impossible in chat (artifacts, canvas, diffs, file trees)
- No app installation required — works in any browser
- Mobile-friendly (unlike CLI)
- Can serve as the "showcase" interface for demos and onboarding
- Reuses existing gateway architecture
- Opens the door to AG-UI protocol support, making Hermes usable from any compatible frontend
Cons / Risks
- Significant frontend development effort (HTML/CSS/JS is outside current Python-focused codebase)
- Maintenance burden of a second "rich" interface alongside CLI
- Security considerations: WebSocket auth, CORS, local-only vs exposed
- Feature parity pressure: users will expect web UI to support everything CLI does
- Bundle size and dependency management for frontend assets
Open Questions
- Should the frontend be React-based (richer but heavier) or vanilla JS (lighter, fewer deps)?
- Should we adopt AG-UI protocol from the start, or build a custom WebSocket protocol first and add AG-UI compatibility later?
- Should the web UI be opt-in (hermes gateway --web) or always-on when the gateway runs?
- How do we handle auth for the web UI? Token-based? Local-only binding?
- Should artifacts be stored persistently (like Claude) or ephemeral per session?
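On the auth question, one possible combination is a random per-launch token plus an origin check that only accepts local pages. The sketch below is illustrative only and does not settle the question: the function names and the `LOCAL_HOSTS` set are assumptions, and a real deployment would also need to bind the server to a loopback address.

```python
import hmac
import secrets
from typing import Optional
from urllib.parse import urlsplit

# Hosts treated as "local" for the origin check (assumed policy).
LOCAL_HOSTS = {"127.0.0.1", "localhost", "::1"}


def new_session_token() -> str:
    """Generate a token to print at startup and require from the browser."""
    return secrets.token_urlsafe(32)


def token_ok(expected: str, presented: Optional[str]) -> bool:
    """Constant-time comparison so timing does not leak the token."""
    if not presented:
        return False
    return hmac.compare_digest(expected.encode(), presented.encode())


def origin_ok(origin: Optional[str]) -> bool:
    """Accept WebSocket upgrades only from pages served by a local host."""
    if not origin:
        return False
    return urlsplit(origin).hostname in LOCAL_HOSTS
```

Checking the Origin header matters even with local-only binding, because a page on another site can still attempt a WebSocket connection to a port on the user's own machine.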
References