Feature: RPC Mode for Programmatic Integration & Model Hot-Swapping (inspired by Pi) #360

@teknium1

Overview

Inspired by the Pi coding agent, this issue proposes adding an RPC (Remote Procedure Call) mode to Hermes Agent -- a JSON protocol over stdin/stdout that lets a program written in any language, or an IDE, control the agent. Pi offers four integration modes (interactive, print, JSON streaming, and RPC), and its RPC mode is what enables its ecosystem of IDE plugins, web UIs, and embedded integrations.

Currently, Hermes Agent runs in two modes: CLI (interactive terminal) and Gateway (Telegram/Discord/WhatsApp/Slack). There's no way for external programs to programmatically start sessions, send messages, switch models, or observe agent state. An RPC mode would unlock IDE integrations (VS Code, Neovim, Emacs), custom web UIs, CI/CD agent automation, and embedding Hermes as a component in larger systems.

This also naturally enables mid-session model hot-swapping -- a frequently requested capability that Pi implements via the same RPC protocol.


Research Findings

How Pi's RPC Mode Works

Pi's RPC mode uses a bidirectional JSON Lines protocol over stdin/stdout. The host process spawns Pi with --mode rpc and exchanges structured messages with it, one JSON object per line:

Client-to-Agent Commands:

{"type": "prompt", "text": "Fix the bug in main.py"}
{"type": "steer", "text": "Actually, try a different approach"}
{"type": "follow_up", "text": "Now add tests"}
{"type": "abort"}
{"type": "set_model", "provider": "anthropic", "model": "claude-sonnet-4-20250514"}
{"type": "compact"}
{"type": "get_state"}
{"type": "get_messages", "from": 5}
{"type": "switch_session", "path": "path/to/session.jsonl"}
{"type": "fork"}

Agent-to-Client Events:

{"type": "state", "state": "idle|running|waiting_for_input"}
{"type": "message_start", "role": "assistant"}
{"type": "message_delta", "content": "partial text..."}
{"type": "message_end"}
{"type": "tool_call", "name": "bash", "args": {"command": "ls"}}
{"type": "tool_result", "name": "bash", "output": "file1.py..."}
{"type": "error", "message": "..."}
{"type": "compact_done", "summary": "..."}

Extension UI Forwarding:
When extensions request UI (dialogs, confirmations), the RPC protocol forwards these to the host:

{"type": "ui_request", "kind": "confirm", "prompt": "Delete this file?", "id": "req_123"}
// Host responds:
{"type": "ui_response", "id": "req_123", "value": true}

Key Design Decisions

  1. JSON Lines protocol -- One JSON object per line, simple to parse in any language. The newline is the only framing, so there is no length-prefix or binary protocol complexity.
  2. Streaming events -- Message content streams as deltas, matching the LLM streaming pattern. Clients can render progressively.
  3. Bidirectional -- Not just request/response. The agent can initiate UI requests, the client can steer mid-generation.
  4. State machine -- Agent reports state transitions (idle/running/waiting), enabling proper UI state management.
  5. Model control -- set_model command enables mid-session model switching without restarting.
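
The state machine in point 4 could look roughly like the sketch below. The state names come from the `state` event shown earlier; the transition table is an assumption, since the source only names the states. `StateTracker` is a hypothetical class.

```python
from enum import Enum


class AgentState(Enum):
    IDLE = "idle"
    RUNNING = "running"
    WAITING_FOR_INPUT = "waiting_for_input"


# Assumed transition table; adjust to whatever the real protocol allows.
ALLOWED = {
    AgentState.IDLE: {AgentState.RUNNING},
    AgentState.RUNNING: {AgentState.IDLE, AgentState.WAITING_FOR_INPUT},
    AgentState.WAITING_FOR_INPUT: {AgentState.RUNNING, AgentState.IDLE},
}


class StateTracker:
    """Track the agent state and emit a state event on every transition."""

    def __init__(self, emit):
        self.state = AgentState.IDLE
        self._emit = emit  # callable that receives one event dict

    def transition(self, new_state: AgentState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise RuntimeError(
                f"illegal transition: {self.state.value} -> {new_state.value}"
            )
        self.state = new_state
        self._emit({"type": "state", "state": new_state.value})
```

Rejecting illegal transitions early makes protocol bugs visible on the agent side instead of leaving clients with a UI stuck in the wrong state.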

Real-World RPC Integrations

Pi's RPC mode has enabled:

  • pi-coding-agent (Emacs) -- Emacs package that populates Markdown buffers for chat
  • OpenClaw/clawdbot -- Discord bot embedding Pi via SDK
  • Custom web UIs -- Several community projects wrapping Pi in web interfaces
  • Pz (Zig port) -- Uses RPC for language-agnostic integration

Current State in Hermes Agent

Existing Integration Modes

  1. CLI mode (cli.py, HermesCLI) -- Interactive terminal with readline, streaming output, slash commands
  2. Gateway mode (gateway/run.py, GatewayRunner) -- Platform adapters for Telegram, Discord, WhatsApp, Slack

How Messages Flow Currently

User input -> Platform adapter -> GatewayRunner.handle_message()
           -> SessionStore.get_or_create_session()
           -> AIAgent.run_conversation()
           -> Tool calls loop
           -> Response -> Platform adapter -> User

What's Missing

  • No programmatic interface -- External programs can't control the agent
  • No streaming protocol -- Gateway platforms get complete responses; no progressive streaming to external clients
  • No model switching -- Model is set at session creation, fixed for the session
  • No state observability -- External systems can't query agent state (is it running? what tools is it using?)
  • No IDE integration path -- VS Code, Neovim, Emacs can't embed Hermes

Relevant Existing Code

  • cli.py -- HermesCLI class, could be adapted for RPC dispatch
  • run_agent.py -- AIAgent with callbacks (tool_progress, step, clarify)
  • gateway/run.py -- GatewayRunner with message handling, command parsing
  • gateway/session.py -- SessionStore for session management
  • Agent callbacks already exist for tool progress, clarification, and step tracking -- these map naturally to RPC events
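
As a rough illustration of that mapping, the sketch below adapts a tool-progress callback to a `tool_call` event. The callback signature `(name, args)` is an assumption; the real AIAgent callbacks may take different arguments. `EventEmitter` and `make_callbacks` are hypothetical names.

```python
import json
import sys
import threading


class EventEmitter:
    """Write events to a stream as JSON lines, one event per line."""

    def __init__(self, stream=None):
        self._stream = stream if stream is not None else sys.stdout
        # Callbacks may fire from worker threads, so serialize writes to
        # keep lines from interleaving.
        self._lock = threading.Lock()

    def emit(self, event_type: str, **fields) -> None:
        line = json.dumps({"type": event_type, **fields})
        with self._lock:
            self._stream.write(line + "\n")
            self._stream.flush()


def make_callbacks(emitter: EventEmitter) -> dict:
    """Return callbacks keyed by the callback names mentioned above."""

    def on_tool_progress(name, args):
        # Assumed signature: tool name plus its argument dict.
        emitter.emit("tool_call", name=name, args=args)

    # step and clarify callbacks would follow the same pattern.
    return {"tool_progress": on_tool_progress}
```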

Implementation Plan

Skill vs. Tool Classification

This is a core codebase change -- it adds a new runtime mode alongside CLI and Gateway, and it touches agent startup, message flow, and output handling at a fundamental level. It cannot be expressed as a skill or tool.

What We'd Need

  • JSON Lines protocol specification (commands + events)
  • RPC runner that wraps AIAgent with stdin/stdout protocol
  • Streaming event emission during agent execution
  • State machine for agent lifecycle (idle/running/waiting)
  • Model hot-swapping support in AIAgent
  • CLI flag: hermes --mode rpc or hermes rpc

Phased Rollout

Phase 1: Core RPC Protocol

  • Define JSON Lines protocol specification (commands: prompt, abort, get_state, get_messages; events: state, message_delta, tool_call, tool_result, error, done)
  • Implement RpcRunner class that reads commands from stdin, dispatches to AIAgent, emits events to stdout
  • Wire AIAgent callbacks (tool_progress, step) to RPC event emission
  • Add --mode rpc flag to CLI entry point
  • Basic state machine: idle -> running -> idle
  • Integration test: Python subprocess spawning hermes in RPC mode
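
A minimal sketch of what the Phase 1 runner might look like. It is synchronous for brevity, so a real `abort` would need the agent to run on a separate thread or task. `EchoAgent` is a stand-in so the example runs without Hermes, and the assumption that `run_conversation` takes the prompt text and returns the reply is not confirmed by the source.

```python
import json


class EchoAgent:
    """Stand-in for AIAgent; the run_conversation signature is assumed."""

    def run_conversation(self, text):
        return f"echo: {text}"


class RpcRunner:
    """Read JSON-line commands from stdin and emit JSON-line events to stdout."""

    def __init__(self, agent, stdin, stdout):
        self.agent = agent
        self.stdin = stdin
        self.stdout = stdout
        self.state = "idle"

    def emit(self, event_type, **fields):
        self.stdout.write(json.dumps({"type": event_type, **fields}) + "\n")
        self.stdout.flush()

    def set_state(self, state):
        self.state = state
        self.emit("state", state=state)

    def handle(self, command):
        kind = command.get("type")
        if kind == "prompt":
            self.set_state("running")
            try:
                reply = self.agent.run_conversation(command.get("text", ""))
                self.emit("message_delta", content=reply)
                self.emit("done")
            except Exception as exc:  # report failures instead of crashing
                self.emit("error", message=str(exc))
            finally:
                self.set_state("idle")
        elif kind == "get_state":
            self.emit("state", state=self.state)
        else:
            self.emit("error", message=f"unsupported command: {kind}")

    def serve(self):
        for line in self.stdin:
            line = line.strip()
            if not line:
                continue
            try:
                command = json.loads(line)
            except json.JSONDecodeError as exc:
                self.emit("error", message=f"invalid JSON: {exc.msg}")
                continue
            if not isinstance(command, dict):
                self.emit("error", message="command must be a JSON object")
                continue
            self.handle(command)
```

Passing the streams in as constructor arguments keeps the runner testable with `io.StringIO`; the real entry point would pass `sys.stdin` and `sys.stdout`.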

Phase 2: Advanced Control

  • Add steer command -- queue a message to deliver after the current tool call finishes (relates to #345, Feature: Message Coalescing for Gateway Platforms)
  • Add follow_up command -- queue message for after agent completes
  • Add set_model command -- mid-session model switching
    • Requires AIAgent changes: model/config stored mutably, provider client re-initialized
    • Add /model slash command for interactive mode too
  • Add compact command -- trigger context compression
  • Add get_sessions / switch_session commands for session management
  • Clarify callback forwarding -- agent's clarify questions become RPC ui_request events
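
The queueing and model-switching behavior in this phase could be sketched as below. All names are hypothetical, and the delivery rules (steer messages after the current tool, follow-ups only once the agent has finished) are read directly from the bullets above. Rebuilding the provider client is represented only by a flag.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionControl:
    """Mutable per-session control state for steer, follow_up, and set_model."""

    provider: str
    model: str
    steer_queue: deque = field(default_factory=deque)
    follow_up_queue: deque = field(default_factory=deque)
    client_stale: bool = False  # True when the provider client must be rebuilt

    def set_model(self, provider: str, model: str) -> None:
        if (provider, model) != (self.provider, self.model):
            self.provider, self.model = provider, model
            self.client_stale = True

    def next_message(self, agent_finished: bool) -> Optional[str]:
        """Pick the next queued message, or None if nothing is due.

        Steer messages are due after any tool call; follow-ups are due
        only once the agent has completed its current run.
        """
        if self.steer_queue:
            return self.steer_queue.popleft()
        if agent_finished and self.follow_up_queue:
            return self.follow_up_queue.popleft()
        return None
```

The agent loop would call `next_message(agent_finished=False)` after each tool result and `next_message(agent_finished=True)` when a run ends, re-creating its provider client whenever `client_stale` is set.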

Phase 3: SDK & Ecosystem

  • Python SDK: HermesClient class wrapping subprocess + RPC protocol
    from hermes import HermesClient
    agent = HermesClient(model="anthropic/claude-sonnet-4-20250514")
    for event in agent.prompt("Fix the bug in main.py"):
        if event.type == "message_delta":
            print(event.content, end="")
  • VS Code extension skeleton (TypeScript, spawns hermes RPC)
  • Neovim plugin skeleton (Lua, spawns hermes RPC)
  • HTTP/WebSocket adapter -- wrap RPC protocol for web UIs
  • Documentation: protocol spec, integration guide, example clients
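
One possible shape for the subprocess wrapper behind such an SDK is sketched below. `RpcClient` is a hypothetical name, and treating `done` or `error` as the end of a turn is an assumption based on the Phase 1 event list. Since `hermes --mode rpc` does not exist yet, the demo spawns a small stand-in child process instead.

```python
import json
import subprocess
import sys

# Stand-in child process used only for this demo; it uppercases each prompt.
FAKE_AGENT = r'''
import json, sys
for line in sys.stdin:
    command = json.loads(line)
    print(json.dumps({"type": "message_delta", "content": command["text"].upper()}), flush=True)
    print(json.dumps({"type": "done"}), flush=True)
'''


class RpcClient:
    """Spawn an RPC-mode process and exchange JSON lines with it."""

    def __init__(self, argv):
        self._proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered text pipes
        )

    def prompt(self, text):
        """Send a prompt and yield events until the turn ends."""
        self._proc.stdin.write(json.dumps({"type": "prompt", "text": text}) + "\n")
        self._proc.stdin.flush()
        for line in self._proc.stdout:
            if not line.strip():
                continue
            event = json.loads(line)
            yield event
            if event.get("type") in ("done", "error"):
                break

    def close(self):
        self._proc.stdin.close()  # EOF tells the child to exit
        self._proc.wait(timeout=10)
        self._proc.stdout.close()


def demo():
    client = RpcClient([sys.executable, "-c", FAKE_AGENT])
    try:
        return list(client.prompt("fix the bug"))
    finally:
        client.close()
```

The same spawn-and-read structure would also serve as the Phase 1 integration test once the real entry point exists, with the stand-in command replaced by the Hermes invocation.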

Pros & Cons

Pros

  • IDE integration -- VS Code, Neovim, Emacs, JetBrains can embed Hermes as a coding assistant
  • Custom UIs -- Anyone can build a web UI, desktop app, or mobile app around Hermes
  • CI/CD automation -- Programmatically run agent tasks in pipelines with structured output
  • Model hot-swapping -- Switch models mid-session based on task complexity (cheap model for simple, strong for hard)
  • Composability -- Other agents/systems can use Hermes as a component via RPC
  • Language agnostic -- JSON Lines works from any language, not just Python
  • Leverages existing code -- AIAgent callbacks already provide the right abstraction layer

Cons / Risks

  • Protocol maintenance -- RPC protocol becomes a public API contract; breaking changes are costly
  • Complexity -- Third runtime mode adds surface area for bugs and testing
  • Streaming complexity -- Proper streaming with abort/steer requires careful state management
  • Security -- RPC mode inherits the spawning process's permissions; need clear documentation on trust boundaries
  • Model switching complexity -- Hot-swapping models mid-session may cause inconsistencies (different tokenizers, different tool calling formats)

Open Questions

  • Should RPC mode support authentication, or assume the spawning process is trusted?
  • Should the protocol support binary data (images, audio) or only text/JSON?
  • Should RPC mode be a separate entry point (hermes rpc) or a flag (hermes --mode rpc)?
  • How should model switching interact with prompt caching (cache invalidation on model change)?
  • Should we support WebSocket transport in addition to stdin/stdout for web UI use cases?
  • Should the Python SDK be a separate package or part of the main hermes-agent install?
