Bug: TUI session eagerly spawns duplicate 'hermes mcp serve' children from both tui_gateway.entry and slash_worker #15275

@Ramadas108

Summary

A single TUI session appears to eagerly spawn two separate hermes mcp serve subprocesses during normal operation:

  • one under tui_gateway.entry
  • one under tui_gateway.slash_worker

This is distinct from the already-known orphan-cleanup problem. In this case, the duplicate children are live and parented, not stale zombies.

The result is unnecessary subprocess fan-out, extra MCP sessions, and avoidable resource growth. It is also a likely contributor to downstream contention, such as intermittent SQLite WAL write pressure (fact_store lock symptoms).

Why this looks like a bug

This does not appear to be an intentional isolation boundary. The observed behavior is that one logical TUI session creates two Hermes MCP server children because both startup paths reach MCP discovery / tool bootstrap.

That makes this a lifecycle bug, not merely a performance enhancement request.

Environment

  • Repo: NousResearch/hermes-agent
  • Host: Linux VPS
  • Hermes TUI/gateway in active use
  • Config includes Hermes itself as an MCP server via ~/.hermes/config.yaml

Relevant config shape:

mcp_servers:
  hermes:
    command: hermes
    args: [mcp, serve]

Evidence

Code-path inspection showed:

  • model_tools.py calls discover_mcp_tools() at import time
  • tui_gateway/server.py creates one persistent slash worker per TUI session
  • tui_gateway/slash_worker.py creates one HermesCLI per TUI session
  • both tui_gateway.entry and tui_gateway.slash_worker therefore appear able to trigger MCP discovery / stdio server startup

Live process mapping showed 8 active hermes mcp serve children at one point:

  • 3 under python3 -m tui_gateway.slash_worker
  • 3 under python3 -m tui_gateway.entry
  • 2 under direct hermes / hermes --resume sessions

Representative mapping from ps:

PID=3643062 PPID=3643047  child of python3 -m tui_gateway.slash_worker --session-key ...
PID=3643966 PPID=3643953  child of python3 -m tui_gateway.slash_worker --session-key ...
PID=3656716 PPID=3656702  child of python3 -m tui_gateway.slash_worker --session-key ...
PID=3681559 PPID=3642998  child of python3 -m tui_gateway.entry
PID=3681598 PPID=3643872  child of python3 -m tui_gateway.entry
PID=3681628 PPID=3656657  child of python3 -m tui_gateway.entry
PID=3674837 PPID=3674807  child of direct /usr/bin/python3 /home/openclaw/.local/bin/hermes
PID=3677397 PPID=3677390  child of direct /usr/bin/python3 /home/openclaw/.local/bin/hermes --resume ...

The important part is not the absolute count, but the topology: each inspected TUI session effectively had two Hermes MCP children, one from entry and one from slash worker.
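The topology check above can be repeated on another host with a small script. The following is a minimal sketch, not part of Hermes: the helper names (`classify_parent`, `group_mcp_children`, `snapshot`) are made up for illustration, and the matching is plain substring matching on the command line, so it assumes the process names look like the ones in the mapping above.

```python
import subprocess
from collections import Counter


def classify_parent(args: str) -> str:
    """Bucket a parent command line by the startup path it represents."""
    if "tui_gateway.slash_worker" in args:
        return "slash_worker"
    if "tui_gateway.entry" in args:
        return "entry"
    return "other"


def group_mcp_children(ps_text: str) -> Counter:
    """Count `hermes mcp serve` processes per parent bucket.

    Expects one process per line: PID, PPID, then the full command line.
    """
    procs = {}
    for line in ps_text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        procs[int(parts[0])] = (int(parts[1]), parts[2])

    counts = Counter()
    for _pid, (ppid, args) in procs.items():
        if "hermes" not in args or "mcp serve" not in args:
            continue
        # Unknown parents (already exited) fall into the "other" bucket.
        parent_args = procs.get(ppid, (0, ""))[1]
        counts[classify_parent(parent_args)] += 1
    return counts


def snapshot() -> Counter:
    """Run ps once and group the live MCP children by parent."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    return group_mcp_children(out)
```

Calling `snapshot()` while TUI sessions are open should show roughly equal `entry` and `slash_worker` counts if the duplication described here is present.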

Existing mitigation is insufficient

There is already a restart-scoped cleanup mitigation:

ExecStartPre=/bin/bash -c 'pkill -f "hermes.*mcp" || true'

That helps clean up before a gateway restart, but it does not prevent normal runtime duplication during active sessions.

Expected behavior

For a normal TUI session, Hermes should either:

  1. create one shared MCP stdio child for the session, or
  2. explicitly avoid MCP discovery in one of the two startup paths unless needed

A single logical session should not eagerly double-spawn Hermes MCP subprocesses by default.

Actual behavior

Both tui_gateway.entry and tui_gateway.slash_worker appear to reach MCP bootstrap, causing duplicate hermes mcp serve children during ordinary session startup.

Impact

  • unnecessary subprocess duplication
  • extra MCP sessions and pipe handles
  • avoidable memory / process growth over time
  • likely contributor to transient lock/contention symptoms in other subsystems
  • operational confusion, because restart cleanup can hide the symptom without fixing the source

Suspected root cause

The likely root cause is the combination of:

  • Hermes being configured as an MCP server in config.yaml
  • eager discover_mcp_tools() in model_tools.py
  • slash_worker creating its own HermesCLI
  • both the main TUI path and slash-worker path performing tool bootstrap independently

Proposed direction

Near-term safe fix:

  • prevent slash_worker from eagerly triggering MCP discovery unless it actually needs MCP-backed tools
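One way the near-term fix could look is to move discovery behind a lazy, opt-out accessor instead of running it at import time. This is only a sketch under assumptions: `LazyMcpTools` and the `HERMES_SKIP_MCP_DISCOVERY` variable are hypothetical names, and the `discover` callable stands in for whatever `discover_mcp_tools()` actually does; none of this is taken from the Hermes codebase.

```python
import os
import threading
from typing import Callable, List, Mapping, Optional

# Hypothetical opt-out switch; the slash worker's launcher would set it to "1".
SKIP_ENV = "HERMES_SKIP_MCP_DISCOVERY"


class LazyMcpTools:
    """Defer MCP discovery until first use, and allow a process to opt out."""

    def __init__(
        self,
        discover: Callable[[], List[str]],
        environ: Mapping[str, str] = os.environ,
    ) -> None:
        self._discover = discover
        self._environ = environ
        self._lock = threading.Lock()
        self._tools: Optional[List[str]] = None  # None: discovery not run yet

    def get(self) -> List[str]:
        # A process that opted out never spawns an MCP child.
        if self._environ.get(SKIP_ENV) == "1":
            return []
        with self._lock:
            if self._tools is None:
                # First real use: run discovery exactly once and cache it.
                self._tools = list(self._discover())
            return self._tools
```

With this shape, importing the module has no side effects, and a slash worker started with the opt-out variable set would never start its own `hermes mcp serve` child.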

More durable architectural fix:

  • make MCP server lifecycle shared / singleton per relevant scope, instead of per bootstrap path
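The shared-lifecycle idea can be sketched as a reference-counted registry keyed by scope (for example, a session key). Everything here is illustrative: `SharedChildRegistry` and its `start` / `stop` callables are hypothetical, and the registry only deduplicates within one process. Since the process mapping shows entry and the slash worker as separate Python processes, a real fix would presumably need one of them to own the child and the other to reach MCP tools through that owner.

```python
import threading
from typing import Callable, Dict, Generic, Hashable, List, TypeVar

T = TypeVar("T")


class SharedChildRegistry(Generic[T]):
    """Start at most one child per scope key; stop it when the last user leaves."""

    def __init__(self, start: Callable[[Hashable], T], stop: Callable[[T], None]) -> None:
        self._start = start
        self._stop = stop
        self._lock = threading.Lock()
        self._entries: Dict[Hashable, List] = {}  # key -> [child, refcount]

    def acquire(self, key: Hashable) -> T:
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                # First user for this scope: start the child once.
                entry = [self._start(key), 0]
                self._entries[key] = entry
            entry[1] += 1
            return entry[0]

    def release(self, key: Hashable) -> None:
        with self._lock:
            entry = self._entries[key]
            entry[1] -= 1
            if entry[1] == 0:
                # Last user gone: tear the child down and forget the scope.
                del self._entries[key]
                self._stop(entry[0])
```

Two bootstrap paths that call `acquire()` with the same session key would then receive the same child handle instead of each spawning their own.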

Related issues

This seems related to, but distinct from:

This issue is specifically about duplicate creation during normal TUI session startup, not just orphan reaping or performance tuning.

Metadata

Labels

  • P2: Medium (degraded but workaround exists)
  • comp/tui: Terminal UI (ui-tui/ + tui_gateway/)
  • tool/mcp: MCP client and OAuth
  • type/bug: Something isn't working
