fix(tui): lazy MCP connection pool — defer server spawning to first tool use #15797

Closed
thapecroth wants to merge 2 commits into NousResearch:main from thapecroth:fix/15275-mcp-lazy-connection-pool

Conversation

@thapecroth

Summary

Fixes #15275

Problem: A single TUI session spawns two sets of hermes mcp serve child processes — one from tui_gateway.entry and one from tui_gateway.slash_worker. Both paths independently import model_tools.py, which eagerly calls discover_mcp_tools() at module level.

Approach: Lazy connection pool (architectural fix)

Inspired by Claude Code's SocketPool pattern. Instead of eagerly spawning MCP servers at import time, defer connections to first actual tool use:

  • discover_mcp_tools(lazy=True): Registers tool stubs from a local cache (~/.hermes/cache/mcp_tools/) so the model sees MCP tools at session start — but no subprocesses spawn
  • ensure_mcp_initialized(): Thread-safe singleton. On first tool call, connects all configured MCP servers. Subsequent calls are no-ops. Equivalent to Claude Code's ensureConnected() → refreshClients() flow
  • Tool cache: After successful discovery, tool schemas are persisted to disk. Future lazy loads read from cache instead of connecting
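
A minimal sketch of the one-time initialization described in the second bullet, using double-checked locking. This is not the PR's code: `ensure_initialized`, `_connect_all_servers`, and `connect_log` are hypothetical names, and the connect step is a stand-in that only records that it ran.

```python
import threading

_init_lock = threading.Lock()
_initialized = False
connect_log: list[str] = []  # illustration only: one entry per real connect


def _connect_all_servers() -> None:
    """Hypothetical stand-in for spawning and connecting configured MCP servers."""
    connect_log.append("connected")


def ensure_initialized() -> bool:
    """Connect once. Return True only for the call that did the work."""
    global _initialized
    if _initialized:        # fast path: no lock needed once initialized
        return False
    with _init_lock:
        if _initialized:    # re-check: another thread may have won the race
            return False
        _connect_all_servers()
        _initialized = True
        return True
```

Calling `ensure_initialized()` from several threads at once should leave exactly one entry in `connect_log`; every later call takes the lock-free fast path.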

Why this is better than the env var guard

| | Env var guard (#15796) | Lazy connection pool (this PR) |
|---|---|---|
| Prevents duplicates | Consumer must remember to set env var | By design; no opt-in needed |
| New consumer safety | Must add os.environ[...] before import | Just import; it's safe by default |
| Startup cost | Skips MCP entirely for slash_worker | Zero-cost: only registers stubs from cache |
| MCP tools available | No (slash_worker skips entirely) | Yes (stubs available, connect on first use) |
| Complexity | 6 lines | ~200 lines (pool + cache + tests) |

Changes

| File | Change |
|---|---|
| tools/mcp_tool.py | Add _mcp_initialized flag, ensure_mcp_initialized(), _register_mcp_tool_stubs(), _load/save_cached_tool_definitions(), _make_lazy_handler() |
| model_tools.py | Call discover_mcp_tools(lazy=True) instead of discover_mcp_tools() |
| tests/tui_gateway/test_slash_worker_mcp.py | 11 new tests (lazy discovery, ensure idempotency, cache, import safety) |

Flow diagram

import model_tools
  └─ discover_mcp_tools(lazy=True)
       └─ _register_mcp_tool_stubs()
            └─ reads ~/.hermes/cache/mcp_tools/*.json
            └─ registers tool stubs with lazy handlers
            └─ NO subprocesses spawned ✓

model calls mcp__server__tool({"query": "..."})
  └─ _lazy_handler()
       └─ ensure_mcp_initialized()  ← first call only
            └─ register_mcp_servers()
                 └─ spawns MCP subprocesses
                 └─ caches tool schemas for next time
       └─ delegates to real server session
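
The two halves of the flow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in tools/mcp_tool.py: the cache file layout (a JSON list of objects with a `name` key), the `registry` dict, and the names `register_stubs`, `make_lazy_handler`, `ensure_connected`, and `spawn_log` are all hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], Any]
registry: dict[str, Handler] = {}  # hypothetical tool registry: name -> handler
spawn_log: list[str] = []          # illustration only: records "server spawned"


def ensure_connected() -> None:
    """Hypothetical one-time connect. A real version would also need a lock."""
    if not spawn_log:
        spawn_log.append("spawned")


def make_lazy_handler(tool_name: str) -> Handler:
    def handler(args: dict[str, Any]) -> Any:
        ensure_connected()  # does work on the first call only
        # Stand-in for delegating to the real server session.
        return {"tool": tool_name, "args": args}
    return handler


def register_stubs(cache_dir: Path) -> int:
    """Register one stub per cached schema without starting any subprocess."""
    count = 0
    for path in sorted(cache_dir.glob("*.json")):
        for tool in json.loads(path.read_text()):
            registry[tool["name"]] = make_lazy_handler(tool["name"])
            count += 1
    return count
```

In this sketch `register_stubs()` only reads files, so `spawn_log` stays empty until some handler in `registry` is actually invoked.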

Alternative: Env var guard

See #15796 for a simpler tactical fix using HERMES_SKIP_MCP_DISCOVERY. Both approaches fix the same bug. The env var guard is minimal and low-risk; this PR is the strategic long-term solution.
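
For comparison, an env var guard of the kind described might look like the sketch below. Only the variable name HERMES_SKIP_MCP_DISCOVERY comes from this PR; how #15796 interprets its value is not stated here, so treating any non-empty value as "skip" is an assumption, and `discover_tools`, `maybe_discover`, and `discovery_log` are hypothetical names.

```python
import os
from typing import Mapping

discovery_log: list[str] = []  # illustration only: one entry per discovery run


def discover_tools() -> None:
    """Hypothetical stand-in for the eager discovery call."""
    discovery_log.append("discovered")


def maybe_discover(environ: Mapping[str, str] = os.environ) -> bool:
    """Run discovery unless the skip variable is set to a non-empty value."""
    if environ.get("HERMES_SKIP_MCP_DISCOVERY"):  # assumption: any value skips
        return False
    discover_tools()
    return True
```

The guard only helps if every consumer sets the variable before the import that triggers discovery, which is the opt-in burden the comparison table points at.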

Test results

102 passed in 16.45s (including 11 new tests in test_slash_worker_mcp.py)

…ool use (NousResearch#15275)

Architectural fix inspired by Claude Code's SocketPool pattern.
Replaces eager import-time MCP server spawning with lazy initialization:

  - discover_mcp_tools(lazy=True): registers tool stubs from cache
    without connecting to MCP servers
  - ensure_mcp_initialized(): thread-safe singleton that connects all
    configured MCP servers on first actual tool call (no-op thereafter)
  - _register_mcp_tool_stubs(): loads cached tool schemas from
    ~/.hermes/cache/mcp_tools/ so the model sees MCP tools at session
    start without subprocess overhead
  - _save_cached_tool_definitions(): persists tool schemas after
    successful discovery for future lazy loads

Key difference from env var guard (PR NousResearch#15796): this prevents duplicates
by design — no consumer needs to remember to set an env var. Multiple
importers of model_tools in the same process share the same lazy pool.
MCP servers only spawn when a tool is actually invoked.

11 new tests cover: lazy discovery, ensure_mcp_initialized idempotency,
cache persistence, and slash_worker import safety.
@alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/tui (Terminal UI: ui-tui/ + tui_gateway/), and tool/mcp (MCP client and OAuth) labels on Apr 25, 2026
@alt-glitch

Collaborator

Related: #15796 (env var guard — tactical alternative) and #15440 (earlier fix attempt). All three target #15275.

Add real subprocess test that verifies importing model_tools does NOT
call register_mcp_servers at import time (the core NousResearch#15275 fix).
This catches regressions even if the mock-based tests pass.
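
A subprocess-based import check of this kind could be sketched as below. It does not reproduce the PR's test: it writes two throwaway modules (`fake_mcp` and `fake_model_tools`, both hypothetical) into a temporary directory, imports them in a fresh interpreter, and reports whether the import alone called `spawn_servers()`.

```python
import os
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path


def import_triggers_call(module_src: str) -> bool:
    """Return True if importing the given module source calls spawn_servers()."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Fake server layer that records calls instead of spawning anything.
        (root / "fake_mcp.py").write_text(textwrap.dedent("""
            CALLS = []
            def spawn_servers():
                CALLS.append("spawn")
        """))
        (root / "fake_model_tools.py").write_text(textwrap.dedent(module_src))
        probe = "import fake_mcp, fake_model_tools; print(len(fake_mcp.CALLS))"
        env = {**os.environ, "PYTHONPATH": str(root)}
        result = subprocess.run(
            [sys.executable, "-c", probe],
            cwd=root, env=env, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip() != "0"
```

Because the probe runs in a new interpreter, it starts from a clean `sys.modules` and sees real import-time side effects, which is the regression that mock-based tests in the same process can miss.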
@teknium1

teknium1 commented Jun 8, 2026

Contributor

Closing — the duplicate-spawn bug this targets no longer exists on main.

The eager module-level discover_mcp_tools() in model_tools.py was removed (comment now at model_tools.py:182, fixed under #16856). Discovery is explicit per entry point now: tui_gateway.entry runs it once in a backgrounded daemon thread, and the slash_worker never triggers it. Verified empirically — slash_worker HermesCLI(compact=True) construction makes zero discover_mcp_tools() calls and spawns zero MCP children.

Separately: the lazy connect-on-first-tool-use idea here is genuinely interesting, but it's an optimization rather than a bugfix, and it conflicts with the design #16856 deliberately landed — discover in the background at entry, then briefly join (wait_for_mcp_discovery) before the first agent build so fast servers land in the tool snapshot while dead servers aren't waited on. Re-introducing a disk tool cache + ~200 LOC connection pool for a problem that's already solved isn't a trade we want to make right now.
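
The background-then-brief-join idea can be illustrated with a small sketch. It is not the code landed in #16856: the signature of wait_for_mcp_discovery is not given in this thread, so `wait_for_discovery(timeout)` and `start_discovery` are hypothetical, and the sketch uses one daemon thread per server (rather than the single background thread described above) purely to keep the example short.

```python
import threading
import time

_lock = threading.Lock()
_tools: list[str] = []
_threads: list[threading.Thread] = []


def _connect(name: str, delay: float) -> None:
    """Hypothetical stand-in for connecting one server; delay simulates latency."""
    time.sleep(delay)
    with _lock:
        _tools.append(name)


def start_discovery(servers: dict[str, float]) -> None:
    """Kick off discovery without blocking the caller."""
    for name, delay in servers.items():
        t = threading.Thread(target=_connect, args=(name, delay), daemon=True)
        t.start()
        _threads.append(t)


def wait_for_discovery(timeout: float) -> list[str]:
    """Wait at most `timeout` seconds in total, then snapshot what has landed."""
    deadline = time.monotonic() + timeout
    for t in _threads:
        t.join(max(0.0, deadline - time.monotonic()))
    with _lock:
        return sorted(_tools)
```

With a shared deadline, a server that never answers costs at most `timeout` seconds, while servers that connect quickly are already in the snapshot when the first agent is built.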

Thanks for the thorough write-up and the SocketPool-inspired design — appreciated even though we're not taking it.

@teknium1 teknium1 closed this Jun 8, 2026


Development

Successfully merging this pull request may close these issues.

Bug: TUI session eagerly spawns duplicate 'hermes mcp serve' children from both tui_gateway.entry and slash_worker

3 participants