Skip to content

fix(tui): prevent slash_worker from spawning duplicate MCP servers#15796

Closed
thapecroth wants to merge 3 commits into
NousResearch:mainfrom
thapecroth:fix/15275-mcp-env-var-guard
Closed

fix(tui): prevent slash_worker from spawning duplicate MCP servers#15796
thapecroth wants to merge 3 commits into
NousResearch:mainfrom
thapecroth:fix/15275-mcp-env-var-guard

Conversation

@thapecroth

Copy link
Copy Markdown

Summary

Fixes #15275

Problem: A single TUI session spawns two sets of hermes mcp serve child processes — one from tui_gateway.entry and one from tui_gateway.slash_worker. Both paths independently import model_tools.py, which eagerly calls discover_mcp_tools() at module level, spawning a full set of MCP server subprocesses each time.

On resource-constrained systems (e.g. Raspberry Pi), this doubles the subprocess count per session and can cause resource exhaustion or cron job failures.

Root cause: Eager discover_mcp_tools() at import time + slash_worker creating its own HermesCLI instance = both paths doing independent tool bootstrap.

Approach: Env var guard (tactical fix)

Gate MCP discovery behind HERMES_SKIP_MCP_DISCOVERY so lightweight consumers that never need MCP-backed tools can opt out:

  • model_tools.py: Check env var before calling discover_mcp_tools()
  • slash_worker.py: Set env var to "1" before importing cli (which transitively imports model_tools)
  • Tests: 5 dedicated tests in test_slash_worker_mcp.py verify guard logic, source ordering, and behavioral skip

Changes

File Change
model_tools.py Wrap discover_mcp_tools() in env var check
tui_gateway/slash_worker.py Set HERMES_SKIP_MCP_DISCOVERY=1 before cli import
tests/tui_gateway/test_slash_worker_mcp.py 5 new tests (guard in source, ordering, behavioral skip)

Alternative: Shared connection pool

See #XXXX for an architectural alternative that replaces eager import-time discovery with a lazy singleton MCPConnectionPool — inspired by Claude Code's SocketPool pattern. That approach prevents duplicates by design (connection reuse) rather than by opt-out, and is more robust against future consumers forgetting to set the env var.

Both approaches fix the same bug; the env var guard is the minimal tactical fix, while the connection pool is the strategic long-term solution.

Test results

98 passed in 22.91s (including 5 new tests in test_slash_worker_mcp.py)

…ousResearch#15275)

Slash worker imports model_tools (via cli), which eagerly calls
discover_mcp_tools() at module level. This spawned a second set of
'hermes mcp serve' children per TUI session — doubling subprocess
count and wasting resources.

Fix: gate MCP discovery behind HERMES_SKIP_MCP_DISCOVERY env var.
  - model_tools.py: check env var before discover_mcp_tools()
  - slash_worker.py: set env var before importing cli
  - tests: 5 dedicated tests verify guard logic and ordering

Tactical fix. See PR #XXXX for the architectural (connection pool)
alternative that prevents duplicates by design.
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/tui Terminal UI (ui-tui/ + tui_gateway/) tool/mcp MCP client and OAuth labels Apr 25, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Related: #15797 (lazy connection pool — architectural alternative to this env var guard) and #15440 (earlier fix attempt). All three target #15275.

The HERMES_SKIP_MCP_DISCOVERY env var check at module level used
os.environ but model_tools.py did not import os, causing NameError
in any test that imported model_tools.
- Replace no-op test_discover_mcp_tools_skipped_when_env_set with actual
  assertion that the env var is set
- Fix test_env_var_truthy_values to actually set the env var before checking
- Add test_slash_worker_env_var_value to verify the assigned value is truthy
- Replace misleading integration test with real subprocess-based tests that
  verify discover_mcp_tools is NOT called when env var is set, and IS called
  when env var is not set
- Add test_env_var_cleared_after_test to prevent env leakage
@teknium1

teknium1 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Closing — the bug this fixes no longer exists on main.

The eager module-level discover_mcp_tools() in model_tools.py that this PR gates behind HERMES_SKIP_MCP_DISCOVERY was removed (see the comment now at model_tools.py:182, fixed under #16856). MCP discovery is now explicit per entry point: tui_gateway.entry runs it once in a backgrounded thread, and the slash_worker never triggers it at all.

Verified empirically: constructing the slash_worker's HermesCLI(compact=True) on current main calls discover_mcp_tools() zero times and spawns zero MCP children — so there's no duplicate path left to guard.

Your analysis of the original duplicate-spawn mechanism was correct against the code at the time. Thanks for the careful root-cause work and the test hardening. No salvage needed since the guarded call site is gone.

@teknium1 teknium1 closed this Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/tui Terminal UI (ui-tui/ + tui_gateway/) P2 Medium — degraded but workaround exists tool/mcp MCP client and OAuth type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: TUI session eagerly spawns duplicate 'hermes mcp serve' children from both tui_gateway.entry and slash_worker

3 participants