Fix/mcp discovery non gateway paths#38620
Open
buptwz wants to merge 3 commits into
Open
Conversation
Before this change, `discover_mcp_tools()` was only called at startup by gateway/ACP entry points (gateway/run.py, acp_adapter/entry.py, cron/scheduler.py, tui_gateway/server.py). Every other path that constructs an AIAgent directly — oneshot (`hermes -z`), batch_runner, delegate_tool sub-agents, background_review, curator — silently skipped MCP discovery, so configured MCP servers were never connected and their tools were absent from the agent's tool list even though `hermes mcp test <server>` succeeded. Fix: add `ensure_mcp_discovered()` to tools/mcp_tool.py and call it at the very start of init_agent() (the single convergence point for all AIAgent construction). `ensure_mcp_discovered()` is context-aware: - Sync context (no running event loop): calls discover_mcp_tools(), which is idempotent — already-connected servers are skipped cheaply. - Async context (gateway/ACP event loop running): returns immediately, so the blocking 120 s connect wait that caused NousResearch#16856 cannot recur. Fixes NousResearch#38448 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…vered PR NousResearch#35397 introduced start_background_mcp_discovery() for hermes -z and hermes chat paths so that slow MCP servers don't block the CLI prompt. However it relied on a 0.75 s bounded wait_for_mcp_discovery() call in cli.py's _init_agent(), which is only reached by the interactive chat path — oneshot bypassed it entirely. The previous ensure_mcp_discovered() would call discover_mcp_tools() directly even when a background thread was already running it, risking a concurrent double-connection attempt against the same servers. Fix: detect the background thread via hermes_cli.mcp_startup and join() it without a timeout (appropriate for non-interactive paths where there is no prompt to display quickly). Fall through to a direct synchronous call only when no background thread exists (batch_runner, delegate_tool, background_review, curator, etc.). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Five new tests in test_mcp_startup.py covering the three cases of ensure_mcp_discovered() and its integration with init_agent(): - noop inside async event loop (gateway/ACP safety) - joins background thread without calling discover_mcp_tools directly (avoids concurrent double-call race for hermes -z path) - calls discover_mcp_tools synchronously when no thread exists (batch_runner, delegate_tool, background_review, curator paths) - init_agent() calls ensure_mcp_discovered() exactly once - regression: hermes -z path joins thread rather than double-calling Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #38448 — configured MCP servers pass
hermes mcp testbut their toolsare invisible to the agent in
hermes -z,batch_runner, sub-agents spawnedby
delegate_tool,background_review, andcurator.Problem
AIAgentresolves its available tool list once at construction time insideinit_agent(). If MCP servers haven't connected yet at that moment, their toolsare simply absent — permanently, for that agent instance.
This bug was introduced by two PRs in combination:
PR #16899 fixed #16856 (MCP's blocking 120 s connect wait freezing the
gateway event loop) by removing
discover_mcp_tools()frommodel_tools.py'smodule-level scope and requiring each entry point to call it explicitly at
startup. The PR covered gateway, ACP, TUI, and interactive CLI — but missed
hermes -z,batch_runner,delegate_tool,background_review, andcurator.PR #35397 made CLI MCP startup non-blocking by running
discover_mcp_tools()in a background thread(
start_background_mcp_discovery) and adding a boundedwait_for_mcp_discovery(timeout=0.75)insidecli.py's interactive_init_agent(). However thehermes -zcode path inoneshot.pybypassescli.py's_init_agent()entirely, so the wait is never called:hermes -z flow
main.py: _prepare_agent_startup()
└─ start_background_mcp_discovery() # thread T starts
run_oneshot()
└─ _run_agent()
└─ AIAgent() → init_agent()
└─ get_tool_definitions() ← T may not be done yet
MCP tools missing ❌
For
batch_runner,delegate_tool,background_review, andcurator,_prepare_agent_startup()is never called at all, so no discovery thread iseven started.
gateway/run.pyrun_in_executorat startupacp_adapter/entry.pycron/scheduler.pytui_gateway/server.pyhermes chat(interactive)wait_for_mcp_discovery()incli._init_agent()hermes -z(oneshot)batch_runner.pyagent/background_review.pyagent/curator.pytools/delegate_tool.pysub-agentsFix
tools/mcp_tool.py— addensure_mcp_discovered()A single function that handles all three contexts correctly:
run_in_executorat startup; callingdiscover_mcp_tools()herehermes -z/hermes chatvia PR #35397)join()the thread with no timeout — wait for fulldiscover_mcp_tools()concurrently against the same serversbatch_runner,delegate_tool, etc.)discover_mcp_tools()directly — it is idempotent, so repeatedagent/agent_init.py— callensure_mcp_discovered()at the top ofinit_agent()init_agent()is the single convergence point for everyAIAgentconstruction path. One call here fixes all affected paths — present and future
— without each entry point having to handle it separately.
Testing
Reproduce the bug (requires a configured MCP server)
Tested on: Linux (Ubuntu 24.04, Python 3.13)