
Fix/mcp discovery non gateway paths #38620

Open
buptwz wants to merge 3 commits into NousResearch:main from buptwz:fix/mcp-discovery-non-gateway-paths

buptwz commented Jun 4, 2026

Summary

Fixes #38448 — configured MCP servers pass hermes mcp test but their tools
are invisible to the agent in hermes -z, batch_runner, sub-agents spawned
by delegate_tool, background_review, and curator.


Problem

AIAgent resolves its available tool list once at construction time inside
init_agent(). If MCP servers haven't connected yet at that moment, their tools
are simply absent — permanently, for that agent instance.
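The construction-time snapshot can be illustrated with a toy model. This is only a sketch: `TOOL_REGISTRY` and `ToyAgent` are hypothetical stand-ins, not the Hermes classes, and the only assumption carried over from the description above is that the tool list is read once when the agent is built.

```python
# Hypothetical stand-ins; none of these names are the real Hermes API.
TOOL_REGISTRY: dict[str, str] = {"read_file": "builtin"}


class ToyAgent:
    def __init__(self) -> None:
        # The tool list is copied once, here, and never refreshed.
        self.tools = sorted(TOOL_REGISTRY)


early = ToyAgent()  # built before MCP discovery has registered anything
TOOL_REGISTRY["mcp_camoufox_camoufox_status"] = "mcp"  # discovery lands late
late = ToyAgent()   # built after discovery

print(early.tools)  # ['read_file']
print(late.tools)   # ['mcp_camoufox_camoufox_status', 'read_file']
```

The `early` instance never sees the MCP tool, even though the registry is updated a moment later, which matches the "permanently absent" behaviour described above.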

This bug was introduced by two PRs in combination:

PR #16899 fixed #16856 (MCP's blocking 120 s connect wait freezing the
gateway event loop) by removing discover_mcp_tools() from model_tools.py's
module-level scope and requiring each entry point to call it explicitly at
startup. The PR covered gateway, ACP, TUI, and interactive CLI — but missed
hermes -z, batch_runner, delegate_tool, background_review, and
curator.

PR #35397 made CLI MCP startup non-blocking by running
discover_mcp_tools() in a background thread
(start_background_mcp_discovery) and adding a bounded
wait_for_mcp_discovery(timeout=0.75) inside cli.py's interactive
_init_agent(). However the hermes -z code path in oneshot.py bypasses
cli.py's _init_agent() entirely, so the wait is never called:

hermes -z flow

main.py: _prepare_agent_startup()
  └─ start_background_mcp_discovery()        # thread T starts
run_oneshot()
  └─ _run_agent()
       └─ AIAgent() → init_agent()
            └─ get_tool_definitions()        ← T may not be done yet
                                               MCP tools missing ❌

For batch_runner, delegate_tool, background_review, and curator,
_prepare_agent_startup() is never called at all, so no discovery thread is
even started.
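The oneshot race can be reproduced deterministically in miniature. The sketch below assumes nothing about the real code beyond "a background thread fills a registry and the agent snapshots it"; `slow_discovery`, `build_agent_tools`, and `registry` are illustrative names, and a `threading.Event` stands in for a slow MCP connect so the ordering is controlled rather than timing-dependent.

```python
import threading

registry: set[str] = set()
release = threading.Event()  # controls when the fake discovery finishes


def slow_discovery() -> None:
    release.wait()                  # stands in for a slow MCP connect
    registry.add("mcp_demo_tool")


def build_agent_tools() -> list[str]:
    return sorted(registry)         # snapshot taken at "construction" time


worker = threading.Thread(target=slow_discovery, daemon=True)
worker.start()

unawaited = build_agent_tools()     # oneshot-style: built without waiting
release.set()
worker.join()                       # what joining before construction buys
awaited = build_agent_tools()

print(unawaited)  # []
print(awaited)    # ['mcp_demo_tool']
```

Starting the thread is not enough; something on the construction path has to wait for it, which is the gap in the `hermes -z` flow.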

| Entry point | Status |
|---|---|
| gateway/run.py | ✅ run_in_executor at startup |
| acp_adapter/entry.py | ✅ synchronous at startup |
| cron/scheduler.py | ✅ synchronous at startup |
| tui_gateway/server.py | ✅ synchronous at startup |
| hermes chat (interactive) | ✅ wait_for_mcp_discovery() in cli._init_agent() |
| hermes -z (oneshot) | ❌ thread started but never awaited |
| batch_runner.py | ❌ discovery never started |
| agent/background_review.py | ❌ discovery never started |
| agent/curator.py | ❌ discovery never started |
| tools/delegate_tool.py sub-agents | ❌ discovery never started |

Fix

tools/mcp_tool.py — add ensure_mcp_discovered()

A single function that handles all three contexts correctly:

| Context | Behaviour |
|---|---|
| Async event loop running (gateway/ACP) | No-op. Those paths use run_in_executor at startup; calling discover_mcp_tools() here would re-introduce the #16856 freeze. |
| Background thread in flight (hermes -z / hermes chat via PR #35397) | join() the thread with no timeout and wait for full discovery; this avoids two threads calling discover_mcp_tools() concurrently against the same servers. |
| No background thread (batch_runner, delegate_tool, etc.) | Call discover_mcp_tools() directly. It is idempotent, so repeated calls across agents in the same process are cheap. |
agent/agent_init.py — call ensure_mcp_discovered() at the top of init_agent()

init_agent() is the single convergence point for every AIAgent
construction path. One call here fixes all affected paths — present and future
— without each entry point having to handle it separately.
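Why a single call site is enough can be shown with a small model. Everything here is illustrative: `discover_tools`, `init_agent`, `oneshot`, and `batch_runner` are toy functions that mirror the names in this description, and the idempotency guard is an assumption about how a cheap repeat call might look, not the real implementation.

```python
_discovered = False
discovery_runs = 0
registry = {"read_file"}


def discover_tools() -> None:
    """Idempotent stand-in: only the first call does any work."""
    global _discovered, discovery_runs
    if _discovered:
        return
    discovery_runs += 1
    registry.add("mcp_demo_tool")
    _discovered = True


def init_agent() -> list[str]:
    discover_tools()             # guard at the top of the shared init path
    return sorted(registry)


def oneshot() -> list[str]:
    return init_agent()          # entry point with no discovery code of its own


def batch_runner() -> list[list[str]]:
    return [init_agent() for _ in range(3)]


single = oneshot()
batch = batch_runner()
print(single)          # ['mcp_demo_tool', 'read_file']
print(discovery_runs)  # 1
```

Neither toy entry point mentions discovery, yet every agent they build sees the MCP tool, and the expensive work runs once per process.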


Testing

Reproduce the bug (requires a configured MCP server)

# Confirm the server works in isolation                                                                                               
hermes mcp test camoufox        # should show 17 tools
                                                                                                                                      
# Before this fix: tool not available                                                                                              
hermes -z "Call mcp_camoufox_camoufox_status"                                                                                         
                                                                                                                                      
# After this fix: tool available                                                                                                      
hermes -z "Call mcp_camoufox_camoufox_status"                                                                                         
                                                                                                                                      
Automated tests
                                                                                                                                      
Five new unit tests were added to tests/hermes_cli/test_mcp_startup.py; all
pass with no external dependencies:
                                            
test_ensure_mcp_discovered_noop_inside_async_event_loop                                                                               
  └─ asyncio.run() wraps the call; discover_mcp_tools must NOT be invoked
                                                                                                                                      
test_ensure_mcp_discovered_joins_background_thread_without_direct_call                                                             
  └─ live thread sleeps 50 ms; ensure_mcp_discovered must block until done                                                            
     and must NOT call discover_mcp_tools directly (no concurrent double-call)                                                        
                                            
test_ensure_mcp_discovered_calls_discover_when_no_thread                                                                              
  └─ no thread set; discover_mcp_tools must be called exactly once                                                                    
                                                                                                                                      
test_init_agent_calls_ensure_mcp_discovered                                                                                           
  └─ patches ensure_mcp_discovered; asserts init_agent invokes it exactly once                                                        
                                                                                                                                      
test_ensure_mcp_discovered_oneshot_path_joins_thread_not_double_calls                                                              
  └─ full hermes -z scenario: thread in flight, ensure_mcp_discovered joins it,                                                       
     discover_mcp_tools is never called directly                                                                                      
                                                                                                                                      
Run:                                                                                                                                  
python -m pytest tests/hermes_cli/test_mcp_startup.py -v                                                                              
# 9 passed in ~1.3 s                                                                                                               
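For readers who want to see the shape of such a test without opening the file, here is a minimal sketch of the event-loop no-op check and its sync counterpart. It does not reproduce the tests in test_mcp_startup.py: `ensure_discovered` is a simplified, hypothetical function under test, and the discovery function is injected as a `Mock` rather than patched on a real module.

```python
import asyncio
from unittest import mock


def ensure_discovered(discover) -> None:
    # Hypothetical function under test: skip discovery inside a running loop.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        discover()               # no loop running: safe to discover inline


def test_noop_inside_async_event_loop() -> None:
    discover = mock.Mock()

    async def call_from_loop() -> None:
        ensure_discovered(discover)

    asyncio.run(call_from_loop())
    discover.assert_not_called()


def test_calls_discover_when_no_loop() -> None:
    discover = mock.Mock()
    ensure_discovered(discover)
    discover.assert_called_once_with()


# Run directly here; under pytest the two functions are collected by name.
test_noop_inside_async_event_loop()
test_calls_discover_when_no_loop()
```

Wrapping the call in `asyncio.run()` is what puts a running loop on the current thread, so the guard is exercised without starting a gateway.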
                                                                                                                                      
Regression check — existing tests                                                                                                  
                                                                                                                                      
python -m pytest tests/hermes_cli/ tests/tools/test_mcp_tool.py -v                                                                 
                                                                                                                                      
What reviewers should verify                
                                                                                                                                      
- Gateway path: asyncio.get_running_loop() guard is hit → ensure_mcp_discovered returns immediately → no behaviour change for         
gateway/ACP                                                                                                                           
- Interactive CLI: background thread is joined (not double-run) → same tools as before                                                
- No MCP config: _load_mcp_config() returns {} → discover_mcp_tools exits in <1 ms → no startup cost for users without MCP            
                                                                                                                                   
Fixes #38448                                                                                                                          
Related: #16856, #16899, #35397                                                                                                       

Tested on: Linux (Ubuntu 24.04, Python 3.13)

buptwz and others added 3 commits June 4, 2026 09:27
Before this change, `discover_mcp_tools()` was only called at startup by
gateway/ACP entry points (gateway/run.py, acp_adapter/entry.py,
cron/scheduler.py, tui_gateway/server.py).  Every other path that
constructs an AIAgent directly — oneshot (`hermes -z`), batch_runner,
delegate_tool sub-agents, background_review, curator — silently skipped
MCP discovery, so configured MCP servers were never connected and their
tools were absent from the agent's tool list even though
`hermes mcp test <server>` succeeded.

Fix: add `ensure_mcp_discovered()` to tools/mcp_tool.py and call it at
the very start of init_agent() (the single convergence point for all
AIAgent construction).

`ensure_mcp_discovered()` is context-aware:
- Sync context (no running event loop): calls discover_mcp_tools(), which
  is idempotent — already-connected servers are skipped cheaply.
- Async context (gateway/ACP event loop running): returns immediately, so
  the blocking 120 s connect wait that caused NousResearch#16856 cannot recur.

Fixes NousResearch#38448

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…vered

PR NousResearch#35397 introduced start_background_mcp_discovery() for hermes -z and
hermes chat paths so that slow MCP servers don't block the CLI prompt.
However it relied on a 0.75 s bounded wait_for_mcp_discovery() call in
cli.py's _init_agent(), which is only reached by the interactive chat
path — oneshot bypassed it entirely.

The previous ensure_mcp_discovered() would call discover_mcp_tools()
directly even when a background thread was already running it, risking
a concurrent double-connection attempt against the same servers.

Fix: detect the background thread via hermes_cli.mcp_startup and join()
it without a timeout (appropriate for non-interactive paths where there
is no prompt to display quickly).  Fall through to a direct synchronous
call only when no background thread exists (batch_runner, delegate_tool,
background_review, curator, etc.).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Five new tests in test_mcp_startup.py covering the three cases of
ensure_mcp_discovered() and its integration with init_agent():

- noop inside async event loop (gateway/ACP safety)
- joins background thread without calling discover_mcp_tools directly
  (avoids concurrent double-call race for hermes -z path)
- calls discover_mcp_tools synchronously when no thread exists
  (batch_runner, delegate_tool, background_review, curator paths)
- init_agent() calls ensure_mcp_discovered() exactly once
- regression: hermes -z path joins thread rather than double-calling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
alt-glitch added the type/bug, comp/agent, tool/mcp, and P2 labels Jun 4, 2026
