fix(mcp): move discovery out of model_tools import side effect (#16856) by teknium1 · Pull Request #16899 · NousResearch/hermes-agent

teknium1 · 2026-04-28T08:16:51Z

Summary

Removes the module-level discover_mcp_tools() call in model_tools.py so lazy-importing this module from inside an asyncio event loop no longer blocks the loop for up to 120s when an MCP server is slow or unreachable. Closes #16856. Supersedes #16877 (same intent, cleaner fix per suggestion #1 in the issue — credit @Bartok9).

Root cause

model_tools.py ran discover_mcp_tools() as an import-time side effect. The gateway lazy-imports run_agent (→ model_tools) on the first user message, which executes inside the asyncio event loop thread. discover_mcp_tools() uses future.result(timeout=120) internally, freezing Discord shard heartbeats and Telegram polling for up to 120s whenever a configured MCP server was unreachable.
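The stall described above can be reproduced in miniature. The sketch below is not the project's code: `slow_discovery` is a hypothetical stand-in for `discover_mcp_tools()` waiting on an unreachable server, with `time.sleep` in place of the real `future.result(timeout=120)` and a 0.3s delay in place of 120s. A background heartbeat task records how long the loop went without ticking.

```python
import asyncio
import time


def slow_discovery(delay: float = 0.3) -> None:
    # Hypothetical stand-in for discover_mcp_tools() blocking on a dead server.
    time.sleep(delay)


async def heartbeat(gaps: list[float], interval: float = 0.02) -> None:
    # Records the time between consecutive ticks; a healthy loop stays near `interval`.
    last = time.monotonic()
    while True:
        await asyncio.sleep(interval)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def first_message_blocking(delay: float = 0.3) -> float:
    gaps: list[float] = []
    task = asyncio.create_task(heartbeat(gaps))
    await asyncio.sleep(0.1)   # let the heartbeat tick a few times
    slow_discovery(delay)      # blocking call on the loop thread, like the lazy import
    await asyncio.sleep(0.1)   # give the heartbeat a chance to record the stall
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return max(gaps)


worst_gap = asyncio.run(first_message_blocking())
print(f"worst heartbeat gap: {worst_gap:.2f}s")
```

The worst gap is at least as long as the blocking call, which is the same mechanism that starved Discord heartbeats and Telegram polling.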

Changes

model_tools.py: remove module-level discover_mcp_tools() call (keeps the symbol importable for explicit callers).
gateway/run.py: run discovery via loop.run_in_executor(None, discover_mcp_tools) in start_gateway() before runner.start() — loop stays responsive, tools are ready before the first message arrives.
hermes_cli/main.py: inline discovery in the agent-command startup path (no event loop running).
tui_gateway/entry.py: inline discovery in main() (sync stdin loop).
acp_adapter/entry.py: inline discovery before asyncio.run() (sync context).
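A minimal sketch of the two startup patterns listed above, under stated assumptions: `discover_tools_stub`, `start_gateway_sketch`, and `sync_entry_sketch` are hypothetical names for illustration, not the functions in this repository. The async path offloads the blocking call with `loop.run_in_executor(None, ...)`, and the sync path calls it inline before any event loop exists.

```python
import asyncio
import time


def discover_tools_stub(delay: float = 0.3) -> list[str]:
    # Hypothetical stand-in for discover_mcp_tools(): blocks like a slow MCP server.
    time.sleep(delay)
    return ["example_tool"]


async def heartbeat(gaps: list[float], interval: float = 0.02) -> None:
    # Records the time between consecutive ticks of the event loop.
    last = time.monotonic()
    while True:
        await asyncio.sleep(interval)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def start_gateway_sketch() -> tuple[list[str], float]:
    loop = asyncio.get_running_loop()
    gaps: list[float] = []
    hb = asyncio.create_task(heartbeat(gaps))
    # Blocking discovery runs on the default thread pool; the loop keeps ticking.
    tools = await loop.run_in_executor(None, discover_tools_stub)
    # A real gateway would start its platform runner here, with tools registered.
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return tools, max(gaps)


def sync_entry_sketch() -> list[str]:
    # CLI/TUI/ACP-style entry: no loop is running yet, so a plain call is fine.
    tools = discover_tools_stub(delay=0.0)
    return asyncio.run(asyncio.sleep(0, result=tools))


gateway_tools, worst_gap = asyncio.run(start_gateway_sketch())
print(gateway_tools, f"worst heartbeat gap: {worst_gap:.2f}s")
```

In the async path the heartbeat gap stays close to its interval even though discovery takes far longer, because the wait happens on a worker thread rather than on the loop thread.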

Why this instead of PR #16877

#16877 added event-loop detection inside model_tools.py and offloaded discovery to an executor when a loop was running. That worked, but the module-level call made the first gateway message race against discovery (first message could hit the model before MCP tool schemas were registered). Moving the call into each entry point's startup sequence avoids both problems: no import-time side effect, and gateway discovery completes before platforms accept traffic.
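The ordering difference can be illustrated with a toy registry. This is a sketch under assumptions, not a reproduction of #16877: `discover_into` and `handle_first_message` are hypothetical, and the first function simply models discovery that was offloaded but not awaited before the first message is handled.

```python
import asyncio
import time


def discover_into(registry: list[str], delay: float = 0.2) -> None:
    # Hypothetical discovery: registers a tool only after a slow round trip.
    time.sleep(delay)
    registry.append("example_tool")


async def handle_first_message(registry: list[str]) -> list[str]:
    # Snapshot of the tool schemas the model would see for this message.
    return list(registry)


async def offload_without_waiting() -> list[str]:
    registry: list[str] = []
    loop = asyncio.get_running_loop()
    pending = loop.run_in_executor(None, discover_into, registry)  # not awaited yet
    seen = await handle_first_message(registry)  # races against discovery
    await pending  # let the worker finish so the loop shuts down cleanly
    return seen


async def discover_at_startup() -> list[str]:
    registry: list[str] = []
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, discover_into, registry)  # finish before traffic
    return await handle_first_message(registry)


seen_racy = asyncio.run(offload_without_waiting())
seen_ordered = asyncio.run(discover_at_startup())
print("offloaded, not awaited:", seen_racy)
print("awaited at startup:   ", seen_ordered)
```

Both variants keep the loop responsive; only the second guarantees the first message sees the discovered tools.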

Validation

|  | Before | After |
| --- | --- | --- |
| `import model_tools` triggers MCP discovery | yes | no (verified via subprocess probe) |
| Gateway first-message delay w/ dead MCP server | up to 120s | ~0s (runs at startup, not first message) |
| `tests/test_model_tools_async_bridge.py` | 11 passed | 11 passed |
| `tests/tools/test_mcp_tool.py` | 177 passed | 177 passed |
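The PR does not show the subprocess probe itself. One way such a check could look, as a sketch with hypothetical names: write a throwaway module (`demo_tools`, not the real `model_tools`) with and without a module-level call, import it in a fresh interpreter, and inspect whether the call ran.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Throwaway module source; {module_level_call} is either "discover()" or empty.
MODULE_SRC = textwrap.dedent(
    """
    CALLS = []

    def discover():
        CALLS.append("discovered")
    {module_level_call}
    """
)

# Runs in the child interpreter: import the module and report how often discover() ran.
PROBE = (
    "import sys; sys.path.insert(0, sys.argv[1]); "
    "import demo_tools; print(len(demo_tools.CALLS))"
)


def import_triggers_discovery(module_level_call: str) -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        source = MODULE_SRC.format(module_level_call=module_level_call)
        Path(tmp, "demo_tools.py").write_text(source)
        result = subprocess.run(
            [sys.executable, "-c", PROBE, tmp],
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout.strip() != "0"


print("with module-level call:   ", import_triggers_discovery("discover()"))
print("without module-level call:", import_triggers_discovery(""))
```

Running the import in a subprocess keeps the result independent of whatever the parent process has already imported or cached in `sys.modules`.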

How to verify manually

1. Configure an unreachable MCP server in config.yaml.
2. Start the gateway. Discovery happens in the executor during startup, and the logs show "(1 failed)" after the retry window.
3. Send the first Discord/Telegram message.
   - Before this PR: first message hangs for ~120s, heartbeat warnings flood logs.
   - After this PR: first message responds promptly, no heartbeat warnings.

model_tools.py ran discover_mcp_tools() as a module-level side effect. discover_mcp_tools() uses a blocking 120s wait internally (via _run_on_mcp_loop -> future.result(timeout=120)). The gateway lazy-imports run_agent -> model_tools on the first user message, which happens inside the asyncio event loop thread. A slow or unreachable MCP server therefore froze Discord shard heartbeats and Telegram polling for up to 120s on the first message after gateway start.

Fix: remove the module-level call. Every entry point now runs discovery explicitly at its own startup, using the context-appropriate blocking/non-blocking pattern:

- gateway/run.py: loop.run_in_executor(None, discover_mcp_tools) before platforms start accepting traffic
- hermes_cli/main.py: inline (no event loop at CLI startup)
- tui_gateway/entry.py: inline (sync stdin loop, no event loop)
- acp_adapter/entry.py: inline before asyncio.run()

Closes #16856.


teknium1 merged commit dd789a4 into main Apr 28, 2026
11 of 12 checks passed

teknium1 deleted the hermes/hermes-9cd7d8df branch April 28, 2026 08:18

teknium1 mentioned this pull request Apr 28, 2026

fix(gateway): defer MCP discovery to executor when imported inside event loop (#16856) #16877

Closed

alt-glitch added the type/bug, P1, comp/tools, comp/gateway, and tool/mcp labels Apr 28, 2026

This was referenced May 30, 2026

perf(tui): stop slow/dead MCP servers from freezing TUI startup #35273

Merged

fix(gateway): recover unloaded launchd job after update #35288

Open

buptwz mentioned this pull request Jun 4, 2026

Fix/mcp discovery non gateway paths #38620

Open
