fix(mcp): move discovery out of model_tools import side effect (#16856) #16899

Merged
teknium1 merged 1 commit into main from hermes/hermes-9cd7d8df on Apr 28, 2026

@teknium1

Contributor

Summary

Removes the module-level discover_mcp_tools() call in model_tools.py so lazy-importing this module from inside an asyncio event loop no longer blocks the loop for up to 120s when an MCP server is slow or unreachable. Closes #16856. Supersedes #16877 (same intent, cleaner fix per suggestion #1 in the issue — credit @Bartok9).

Root cause

model_tools.py ran discover_mcp_tools() as an import-time side effect. The gateway lazy-imports run_agent (→ model_tools) on the first user message, which executes inside the asyncio event loop thread. discover_mcp_tools() uses future.result(timeout=120) internally, freezing Discord shard heartbeats and Telegram polling for up to 120s whenever a configured MCP server was unreachable.
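
The effect described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `slow_discovery` is a hypothetical stand-in for `discover_mcp_tools()` that blocks for 0.3s instead of 120s, and `heartbeat` is a hypothetical stand-in for a shard heartbeat. It measures the largest gap between heartbeats when the blocking call runs on the loop thread versus in the default executor.

```python
import asyncio
import time


def slow_discovery(delay: float = 0.3) -> str:
    """Hypothetical stand-in for discover_mcp_tools(): blocks the calling thread."""
    time.sleep(delay)
    return "done"


async def heartbeat(ticks: list, interval: float = 0.05) -> None:
    """Record a timestamp every `interval` seconds, like a shard heartbeat."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def max_gap(blocking: bool) -> float:
    """Return the largest gap between heartbeats while discovery runs."""
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0.1)  # let a few heartbeats fire first
    if blocking:
        slow_discovery()  # runs on the loop thread, so the heartbeat is starved
    else:
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, slow_discovery)  # loop stays free
    await asyncio.sleep(0.1)
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return max(b - a for a, b in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    print(f"blocking gap: {asyncio.run(max_gap(True)):.2f}s")
    print(f"executor gap: {asyncio.run(max_gap(False)):.2f}s")
```

In the blocking case the gap is at least as long as the blocking call itself, which is the same mechanism that starved heartbeats and polling for up to 120s in the gateway.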

Changes

  • model_tools.py: remove module-level discover_mcp_tools() call (keeps the symbol importable for explicit callers).
  • gateway/run.py: run discovery via loop.run_in_executor(None, discover_mcp_tools) in start_gateway() before runner.start() — loop stays responsive, tools are ready before the first message arrives.
  • hermes_cli/main.py: inline discovery in the agent-command startup path (no event loop running).
  • tui_gateway/entry.py: inline discovery in main() (sync stdin loop).
  • acp_adapter/entry.py: inline discovery before asyncio.run() (sync context).
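
The two startup patterns listed above can be sketched as follows. This is an illustration under assumptions rather than the actual diff: `discover_mcp_tools` is stubbed out here, and `Runner`, `cli_main`, and the `events` list are hypothetical names used only to show the ordering. Only `loop.run_in_executor(None, discover_mcp_tools)` before `runner.start()` is taken from the PR description.

```python
import asyncio
import threading

events: list = []  # records the order of startup steps, for illustration only


def discover_mcp_tools() -> list:
    """Hypothetical stub; the real function connects to configured MCP servers."""
    on_main = threading.current_thread() is threading.main_thread()
    events.append(("discovery", on_main))
    return ["example_tool"]


class Runner:
    """Hypothetical stand-in for the gateway runner."""

    async def start(self) -> None:
        events.append(("runner.start", True))


async def start_gateway(runner: Runner) -> list:
    """Async entry point: offload blocking discovery, then accept traffic."""
    loop = asyncio.get_running_loop()
    tools = await loop.run_in_executor(None, discover_mcp_tools)
    await runner.start()  # platforms start only after discovery has finished
    return tools


def cli_main() -> list:
    """Sync entry point: no event loop is running, so a plain call is fine."""
    return discover_mcp_tools()
```

Awaiting the executor future keeps the loop responsive while still guaranteeing that discovery has finished before `runner.start()` is reached. The sync entry points need no such indirection because nothing else is waiting on the thread.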

Why this instead of PR #16877

#16877 added event-loop detection inside model_tools.py and offloaded discovery to an executor when a loop was running. That worked, but the module-level call made the first gateway message race against discovery (first message could hit the model before MCP tool schemas were registered). Moving the call into each entry point's startup sequence avoids both problems: no import-time side effect, and gateway discovery completes before platforms accept traffic.

Validation

| Check | Before | After |
|---|---|---|
| import model_tools triggers MCP discovery | yes | no (verified via subprocess probe) |
| Gateway first-message delay w/ dead MCP server | up to 120s | ~0s (runs at startup, not first message) |
| tests/test_model_tools_async_bridge.py | 11 passed | 11 passed |
| tests/tools/test_mcp_tool.py | 177 passed | 177 passed |
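
The PR does not show the subprocess probe itself. One way such a probe could look is sketched below, assuming a throwaway module (`fake_model_tools`, hypothetical) in place of the real `model_tools.py` and a printed marker in place of real discovery. A fresh interpreter imports the module, and the probe reports whether the marker appeared on stdout.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical module body: defines the function but never calls it at import.
MODULE_SRC = textwrap.dedent(
    """
    def discover_mcp_tools():
        print("DISCOVERY_RAN")
    """
)


def import_triggers_discovery(module_dir: Path, module_name: str) -> bool:
    """Import the module in a fresh interpreter and look for the marker."""
    code = (
        f"import sys; sys.path.insert(0, {str(module_dir)!r}); "
        f"import {module_name}"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return "DISCOVERY_RAN" in result.stdout


def run_probe() -> bool:
    """Write the throwaway module to a temp dir and probe its import."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "fake_model_tools.py").write_text(MODULE_SRC)
        return import_triggers_discovery(Path(tmp), "fake_model_tools")


if __name__ == "__main__":
    print("import triggers discovery:", run_probe())
```

Running the import in a subprocess matters because a module already cached in `sys.modules` would not re-execute its top-level code, which could hide an import-time side effect.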

How to verify manually

  1. Configure an unreachable MCP server in config.yaml.
  2. Start the gateway. Discovery runs in the executor during startup, and the logs show "(1 failed)" after the retry window.
  3. Send the first Discord/Telegram message.
  4. Before this PR: the first message hangs for ~120s and heartbeat warnings flood the logs.
  5. After this PR: the first message gets a prompt response and no heartbeat warnings appear.

model_tools.py ran discover_mcp_tools() as a module-level side effect.
discover_mcp_tools() uses a blocking 120s wait internally (via
_run_on_mcp_loop -> future.result(timeout=120)).

The gateway lazy-imports run_agent -> model_tools on the first user
message, which happens inside the asyncio event loop thread.  A slow or
unreachable MCP server therefore froze Discord shard heartbeats and
Telegram polling for up to 120s on the first message after gateway
start.

Fix: remove the module-level call.  Every entry point now runs
discovery explicitly at its own startup, using the context-appropriate
blocking/non-blocking pattern:

- gateway/run.py:       loop.run_in_executor(None, discover_mcp_tools)
                        before platforms start accepting traffic
- hermes_cli/main.py:   inline (no event loop at CLI startup)
- tui_gateway/entry.py: inline (sync stdin loop, no event loop)
- acp_adapter/entry.py: inline before asyncio.run()

Closes #16856.
teknium1 merged commit dd789a4 into main on Apr 28, 2026 (11 of 12 checks passed).
teknium1 deleted the hermes/hermes-9cd7d8df branch on April 28, 2026 08:18.
alt-glitch added the following labels on Apr 28, 2026: type/bug (Something isn't working), P1 (High: major feature broken, no workaround), comp/tools (Tool registry, model_tools, toolsets), comp/gateway (Gateway runner, session dispatch, delivery), tool/mcp (MCP client and OAuth).
cluricaun28 referenced this pull request in cluricaun28/Logos Apr 28, 2026
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
dannyJ848 pushed a commit to dannyJ848/hermes-agent that referenced this pull request May 17, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026

Development

Successfully merging this pull request may close these issues.

Lazy import of model_tools blocks asyncio event loop on first gateway message when an MCP server is slow/unreachable

2 participants