
fix(gateway): defer MCP discovery to executor when imported inside event loop (#16856)#16877

Closed
Bartok9 wants to merge 1 commit into NousResearch:main from Bartok9:fix/16856-mcp-lazy-import-blocks-event-loop

Bartok9 commented Apr 28, 2026

Summary

Fixes #16856 — Lazy import of model_tools blocks the asyncio event loop on the first gateway message when an MCP server is slow/unreachable.

Root Cause

model_tools.py calls discover_mcp_tools() as a module-level side effect (line 143). The gateway lazy-imports run_agent (which transitively imports model_tools) inside the asyncio event loop thread. The call chain discover_mcp_tools() → _run_on_mcp_loop() → future.result(timeout=120) blocks the loop for up to 120s when an MCP server is unreachable, killing Discord shard heartbeats and Telegram polling.
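The freeze can be reproduced outside the gateway. The sketch below uses hypothetical stand-ins, not the project's code: `blocking_discovery()` plays the role of `discover_mcp_tools()` by waiting synchronously on a `concurrent.futures` future, and `heartbeat()` stands in for a Discord shard heartbeat. Calling the blocking function directly inside a coroutine stalls every other task on the loop for the whole wait.

```python
import asyncio
import concurrent.futures
import time


def blocking_discovery(delay: float) -> None:
    """Hypothetical stand-in for discover_mcp_tools(): blocks the calling
    thread until a worker-thread future resolves (after `delay` seconds)."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(time.sleep, delay)
        future.result(timeout=5)  # synchronous wait, like future.result(timeout=120)


async def heartbeat(ticks: list, interval: float, stop: asyncio.Event) -> None:
    """Stand-in for a shard heartbeat: records a timestamp every `interval` s."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def largest_heartbeat_gap(delay: float) -> float:
    ticks: list = []
    stop = asyncio.Event()
    task = asyncio.create_task(heartbeat(ticks, 0.05, stop))
    await asyncio.sleep(0.2)
    # Runs on the event loop thread, just as a lazy import inside a
    # message handler would run the module-level discovery call.
    blocking_discovery(delay)
    await asyncio.sleep(0.2)
    stop.set()
    await task
    return max(b - a for a, b in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    gap = asyncio.run(largest_heartbeat_gap(0.5))
    print(f"largest heartbeat gap: {gap:.2f}s")  # at least the 0.5s block
```

With a real unreachable server the same gap would stretch toward the 120s timeout.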

Fix

Detect whether an asyncio event loop is running at import time:

  • Running loop (gateway): Schedule discovery via loop.run_in_executor() so the event loop stays responsive
  • No running loop (CLI/TUI startup): Run discovery inline as before
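A minimal sketch of the detection described above, assuming discovery is a plain synchronous callable. `maybe_defer_discovery()`, `slow_discovery()`, and `gateway_handler()` are hypothetical names for illustration; the PR's actual change to model_tools.py may be structured differently. `asyncio.get_running_loop()` raises `RuntimeError` when no loop is running in the current thread, which is what separates the gateway path from the CLI/TUI path here.

```python
import asyncio
import time
from typing import Callable, Optional


def slow_discovery() -> None:
    """Hypothetical stand-in for discover_mcp_tools() hitting a slow MCP server."""
    time.sleep(0.5)


def maybe_defer_discovery(discover: Callable[[], None]) -> Optional[asyncio.Future]:
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No running loop (CLI/TUI startup): run discovery inline, as before.
        discover()
        return None
    # Running loop (gateway): hand the blocking call to the default
    # thread-pool executor so the loop keeps servicing other tasks.
    return loop.run_in_executor(None, discover)


async def gateway_handler() -> float:
    """Simulates the lazy import happening inside a message handler and
    returns how long the handler itself was held up."""
    start = time.monotonic()
    pending = maybe_defer_discovery(slow_discovery)
    elapsed = time.monotonic() - start  # near zero: the loop was not blocked
    await pending  # demo only; an importer would not wait on this
    return elapsed


if __name__ == "__main__":
    print(maybe_defer_discovery(slow_discovery))  # None: ran inline (~0.5s)
    print(f"handler blocked for {asyncio.run(gateway_handler()):.3f}s")
```

One caveat of this shape: nobody awaits the returned future, so an exception raised during discovery typically surfaces only as asyncio's "Future exception was never retrieved" log message unless a done-callback is attached.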

discover_mcp_tools() is already idempotent, so deferred execution is safe. The trade-off is that MCP tools may not be registered yet when the very first message in a gateway session is handled; they become available once the background discovery finishes. That is far better than freezing the entire platform for 120s.

Changes

  • model_tools.py: Wrap module-level discover_mcp_tools() with event loop detection
  • tests/test_model_tools_async_bridge.py: Added TestMcpDiscoveryDeferral with 2 test cases

Testing

python3 -m pytest tests/test_model_tools_async_bridge.py -x -q -o "addopts="
# 13 passed (11 existing + 2 new)

Test coverage:

| Test | Verifies |
|---|---|
| test_mcp_discovery_offloaded_when_loop_running | Discovery scheduled via executor, not inline |
| test_mcp_discovery_runs_inline_without_loop | CLI/TUI path unchanged: discovery runs synchronously |

How to verify manually

  1. Configure an unreachable MCP server in config.yaml
  2. Start the gateway
  3. Send a message via Discord/Telegram
  4. Before: First message hangs for ~120s, heartbeat warnings flood logs
  5. After: First message responds promptly, MCP tools become available shortly after

…ent loop (NousResearch#16856)

When the gateway lazy-imports run_agent (which transitively imports
model_tools), the module-level discover_mcp_tools() call runs
_run_on_mcp_loop() with a blocking future.result(timeout=120). If any
configured MCP server is unreachable, this blocks the asyncio event loop
for up to 120s, killing Discord shard heartbeats and Telegram polling.

Root cause: model_tools.py line 143 calls discover_mcp_tools() as a
module-level side effect. This is safe from synchronous contexts (CLI,
TUI startup) but freezes the event loop when triggered from an async
gateway message handler.

Fix: detect whether an event loop is running at import time. If so,
schedule discovery via loop.run_in_executor() so the event loop stays
responsive. In synchronous contexts (no running loop), discovery runs
inline as before. discover_mcp_tools() is already idempotent, so the
deferred execution is safe.

Tests: added TestMcpDiscoveryDeferral with two cases:
- Discovery offloaded to executor when loop is running
- Discovery runs inline when no event loop exists

Fixes NousResearch#16856
alt-glitch added the labels type/bug (Something isn't working), P1 (High: major feature broken, no workaround), comp/gateway (Gateway runner, session dispatch, delivery), comp/tools (Tool registry, model_tools, toolsets), and tool/mcp (MCP client and OAuth) on Apr 28, 2026
teknium1 commented:

Thanks @Bartok9 — your diagnosis was spot on and we went with suggestion #1 from the original issue (remove the module-level side effect entirely) rather than the import-time event-loop detection. The root fix is in #16899, merged as dd789a4. That variant also eliminates the first-message race where MCP tool schemas might not be registered yet when a user's first message triggers the lazy import. Credit to you in the PR body for catching the bug.
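The comment describes the merged approach only as removing the module-level side effect entirely. The sketch below shows one hypothetical shape for that pattern, not the code merged in #16899: importing the module registers nothing, and the gateway runs discovery explicitly, off the loop via `asyncio.to_thread` (Python 3.9+), during startup and before it handles messages, so the first message cannot race with schema registration. `tool_registry`, `discover_tools()`, `handle_message()`, and `start_gateway()` are illustrative names.

```python
import asyncio
import time

tool_registry: dict = {}  # hypothetical registry populated by discovery


def discover_tools() -> None:
    """Hypothetical discovery: slow and synchronous, but called explicitly
    by the application instead of as an import-time side effect."""
    time.sleep(0.1)  # stands in for connecting to MCP servers
    tool_registry["example_mcp_tool"] = {"description": "demo schema"}


async def handle_message(text: str) -> list:
    # By the time any message is handled, discovery has already finished.
    return sorted(tool_registry)


async def start_gateway() -> list:
    # Run the blocking discovery in a worker thread; while it runs, the
    # loop remains free to service any other tasks that are scheduled.
    await asyncio.to_thread(discover_tools)
    return await handle_message("first message")


if __name__ == "__main__":
    print(asyncio.run(start_gateway()))  # ['example_mcp_tool']
```

Compared with import-time detection, the explicit call makes the ordering visible at the call site: discovery completes before the first handler runs, at the cost of a slightly longer startup.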

teknium1 closed this Apr 28, 2026


Linked issue (#16856): Lazy import of model_tools blocks asyncio event loop on first gateway message when an MCP server is slow/unreachable