fix(gateway): defer MCP discovery to executor when imported inside event loop (#16856)#16877
Closed
Bartok9 wants to merge 1 commit into
Closed
Conversation
…ent loop (NousResearch#16856) When the gateway lazy-imports run_agent (which transitively imports model_tools), the module-level discover_mcp_tools() call runs _run_on_mcp_loop() with a blocking future.result(timeout=120). If any configured MCP server is unreachable, this blocks the asyncio event loop for up to 120s, killing Discord shard heartbeats and Telegram polling. Root cause: model_tools.py line 143 calls discover_mcp_tools() as a module-level side effect. This is safe from synchronous contexts (CLI, TUI startup) but freezes the event loop when triggered from an async gateway message handler. Fix: detect whether an event loop is running at import time. If so, schedule discovery via loop.run_in_executor() so the event loop stays responsive. In synchronous contexts (no running loop), discovery runs inline as before. discover_mcp_tools() is already idempotent, so the deferred execution is safe. Tests: added TestMcpDiscoveryDeferral with two cases: - Discovery offloaded to executor when loop is running - Discovery runs inline when no event loop exists Fixes NousResearch#16856
Contributor
|
Thanks @Bartok9 — your diagnosis was spot on and we went with suggestion #1 from the original issue (remove the module-level side effect entirely) rather than the import-time event-loop detection. The root fix is in #16899, merged as dd789a4. That variant also eliminates the first-message race where MCP tool schemas might not be registered yet when a user's first message triggers the lazy import. Credit to you in the PR body for catching the bug. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #16856 — Lazy import of
model_toolsblocks the asyncio event loop on the first gateway message when an MCP server is slow/unreachable.Root Cause
model_tools.pycallsdiscover_mcp_tools()as a module-level side effect (line 143). The gateway lazy-importsrun_agent(which transitively importsmodel_tools) inside the asyncio event loop thread.discover_mcp_tools()→_run_on_mcp_loop()→future.result(timeout=120)blocks the loop for up to 120s when an MCP server is unreachable, killing Discord shard heartbeats and Telegram polling.Fix
Detect whether an asyncio event loop is running at import time:
loop.run_in_executor()so the event loop stays responsivediscover_mcp_tools()is already idempotent, so deferred execution is safe. MCP tools may not be available for the very first message in a gateway session, but they'll be ready by the second one — far better than freezing the entire platform for 120s.Changes
model_tools.py: Wrap module-leveldiscover_mcp_tools()with event loop detectiontests/test_model_tools_async_bridge.py: AddedTestMcpDiscoveryDeferralwith 2 test casesTesting
Test coverage:
test_mcp_discovery_offloaded_when_loop_runningtest_mcp_discovery_runs_inline_without_loopHow to verify manually
config.yaml