Skip to content

fix(mcp): run OSV malware check off event loop#29190

Open
qdaszx wants to merge 2 commits into
NousResearch:mainfrom
qdaszx:fix/mcp-osv-async-check
Open

fix(mcp): run OSV malware check off event loop#29190
qdaszx wants to merge 2 commits into
NousResearch:mainfrom
qdaszx:fix/mcp-osv-async-check

Conversation

@qdaszx

@qdaszx qdaszx commented May 20, 2026

Copy link
Copy Markdown

Summary

_run_stdio() is an async MCP startup path, but it was calling the OSV malware preflight synchronously. That preflight eventually reaches urllib.request.urlopen(), so an intermittent OSV SSL/connect hang can block the entire MCP event loop during server discovery. In TUI startup this can exceed the 15s gateway readiness window even though MCP discovery itself is waiting on a 120s future.

This PR now covers both layers:

  1. move the blocking OSV lookup into asyncio.to_thread() so MCP discovery keeps the event loop responsive
  2. wrap that threaded check with an explicit startup guard so stdio MCP startup fails open if the OSV check itself never returns promptly

This preserves the existing pre-spawn malware check and error handling while avoiding a global socket.setdefaulttimeout() mutation.

The bug

tools/mcp_tool.py::_run_stdio() runs inside the MCP asyncio loop:

malware_error = check_package_for_malware(command, args)

check_package_for_malware() calls into tools/osv_check.py, which uses synchronous urllib I/O. When the OSV request stalls in SSL handshake or network setup, the MCP event loop cannot schedule other work. That makes MCP startup appear hung and can cause the TUI gateway startup timeout described in #29184.

Even after moving the work into a thread, a lower-level OSV stall can still leave startup waiting on the thread result. Because the OSV malware check is already fail-open for network errors, the bounded async wrapper should also fail open on timeout.

The fix

  • Keep the malware check in the same pre-spawn position, before constructing/starting the stdio MCP server.
  • Run it via asyncio.to_thread(check_package_for_malware, command, args) so blocking urllib work happens off the event-loop thread.
  • Wrap the threaded check in asyncio.wait_for(..., timeout=_OSV_MALWARE_CHECK_TIMEOUT).
  • On timeout, log at debug level and continue with malware_error = None, matching the existing fail-open policy for OSV/network failures.
  • Preserve the existing behavior when OSV flags a package: _run_stdio() still raises ValueError("MCP server '<name>': ...") before spawning.

Regression coverage

Added coverage in tests/tools/test_mcp_tool_issue_948.py:

  • test_run_stdio_malware_check_does_not_block_event_loop
    • patches the malware check to sleep for 200ms, simulating a slow/blocking OSV lookup
    • starts a stdio MCP server task with mocked MCP session/client objects
    • asserts that await asyncio.sleep(0.05) resumes promptly instead of being delayed by the malware check
    • verified RED first: before the fix the test failed with elapsed ~= 0.20s; after the fix it passes
  • test_run_stdio_malware_check_timeout_fails_open
    • lowers _OSV_MALWARE_CHECK_TIMEOUT to 50ms
    • simulates a malware check that would return only after 200ms
    • asserts stdio MCP startup proceeds before the hung check returns

Validation

Suite Result
python -m pytest tests/tools/test_mcp_tool_issue_948.py::test_run_stdio_malware_check_does_not_block_event_loop tests/tools/test_mcp_tool_issue_948.py::test_run_stdio_malware_check_timeout_fails_open -q -o 'addopts=' 2 passed
python -m pytest tests/tools/test_mcp_tool_issue_948.py tests/tools/test_osv_check.py tests/tools/test_mcp_tool.py -q -o 'addopts=' 225 passed
git diff --check passed

Closes #29184.

Move the blocking OSV malware lookup for stdio MCP servers into asyncio.to_thread so MCP discovery startup is not frozen by intermittent OSV SSL handshakes.

Add a regression test that verifies the event loop remains responsive while the malware check is slow.

Closes NousResearch#29184
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists tool/mcp MCP client and OAuth labels May 20, 2026
Fail open if the threaded OSV malware check itself does not return within the startup guard, covering lower-level SSL/connect stalls without mutating global socket defaults.

Add a regression test for the bounded timeout path.
@qdaszx

qdaszx commented May 20, 2026

Copy link
Copy Markdown
Author

Follow-up after reading #29192: thanks @ygd58 for calling out the second layer here.

The original version of this PR only moved the blocking OSV lookup off the MCP event loop with asyncio.to_thread(). #29192 pointed out that urllib/SSL handshake stalls can still leave the OSV check itself waiting longer than startup should tolerate.

I incorporated that concern here in commit 2e7ddb569 by adding an explicit async startup guard around the threaded malware check:

  • keep asyncio.to_thread(check_package_for_malware, ...) so the MCP event loop stays responsive
  • wrap it with asyncio.wait_for(..., timeout=_OSV_MALWARE_CHECK_TIMEOUT)
  • fail open on timeout, matching the existing OSV/network-error policy
  • avoid mutating global socket defaults via socket.setdefaulttimeout()
  • add a regression test for the bounded-timeout path

Validation after the follow-up:

Suite Result
targeted event-loop + timeout tests 2 passed
test_mcp_tool_issue_948.py + test_osv_check.py + test_mcp_tool.py 225 passed

Appreciate the reference PR — it made this fix more complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P2 Medium — degraded but workaround exists tool/mcp MCP client and OAuth type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MCP startup hangs: synchronous HTTP call in osv_check blocks asyncio event loop

2 participants