Skip to content

fix(async): close unscheduled coroutines in threadsafe bridges#2657

Open
JithendraNara wants to merge 3 commits intoNousResearch:mainfrom
JithendraNara:codex/fix-mcp-coroutine-leaks
Open

fix(async): close unscheduled coroutines in threadsafe bridges#2657
JithendraNara wants to merge 3 commits intoNousResearch:mainfrom
JithendraNara:codex/fix-mcp-coroutine-leaks

Conversation

@JithendraNara
Copy link
Copy Markdown

@JithendraNara JithendraNara commented Mar 23, 2026

Summary

This fixes the same coroutine-leak pattern anywhere the repo bridges sync code into an async loop with asyncio.run_coroutine_threadsafe(...).

Changes

  • close unscheduled coroutines when threadsafe scheduling fails in MCP, ACP permission/update bridges, DingTalk incoming dispatch, gateway sync-to-async callbacks, the async worker helper, and MCP shutdown
  • keep the close-on-failure behavior scoped to scheduler submission failures so work already handed to the loop is left alone
  • add regression tests for the ACP bridges, DingTalk incoming dispatch, MCP scheduling, and environments.patches.AsyncWorker
  • clean up ACP event tests so mocked scheduler submissions do not leak coroutines themselves

Test plan

  • source venv/bin/activate && pytest tests/acp/test_permissions.py tests/acp/test_events.py tests/gateway/test_dingtalk.py tests/tools/test_mcp_tool.py tests/tools/test_mcp_probe.py tests/test_environment_patches.py -q -W error::RuntimeWarning
  • source venv/bin/activate && pytest tests/acp tests/gateway/test_dingtalk.py tests/tools -q -x

Copilot AI review requested due to automatic review settings March 23, 2026 19:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent leaked/unawaited MCP coroutines when the MCP loop scheduler fails, by deferring coroutine creation until the scheduler is known to be usable and adding regression tests for the failure paths.

Changes:

  • Updated _run_on_mcp_loop to accept a coroutine or a zero-arg coroutine factory, and to cancel scheduled futures on timeout.
  • Updated MCP tool/probe call sites to pass coroutine factories instead of already-created coroutine objects.
  • Added regression tests asserting scheduler failure paths do not emit “coroutine was never awaited” RuntimeWarnings.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
tools/mcp_tool.py Adds factory support + timeout cancellation to _run_on_mcp_loop; updates call sites to avoid creating coroutines before scheduling.
tests/tools/test_mcp_tool.py Adds a regression test ensuring tool handler failure doesn’t leak unawaited coroutine warnings; updates test loop patch helper for new signature.
tests/tools/test_mcp_probe.py Refactors probe test coroutine runner to match new signature; adds regression test for probe failure without unawaited coroutine warnings.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tools/mcp_tool.py
Comment on lines +939 to +942
coro = coro_or_factory() if callable(coro_or_factory) else coro_or_factory
future = asyncio.run_coroutine_threadsafe(coro, loop)
return future.result(timeout=timeout)
try:
return future.result(timeout=timeout)
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If asyncio.run_coroutine_threadsafe(coro, loop) raises (e.g., loop is closed/stopping between the is_running() check and scheduling, or a test patches the scheduler to raise), the freshly created coroutine from the factory will never be awaited/closed and can still trigger "coroutine was never awaited" warnings. Consider wrapping the scheduling call in try/except and calling coro.close() on failure (and optionally validating that the factory returned a coroutine) before re-raising.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 9cc3faa. _run_on_mcp_loop now wraps asyncio.run_coroutine_threadsafe(...) in try/except, closes the created coroutine if scheduling fails, and re-raises. I also added a direct regression test in tests/tools/test_mcp_tool.py that patches the scheduler to raise and asserts we do not leak an unawaited coroutine warning.

@JithendraNara JithendraNara changed the title Fix leaked MCP coroutines on scheduler failures fix(async): close unscheduled coroutines in threadsafe bridges Mar 23, 2026
@JithendraNara
Copy link
Copy Markdown
Author

Expanded this beyond the original MCP-only fix after sweeping the same pattern elsewhere in the repo. The PR now also hardens ACP permission/update bridges, DingTalk incoming dispatch, gateway sync-to-async callbacks, the async worker helper, and MCP shutdown against unscheduled coroutine leaks when threadsafe scheduling fails. Added focused regressions for each area and reran the ACP + DingTalk + tools slices listed in the PR body.

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/acp Agent Communication Protocol adapter comp/gateway Gateway runner, session dispatch, delivery tool/mcp MCP client and OAuth labels May 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/acp Agent Communication Protocol adapter comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists tool/mcp MCP client and OAuth type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants