fix(session-pool): isolate cancel scopes via background task ownership #3739
Merged
crivetimihai merged 3 commits into main on Mar 19, 2026
#3737)

The MCP session pool manually called transport_ctx.__aenter__() and session.__aenter__(), attaching anyio cancel scopes to the HTTP request handler task. When a child task in the transport's TaskGroup failed, it cancelled the request handler, causing ~20-45% tool call failures.

Changes:
- Run transport/session lifecycle in dedicated asyncio.Task per pooled session so cancel scopes bind to the owner task, not request handlers
- Add transport-aware is_closed detection (write stream state, owner task health, receive channel count) from PR #3605
- Replace asyncio.wait_for with anyio.fail_after for MCP SDK calls in tool_service.py (4 sites), mcp_session_pool.py health checks (4), and gateway_service.py explicit health RPC (1)
- Fix release() to clean up dead-owner sessions instead of leaking them in _active with consumed semaphore slots
- Fix _create_session() to clean up owner task on CancelledError via finally block (not just Exception)
- Add 18 tests for owner task lifecycle, transport-aware is_closed, release with dead owner, cancellation cleanup, and prompt pooled regression coverage

Tested: 607K+ requests across 200-1000 concurrent users with 0% pool-related failures. All servers (Fast Test, Fast Time), all transports (StreamableHTTP, SSE) verified.

Closes #3737

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
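The owner-task idea described in this commit can be sketched with plain asyncio. This is a minimal illustration under stated assumptions, not the code from this PR: `fake_transport` and `OwnedSession` are hypothetical names, and the real pool wraps MCP SDK transports rather than a toy context manager. The point it shows is that the context manager is entered and exited inside one dedicated task, so anything scoped to the entering task never attaches to the caller.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def fake_transport(log):
    # Hypothetical stand-in for a transport/session context manager.
    # It records which task enters and exits it.
    log.append(("enter", asyncio.current_task().get_name()))
    try:
        yield "session"
    finally:
        log.append(("exit", asyncio.current_task().get_name()))


class OwnedSession:
    """Runs a context manager's whole lifecycle inside one dedicated task."""

    def __init__(self, ctx_factory):
        self._ctx_factory = ctx_factory
        self._ready = asyncio.Event()
        self._shutdown = asyncio.Event()
        self.session = None
        self.error = None
        self.owner = None

    async def _run(self):
        try:
            # Enter and exit happen in this task, never in the caller's task.
            async with self._ctx_factory() as session:
                self.session = session
                self._ready.set()
                await self._shutdown.wait()
        except Exception as exc:
            self.error = exc
        finally:
            self._ready.set()  # unblock start() even when setup failed

    async def start(self):
        self.owner = asyncio.create_task(self._run(), name="session-owner")
        started = False
        try:
            await self._ready.wait()
            if self.error is not None:
                raise self.error
            started = True
            return self.session
        finally:
            # Runs on errors and on cancellation of the caller alike,
            # so a half-started owner task is not left behind.
            if not started:
                self.owner.cancel()

    async def close(self):
        self._shutdown.set()
        await self.owner


async def demo():
    log = []
    owned = OwnedSession(lambda: fake_transport(log))
    session = await owned.start()
    await owned.close()
    return session, log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Callers only ever await `start()` and `close()`; the request handler task never enters the context manager itself, which is the isolation property the commit message describes.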
…ed regression test

The prompt fallback test was only asserting that get_mcp_session_pool() raises RuntimeError, not that PromptService._fetch_gateway_prompt_result() actually falls through to the non-pooled sse_client path. Now calls the real method with mocked pool unavailability and verifies the SSE fallback session is initialized and get_prompt is invoked.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
- Add test for is_closed graceful degradation when SDK internals raise
- Exercise actual PromptService._fetch_gateway_prompt_result fallback path when pool is unavailable (not just bare get_mcp_session_pool)
- Improve force-cancel timeout test with shorter cleanup timeout
- Add test for _create_session finally-block BaseException handling

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Summary
Fixes the MCP session pool cancel scope leak that caused ~20-45% tool call failures across all servers and transports when pooling was enabled.
- Run transport/session lifecycle in a dedicated asyncio.Task per pooled session, isolating anyio cancel scopes from request handler tasks
- Add transport-aware is_closed detection (incorporates PR fix(session-pool): prevent broken session recycling in MCPSessionPool #3605)
- Replace asyncio.wait_for with anyio.fail_after at 9 MCP SDK call sites
- Fix release() dead-owner leak and _create_session() CancelledError leak

Closes #3737
Closes #3520
Closes #3605
Supersedes #3520
Incorporates #3605
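The release() dead-owner fix can be illustrated roughly as follows. This is a sketch under assumptions, not the pool's actual implementation: `TinyPool` and `PooledSession` are hypothetical, and `is_closed` here checks only owner task health, whereas the PR also inspects transport state. It shows a session whose owner task has died being dropped on release, with its semaphore slot returned instead of staying consumed.

```python
import asyncio


class PooledSession:
    def __init__(self, owner):
        self.owner = owner  # background task that owns the session lifecycle

    @property
    def is_closed(self):
        # Assumption for this sketch: a finished owner task means the
        # session is unusable.
        return self.owner.done()


class TinyPool:
    def __init__(self, max_sessions):
        self._slots = asyncio.Semaphore(max_sessions)  # one slot per live session
        self._idle = []
        self._active = set()

    async def acquire(self, factory):
        if self._idle:
            session = self._idle.pop()
        else:
            await self._slots.acquire()
            session = factory()
        self._active.add(session)
        return session

    def release(self, session):
        self._active.discard(session)
        if session.is_closed:
            # Dead owner: drop the session and free its slot rather than
            # leaving it tracked with the slot still held.
            self._slots.release()
        else:
            self._idle.append(session)


async def demo():
    pool = TinyPool(max_sessions=1)

    async def owner_body():
        await asyncio.Event().wait()  # parks until the task is cancelled

    session = await pool.acquire(lambda: PooledSession(asyncio.create_task(owner_body())))
    session.owner.cancel()  # simulate the owner task dying
    await asyncio.gather(session.owner, return_exceptions=True)
    pool.release(session)
    return len(pool._active), len(pool._idle), pool._slots.locked()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a single slot, a leaked dead session would block every later acquire, which is why the slot is released on the dead-owner path.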
Test plan