[PERFORMANCE]: High httpx client churn causes memory pressure under load #1731
Closed
Labels: bug, performance, python
Summary
Under high load (1000+ RPS), memory grows significantly due to httpx.AsyncClient creation/destruction churn. Investigation found two categories of issues:
- High Client Churn (P0): MCP SDK creates new httpx.AsyncClient per request via factory pattern
- Plugin Client Leak (P1): Two plugins don't close their httpx clients on shutdown
Observed Behavior
During load testing with make load-test-ui:
- Memory grows ~50-60% within minutes
- RPS degrades from ~1000 to ~200 over 30 minutes
- Gateway containers show continuous memory growth
Root Cause Analysis
Issue 1: MCP Client Factory Pattern
The MCP SDK's sse_client and streamablehttp_client call a factory function that creates a new httpx.AsyncClient per request:

    # tool_service.py:2268-2304
    def get_httpx_client_factory(...) -> httpx.AsyncClient:
        return httpx.AsyncClient(...)  # New client every call

While the SDK properly closes the client via its context manager, at 1000 RPS:
- 1000 new clients created per second
- Each allocates SSL context, connection pool, internal state
- GC can't keep up with allocation rate
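
To make the churn concrete, here is a minimal sketch of the per-request factory pattern. It is not the gateway's code: `FakeClient`, `per_request_factory`, `handle_request`, and `simulate` are hypothetical names, and `FakeClient` is a stand-in for `httpx.AsyncClient` so the example runs without network access or third-party packages. It only counts how many client objects one burst of requests constructs.

```python
import asyncio


class FakeClient:
    """Stand-in for httpx.AsyncClient (assumption); counts constructions."""

    created = 0

    def __init__(self):
        FakeClient.created += 1
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        # Mirrors a client that is closed when its context manager exits.
        self.closed = True


def per_request_factory():
    # The pattern described above: a fresh client on every call.
    return FakeClient()


async def handle_request(factory):
    async with factory() as client:
        await asyncio.sleep(0)  # placeholder for the real MCP call
        return client


async def simulate(n_requests, factory):
    FakeClient.created = 0
    await asyncio.gather(*(handle_request(factory) for _ in range(n_requests)))
    return FakeClient.created


clients_built = asyncio.run(simulate(1000, per_request_factory))
print(clients_built)
```

In this sketch every simulated request builds and discards one client. With a real `httpx.AsyncClient`, each of those objects would also carry the SSL context and connection pool listed above.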
Affected files:
- mcpgateway/services/tool_service.py:2268-2304
- mcpgateway/services/resource_service.py:1256-1277
- mcpgateway/services/gateway_service.py:2849-2870, 4015-4040, 4156-4181
Issue 2: Plugin httpx Client Leak
Two plugins create httpx.AsyncClient in __init__ but only implement __aexit__ (never called) instead of overriding shutdown():
- plugins/content_moderation/content_moderation.py:188
- plugins/webhook_notification/webhook_notification.py:131
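
The leak can be illustrated with a small sketch. It assumes the plugin manager tears plugins down by calling `shutdown()` and never enters them with `async with`; the `Plugin` base class, `shutdown_all`, and `FakeClient` (a stand-in for `httpx.AsyncClient`) are hypothetical names used only for illustration.

```python
import asyncio


class FakeClient:
    """Stand-in for httpx.AsyncClient (assumption: only aclose() matters here)."""

    def __init__(self):
        self.is_closed = False

    async def aclose(self):
        self.is_closed = True


class Plugin:
    """Hypothetical plugin base class whose only teardown hook is shutdown()."""

    async def shutdown(self) -> None:
        return None


class LeakyPlugin(Plugin):
    def __init__(self):
        self._client = FakeClient()

    async def __aexit__(self, *exc):
        # Only runs if the plugin is used in `async with`, which the
        # hypothetical manager below never does.
        await self._client.aclose()


class FixedPlugin(Plugin):
    def __init__(self):
        self._client = FakeClient()

    async def shutdown(self) -> None:
        # Overrides the hook the manager actually calls.
        if self._client is not None:
            await self._client.aclose()
            self._client = None


async def shutdown_all(plugins):
    """Hypothetical manager teardown: calls shutdown() and nothing else."""
    for plugin in plugins:
        await plugin.shutdown()


leaky, fixed = LeakyPlugin(), FixedPlugin()
leaky_client, fixed_client = leaky._client, fixed._client
asyncio.run(shutdown_all([leaky, fixed]))
print(leaky_client.is_closed, fixed_client.is_closed)
```

Under these assumptions the client owned by `LeakyPlugin` stays open after teardown, while the one owned by `FixedPlugin` is closed.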
Proposed Fix
Fix 1: ReusableAsyncClient for Shared Connections
Create a subclass that doesn't close on context exit, allowing client reuse:

    class ReusableAsyncClient(httpx.AsyncClient):
        """AsyncClient that doesn't close on context manager exit."""

        async def __aexit__(self, *args, **kwargs):
            pass  # Don't close - will be closed at service shutdown

        async def force_close(self):
            await super().aclose()

Then modify the factory to return a shared client for gateways without custom SSL:

    def factory(headers=None, timeout=None, auth=None):
        if gateway_ca_cert:
            # Custom SSL - must create new client
            return httpx.AsyncClient(verify=custom_ctx, ...)
        else:
            # No custom SSL - return shared client
            return self._shared_mcp_client

Fix 2: Add shutdown() to Plugins

    async def shutdown(self) -> None:
        if hasattr(self, "_client") and self._client:
            await self._client.aclose()
            self._client = None

Files to Modify
| Priority | File | Change |
|---|---|---|
| P0 | mcpgateway/services/tool_service.py | Add ReusableAsyncClient, modify factory |
| P0 | mcpgateway/services/resource_service.py | Same pattern |
| P0 | mcpgateway/services/gateway_service.py | Same pattern (3 locations) |
| P1 | plugins/content_moderation/content_moderation.py | Add shutdown() method |
| P1 | plugins/webhook_notification/webhook_notification.py | Add shutdown() method |
Expected Impact
- Memory growth: < 5% over 30 minutes (vs 50%+ currently)
- RPS: Stable 800-1000 (vs degrading to 200)
- Connection reuse via HTTP keep-alive
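
The reuse behind these expectations can be sketched end to end. This is a sketch under stated assumptions, not the final implementation: `FakeAsyncClient` is a stand-in for `httpx.AsyncClient`, and `Service`, `make_factory`, and `demo` are hypothetical names. The stand-in refuses to be entered twice, which matches my understanding of `httpx.AsyncClient.__aenter__` (it raises `RuntimeError` when the client was already opened), so the reusable subclass here also overrides `__aenter__`. That override is an addition of this sketch, not part of the proposal.

```python
import asyncio


class FakeAsyncClient:
    """Stand-in for httpx.AsyncClient (assumption): can be opened only once."""

    def __init__(self):
        self._opened = False
        self.is_closed = False

    async def __aenter__(self):
        if self._opened:
            raise RuntimeError("Cannot open a client instance more than once.")
        self._opened = True
        return self

    async def __aexit__(self, *exc):
        await self.aclose()

    async def aclose(self):
        self.is_closed = True


class ReusableFakeClient(FakeAsyncClient):
    """Survives repeated `async with` blocks; closed only via force_close()."""

    async def __aenter__(self):
        return self  # tolerate being entered many times

    async def __aexit__(self, *exc):
        pass  # do not close on context exit

    async def force_close(self):
        await super().aclose()


class Service:
    """Hypothetical service that owns one shared client."""

    def __init__(self):
        self._shared_mcp_client = ReusableFakeClient()

    def make_factory(self, gateway_ca_cert=None):
        def factory(headers=None, timeout=None, auth=None):
            if gateway_ca_cert:
                return FakeAsyncClient()  # custom SSL: new client per call
            return self._shared_mcp_client  # otherwise: shared client
        return factory

    async def shutdown(self):
        await self._shared_mcp_client.force_close()


async def demo():
    service = Service()
    factory = service.make_factory()
    seen = set()
    for _ in range(3):
        async with factory() as client:
            seen.add(id(client))
    open_after_use = not service._shared_mcp_client.is_closed
    await service.shutdown()
    return len(seen), open_after_use, service._shared_mcp_client.is_closed


result = asyncio.run(demo())
print(result)
```

One design point to check: the shared branch ignores the `headers`, `timeout`, and `auth` arguments passed to the factory, so any per-gateway values would presumably have to be applied per request rather than at client construction.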
References
- Full analysis: todo/memory-leak.md
- Related: todo/perf-issues.md (Issue 1)