[PERFORMANCE]: High httpx client churn causes memory pressure under load #1731

@crivetimihai

Summary

Under high load (1000+ RPS), memory grows significantly due to httpx.AsyncClient creation/destruction churn. Investigation found two categories of issues:

  1. High Client Churn (P0): The MCP SDK creates a new httpx.AsyncClient per request via a factory pattern
  2. Plugin Client Leak (P1): Two plugins never close their httpx clients on shutdown

Observed Behavior

During load testing with make load-test-ui:

  • Memory grows ~50-60% within minutes
  • RPS degrades from ~1000 to ~200 over 30 minutes
  • Gateway containers show continuous memory growth

Root Cause Analysis

Issue 1: MCP Client Factory Pattern

The MCP SDK's sse_client and streamablehttp_client call a factory function that creates a new httpx.AsyncClient per request:

# tool_service.py:2268-2304
def get_httpx_client_factory(...) -> httpx.AsyncClient:
    return httpx.AsyncClient(...)  # New client every call

While the SDK properly closes the client via context manager, at 1000 RPS:

  • 1000 new clients created per second
  • Each allocates SSL context, connection pool, internal state
  • GC can't keep up with the allocation rate, so memory grows faster than it is reclaimed
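
To get a feel for one component of the per-client cost listed above, the sketch below times how long it takes to build default SSL contexts using only the standard library. It assumes that each new httpx.AsyncClient with default verification builds its own context, which is my understanding of httpx but is not measured here. The function name and the sample size are illustrative, and the numbers will vary by machine and CA bundle.

```python
import ssl
import time


def build_contexts(n: int) -> float:
    """Return the seconds spent creating n default SSL contexts."""
    start = time.perf_counter()
    for _ in range(n):
        # Each call loads the system CA bundle again, which is the work
        # a client created per request would repeat.
        ssl.create_default_context()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 200  # arbitrary sample size for illustration
    elapsed = build_contexts(n)
    print(f"{n} contexts: {elapsed:.3f}s ({elapsed / n * 1e3:.2f} ms each)")
```

Multiplying the per-context figure by the request rate gives a rough lower bound on the CPU time spent on context setup alone. It says nothing about connection pools or GC behaviour.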

Affected files:

  • mcpgateway/services/tool_service.py:2268-2304
  • mcpgateway/services/resource_service.py:1256-1277
  • mcpgateway/services/gateway_service.py:2849-2870, 4015-4040, 4156-4181

Issue 2: Plugin httpx Client Leak

Two plugins create an httpx.AsyncClient in __init__ but only implement __aexit__, which is never called, instead of overriding shutdown(). As a result the client is never closed:

  • plugins/content_moderation/content_moderation.py:188
  • plugins/webhook_notification/webhook_notification.py:131

Proposed Fix

Fix 1: ReusableAsyncClient for Shared Connections

Create a subclass that doesn't close on context exit, allowing client reuse:

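import httpx

class ReusableAsyncClient(httpx.AsyncClient):
    """AsyncClient that doesn't close on context manager exit."""

    async def __aenter__(self):
        # httpx raises RuntimeError if an already-opened client is entered
        # again, so reuse requires skipping super().__aenter__() (to verify)
        return self

    async def __aexit__(self, *args, **kwargs):
        pass  # Don't close - will be closed at service shutdown

    async def force_close(self):
        await super().aclose()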

Then modify the factory to return the shared client for gateways without custom SSL:

def factory(headers=None, timeout=None, auth=None):
    if gateway_ca_cert:
        # Custom SSL - must create new client
        return httpx.AsyncClient(verify=custom_ctx, ...)
    else:
        # No custom SSL - return shared client
        return self._shared_mcp_client

Fix 2: Add shutdown() to Plugins

async def shutdown(self) -> None:
    if hasattr(self, "_client") and self._client:
        await self._client.aclose()
        self._client = None
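
The difference between the two lifecycles can be sketched without httpx. FakeAsyncClient, LeakyPlugin, FixedPlugin and run_lifecycle are all hypothetical stand-ins, and the sketch assumes the plugin host only ever awaits shutdown(), as described above.

```python
import asyncio


class FakeAsyncClient:
    """Stand-in for httpx.AsyncClient; only tracks whether aclose() ran."""

    def __init__(self) -> None:
        self.is_closed = False

    async def aclose(self) -> None:
        self.is_closed = True


class LeakyPlugin:
    """Cleans up only in __aexit__, which the host never calls."""

    def __init__(self) -> None:
        self._client = FakeAsyncClient()

    async def __aexit__(self, *exc) -> None:
        await self._client.aclose()

    async def shutdown(self) -> None:
        pass  # inherited no-op in this sketch


class FixedPlugin:
    """Closes the client in shutdown(), mirroring the proposed fix."""

    def __init__(self) -> None:
        self._client = FakeAsyncClient()

    async def shutdown(self) -> None:
        if getattr(self, "_client", None):
            await self._client.aclose()
            self._client = None


async def run_lifecycle(plugin) -> bool:
    """Mimic the host: call shutdown() twice, report if the client closed."""
    client = plugin._client
    await plugin.shutdown()
    await plugin.shutdown()  # a second call must be harmless
    return client.is_closed


if __name__ == "__main__":
    print("leaky closed:", asyncio.run(run_lifecycle(LeakyPlugin())))
    print("fixed closed:", asyncio.run(run_lifecycle(FixedPlugin())))
```

Setting self._client to None after closing is what makes a repeated shutdown() call safe.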

Files to Modify

Priority  File                                                  Change
P0        mcpgateway/services/tool_service.py                   Add ReusableAsyncClient, modify factory
P0        mcpgateway/services/resource_service.py               Same pattern
P0        mcpgateway/services/gateway_service.py                Same pattern (3 locations)
P1        plugins/content_moderation/content_moderation.py      Add shutdown() method
P1        plugins/webhook_notification/webhook_notification.py  Add shutdown() method

Expected Impact

  • Memory growth: < 5% over 30 minutes (vs 50%+ currently)
  • RPS: Stable 800-1000 (vs degrading to 200)
  • Connection reuse via HTTP keep-alive

References

  • Full analysis: todo/memory-leak.md
  • Related: todo/perf-issues.md (Issue 1)

Labels

bug, performance, python
