[BUG][PERFORMANCE]: TokenScopingMiddleware causes connection pool exhaustion under load #2330

@crivetimihai

Summary

The TokenScopingMiddleware acquires a database connection for every request that carries a JWT token, which exhausts the connection pool under high load. The system then crashes with "QueuePool limit reached, connection timed out" errors.

Root Cause

mcpgateway/middleware/token_scoping.py creates database sessions using next(get_db()) at:

  • Line 827: Main middleware path for team-scoped tokens
  • Line 433: _check_team_membership() fallback
  • Line 545: _check_resource_team_ownership() fallback

Because endpoint handlers also acquire connections via Depends(get_db), each request holds two or more connections at the same time:

Request → TokenScopingMiddleware (conn #1) → Endpoint Handler (conn #2) → Response
          ↑ held until response ↑            ↑ held until response ↑
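Holding one connection while waiting for a second is what turns saturation into timeouts instead of mere queuing. The following is a toy model only, not the gateway's code: a threading.BoundedSemaphore stands in for the SQLAlchemy pool, and simulate() and request() are hypothetical names. It assumes, as the diagram describes, that the middleware connection stays checked out while the handler asks for its own.

```python
import threading


def simulate(pool_size: int, timeout: float = 0.1) -> int:
    """Run pool_size concurrent requests that each need two pool slots.

    Returns how many requests timed out waiting for their second slot.
    """
    pool = threading.BoundedSemaphore(pool_size)    # stand-in for the DB pool
    first_held = threading.Barrier(pool_size)       # all requests hold slot #1
    second_tried = threading.Barrier(pool_size)     # all requests tried slot #2
    timed_out = []
    lock = threading.Lock()

    def request() -> None:
        pool.acquire()                              # middleware connection
        try:
            first_held.wait()                       # pool is now fully checked out
            got_second = pool.acquire(timeout=timeout)  # handler connection
            if got_second:
                pool.release()
            else:
                with lock:
                    timed_out.append(1)
            second_tried.wait()                     # nobody releases early
        finally:
            pool.release()                          # middleware connection returned

    threads = [threading.Thread(target=request) for _ in range(pool_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(timed_out)


print(simulate(pool_size=4))  # 4
```

Every request in the model times out, because each one is waiting for a slot that only another waiting request could free.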

Impact

With DB_POOL_SIZE=20 and DB_MAX_OVERFLOW=10:

  • Max connections per gateway: 30
  • Effective capacity at 2 connections per request: ~15 concurrent requests
  • Under load: pool exhausts → requests wait out the 60s pool timeout → crash
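The capacity figures above follow from simple arithmetic, sketched here. The helper name effective_capacity is hypothetical, and the result is an upper bound that assumes every request holds all of its connections at once.

```python
def effective_capacity(pool_size: int, max_overflow: int, conns_per_request: int) -> int:
    """Upper bound on requests that can hold all their connections at once."""
    max_connections = pool_size + max_overflow  # DB_POOL_SIZE + DB_MAX_OVERFLOW
    return max_connections // conns_per_request


print(effective_capacity(20, 10, 1))  # 30
print(effective_capacity(20, 10, 2))  # 15
```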

Error Message

sqlalchemy.exc.TimeoutError: QueuePool limit of size 20 overflow 10 reached,
connection timed out, timeout 60.00

mcpgateway.middleware.token_scoping - ERROR - Error checking resource team
ownership for /tools/{id}: QueuePool limit of size 20 overflow 10 reached

Proposed Fix

Modify TokenScopingMiddleware to use the fresh_db_session() context manager, so that the connection is released as soon as the checks complete (before the endpoint handler runs):

# Before (holds connection for entire request):
db = next(get_db())
try:
    self._check_team_membership(payload, db=db)
    self._check_resource_team_ownership(path, teams, db=db)
finally:
    db.close()

# After (releases connection immediately):
with fresh_db_session() as db:
    if not self._check_team_membership(payload, db=db):
        raise HTTPException(...)
    if not self._check_resource_team_ownership(path, teams, db=db):
        raise HTTPException(...)
# Connection released BEFORE endpoint handler runs
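The effect of releasing before the handler runs can be illustrated with a counting stand-in for the pool. This is a minimal sketch under stated assumptions: ToyPool, scoped_session, and the request functions are hypothetical names, and scoped_session only imitates the release-on-exit behavior expected of fresh_db_session(); it is not that function's implementation.

```python
from contextlib import contextmanager


class ToyPool:
    """Counts checked-out connections; stands in for a real connection pool."""

    def __init__(self) -> None:
        self.in_use = 0
        self.peak = 0

    def acquire(self) -> None:
        self.in_use += 1
        self.peak = max(self.peak, self.in_use)

    def release(self) -> None:
        self.in_use -= 1


@contextmanager
def scoped_session(pool: ToyPool):
    """Hypothetical stand-in: check out on entry, always return on exit."""
    pool.acquire()
    try:
        yield pool
    finally:
        pool.release()


def handler(pool: ToyPool) -> str:
    # Endpoint handler taking its own connection, as Depends(get_db) would.
    with scoped_session(pool):
        return "ok"


def request_holding(pool: ToyPool) -> str:
    # Middleware connection stays checked out while the handler runs.
    with scoped_session(pool):
        return handler(pool)


def request_releasing(pool: ToyPool) -> str:
    # Middleware checks finish and release before the handler starts.
    with scoped_session(pool):
        pass  # team membership / ownership checks would go here
    return handler(pool)


def peak_connections(request_fn) -> int:
    pool = ToyPool()
    request_fn(pool)
    return pool.peak


print(peak_connections(request_holding))    # 2
print(peak_connections(request_releasing))  # 1
```

In this model the per-request peak drops from two connections to one, which is the doubling of effective capacity the fix is aiming for.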

Environment

  • docker-compose with 3 gateway instances
  • PostgreSQL + pgBouncer
  • Locust load test: 4000 users
  • Crash occurs after ~10-15 minutes of sustained load

Labels

  • MUST (P1: Non-negotiable, critical requirements without which the product is non-functional or unsafe)
  • bug (Something isn't working)
  • database
  • performance (Performance related items)
  • python (Python / backend development (FastAPI))
  • ready (Validated, ready-to-work-on items)
  • security (Improves security)
