[BUG][PERFORMANCE]: TokenScopingMiddleware causes connection pool exhaustion under load #2330
Open
Labels
- MUST: P1 (non-negotiable, critical requirements without which the product is non-functional or unsafe)
- bug: Something isn't working
- database
- performance: Performance related items
- python: Python / backend development (FastAPI)
- ready: Validated, ready-to-work-on items
- security: Improves security
Description
Summary
TokenScopingMiddleware acquires a database connection for every request that carries a JWT token and holds it until the response is sent, which exhausts the connection pool under high load. The gateway then crashes with "QueuePool limit reached, connection timed out" errors.
Root Cause
mcpgateway/middleware/token_scoping.py creates database sessions using next(get_db()) at:
- Line 827: Main middleware path for team-scoped tokens
- Line 433: _check_team_membership() fallback
- Line 545: _check_resource_team_ownership() fallback
Since endpoint handlers also acquire connections via Depends(get_db), each request uses 2+ connections simultaneously:
Request → TokenScopingMiddleware (conn #1) → Endpoint Handler (conn #2) → Response
          ↑     held until response      ↑   ↑ held until response    ↑
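The overlap in the diagram above can be reproduced with a toy model. This is only a sketch: ToyPool is a hypothetical stand-in that counts checkouts, not SQLAlchemy's QueuePool, and the middleware and handler functions are illustrative names rather than code from mcpgateway.

```python
from contextlib import contextmanager


class ToyPool:
    """Hypothetical stand-in for a connection pool; it only counts checkouts."""

    def __init__(self) -> None:
        self.in_use = 0
        self.peak = 0

    @contextmanager
    def connect(self):
        self.in_use += 1
        self.peak = max(self.peak, self.in_use)
        try:
            yield object()  # placeholder for a real connection
        finally:
            self.in_use -= 1


def handler(pool: ToyPool) -> str:
    # Endpoint handler takes its own connection (conn #2).
    with pool.connect():
        return "ok"


def middleware_then_handler(pool: ToyPool) -> str:
    # Middleware keeps conn #1 checked out while the handler runs.
    with pool.connect():
        return handler(pool)


pool = ToyPool()
middleware_then_handler(pool)
print(pool.peak)  # 2
```

A single request reaches a peak of two simultaneous checkouts, which is the 2x usage the Impact section builds on.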
Impact
With DB_POOL_SIZE=20 and DB_MAX_OVERFLOW=10:
- Max connections per gateway: 30
- Effective capacity with 2x usage: ~15 concurrent requests
- Under load: Pool exhausts → 60s timeout → crash
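The capacity figures above are plain arithmetic. A small sketch makes the calculation explicit; the function and parameter names are illustrative and do not come from mcpgateway.

```python
def effective_concurrency(pool_size: int, max_overflow: int, conns_per_request: int) -> int:
    """Number of requests that can be in flight before the pool is exhausted."""
    max_connections = pool_size + max_overflow
    return max_connections // conns_per_request


# Values from this report: DB_POOL_SIZE=20, DB_MAX_OVERFLOW=10
print(effective_concurrency(20, 10, conns_per_request=1))  # 30
print(effective_concurrency(20, 10, conns_per_request=2))  # 15
```

Requests that hit the fallback paths may hold more than two connections, in which case the real ceiling would be lower than 15.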
Error Message
sqlalchemy.exc.TimeoutError: QueuePool limit of size 20 overflow 10 reached,
connection timed out, timeout 60.00
mcpgateway.middleware.token_scoping - ERROR - Error checking resource team
ownership for /tools/{id}: QueuePool limit of size 20 overflow 10 reached
Proposed Fix
Modify TokenScopingMiddleware to use the fresh_db_session() context manager so that the connection is released as soon as the checks complete, before the endpoint handler runs:
# Before (holds connection for entire request):
db = next(get_db())
try:
    self._check_team_membership(payload, db=db)
    self._check_resource_team_ownership(path, teams, db=db)
finally:
    db.close()

# After (releases connection immediately):
with fresh_db_session() as db:
    if not self._check_team_membership(payload, db=db):
        raise HTTPException(...)
    if not self._check_resource_team_ownership(path, teams, db=db):
        raise HTTPException(...)
# Connection released BEFORE endpoint handler runs
Related
- PR "fix: use fresh_db_session() in list endpoints to prevent connection holding" #2326: Fixed REST list endpoints with the fresh_db_session() pattern
- Issue "[BUG][PERFORMANCE][DB]: Endpoint handlers hold DB sessions during slow MCP backend calls" #2323: Original session lifetime investigation
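The implementation of fresh_db_session() is not shown in this issue. As a rough sketch of the pattern it refers to, assuming it behaves like a context manager that always closes its session on exit, the ordering can be illustrated with a fake session that records lifecycle events. FakeSession, short_lived_session, and dispatch are hypothetical names used only for this illustration.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class FakeSession:
    """Records lifecycle events instead of talking to a database."""

    def __init__(self, log: List[str]) -> None:
        self.log = log
        log.append("session opened")

    def rollback(self) -> None:
        self.log.append("session rolled back")

    def close(self) -> None:
        self.log.append("session closed")


@contextmanager
def short_lived_session(factory: Callable[[], FakeSession]) -> Iterator[FakeSession]:
    """Hypothetical analogue of fresh_db_session(): always closes on exit."""
    session = factory()
    try:
        yield session
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


def dispatch(log: List[str]) -> None:
    # Middleware phase: run the checks, then release the session.
    with short_lived_session(lambda: FakeSession(log)) as db:
        db.log.append("checks ran")
    # Handler phase: starts only after the middleware session is closed.
    log.append("handler started")


events: List[str] = []
dispatch(events)
print(events)
# ['session opened', 'checks ran', 'session closed', 'handler started']
```

Because the close happens in a finally clause, the session is also released when a check raises, for example when access is denied with an HTTPException.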
Environment
- docker-compose with 3 gateway instances
- PostgreSQL + pgBouncer
- Locust load test: 4000 users
- Crash occurs after ~10-15 minutes of sustained load