Skip to content

[BUG]: RBAC middleware holds database sessions for entire request duration #2340

@crivetimihai

Description

@crivetimihai

Summary

The RBAC middleware (mcpgateway/middleware/rbac.py) uses Depends(get_db) which holds database sessions open for the entire request duration. Under high load, this causes session accumulation and system degradation even when endpoint handlers use fresh_db_session().

Impact

  • Severity: Critical
  • Affected: All authenticated endpoints (346 usages across the codebase)
  • Symptom: System degrades under load as idle-in-transaction connections accumulate

Evidence from Load Testing

With 4000 concurrent users, 10 workers:

Time RPS idle_in_tx (max age)
22:51:43 2180 54 (3s)
22:52:17 462 54 (37s)
22:52:51 251 51 (72s)
22:53:01 302 51 (81s)

Stuck queries (100+ seconds old):

SELECT email_teams.*     -- RBAC permission check
SELECT servers_1.*, a2a_agents.*
SELECT prompts.*
SELECT resources.*

Root Cause

In mcpgateway/middleware/rbac.py:

1. get_current_user_with_permissions (line 90)

async def get_current_user_with_permissions(
    request: Request,
    credentials: Optional[HTTPAuthorizationCredentials] = Depends(security),
    jwt_token: Optional[str] = Cookie(default=None),
    db: Session = Depends(get_db)  # <-- HOLDS SESSION UNTIL RESPONSE SENT
):
    ...
    return {
        "db": db,  # <-- PASSED TO DOWNSTREAM
        ...
    }

2. get_permission_service (line 72)

async def get_permission_service(db: Session = Depends(get_db)) -> PermissionService:
    return PermissionService(db)

3. Permission decorators use user_context["db"]

  • Line 266: permission_service = PermissionService(user_context["db"])
  • Line 411: permission_service = PermissionService(user_context["db"])
  • Line 493: permission_service = PermissionService(user_context["db"])
  • Line 548: self.permission_service = PermissionService(user_context["db"])

Request Flow

1. Request arrives
2. RBAC middleware: get_current_user_with_permissions() opens DB session
3. Permission decorators use this session
4. Endpoint handler runs (may use fresh_db_session() - but RBAC session still open)
5. Response sent
6. RBAC session finally closed

Under high load with slow responses (1000-2000ms), sessions accumulate and age.

Scope

Files using get_current_user_with_permissions:
  164 - admin.py
   74 - main.py
   14 - routers/rbac.py
   14 - routers/llm_config_router.py
   14 - routers/llm_admin_router.py
   12 - routers/tokens.py
    9 - routers/observability.py
    8 - routers/sso.py
  ... and more (346 total usages)

Proposed Fix

Option A: Use fresh_db_session() in RBAC (Recommended)

from mcpgateway.db import fresh_db_session

async def get_current_user_with_permissions(
    request: Request,
    credentials: Optional[HTTPAuthorizationCredentials] = Depends(security),
    jwt_token: Optional[str] = Cookie(default=None),
    # REMOVE: db: Session = Depends(get_db)
):
    # ... authentication logic ...
    
    # Don't return db in the dict - callers should use fresh_db_session()
    return {
        "email": user.email,
        "full_name": user.full_name,
        "is_admin": user.is_admin,
        # REMOVE: "db": db,
        ...
    }

Update permission decorators:

# Before:
permission_service = PermissionService(user_context["db"])

# After:
with fresh_db_session() as db:
    permission_service = PermissionService(db)
    granted = await permission_service.check_permission(...)

Option B: Lazy Session Factory

def get_db_factory():
    """Return a context manager that creates fresh sessions on demand."""
    return fresh_db_session

# In get_current_user_with_permissions:
return {
    "get_db": get_db_factory,  # Callers use: with user["get_db"]() as db:
    ...
}

Files to Modify

  1. mcpgateway/middleware/rbac.py:

    • get_permission_service() (line 72)
    • get_current_user_with_permissions() (line 90)
    • require_permission() decorator (line 266)
    • require_admin_permission() decorator (line 411)
    • require_any_permission() decorator (line 493)
    • PermissionChecker class (line 548)
    • Update user_context checks at lines 258, 403, 485 (remove "db" check)
  2. mcpgateway/admin.py:

    • Line 11383: user["db"] if isinstance(user, dict) else db (only place that uses user["db"])

Acceptance Criteria

  • Remove Depends(get_db) from get_current_user_with_permissions
  • Remove "db": db from return dicts
  • Update all permission decorators to use fresh_db_session()
  • Update PermissionChecker class to use fresh_db_session()
  • Update admin.py to not rely on user["db"]
  • Update user_context detection to not require "db" key
  • Add/update tests for RBAC with fresh sessions
  • Load test to verify sessions release immediately

Related Issues

Testing

After fix, under 4000 concurrent users:

  • idle_in_tx should stay < 50 with max age < 5s
  • RPS should remain stable over time (not degrade)
  • No session accumulation pattern

Metadata

Metadata

Assignees

Labels

MUSTP1: Non-negotiable, critical requirements without which the product is non-functional or unsafebugSomething isn't workingdatabaseperformancePerformance related itemspythonPython / backend development (FastAPI)rbacRole-based Access Control

Type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions