-
Notifications
You must be signed in to change notification settings - Fork 614
[BUG]: RBAC middleware holds database sessions for entire request duration #2340
Copy link
Copy link
Closed
Copy link
Labels
MUSTP1: Non-negotiable, critical requirements without which the product is non-functional or unsafeP1: Non-negotiable, critical requirements without which the product is non-functional or unsafebugSomething isn't workingSomething isn't workingdatabaseperformancePerformance related itemsPerformance related itemspythonPython / backend development (FastAPI)Python / backend development (FastAPI)rbacRole-based Access ControlRole-based Access Control
Milestone
Description
Summary
The RBAC middleware (mcpgateway/middleware/rbac.py) uses Depends(get_db) which holds database sessions open for the entire request duration. Under high load, this causes session accumulation and system degradation even when endpoint handlers use fresh_db_session().
Impact
- Severity: Critical
- Affected: All authenticated endpoints (346 usages across the codebase)
- Symptom: System degrades under load as idle-in-transaction connections accumulate
Evidence from Load Testing
With 4000 concurrent users, 10 workers:
| Time | RPS | idle_in_tx (max age) |
|---|---|---|
| 22:51:43 | 2180 | 54 (3s) |
| 22:52:17 | 462 | 54 (37s) |
| 22:52:51 | 251 | 51 (72s) |
| 22:53:01 | 302 | 51 (81s) |
Stuck queries (100+ seconds old):
SELECT email_teams.* -- RBAC permission check
SELECT servers_1.*, a2a_agents.*
SELECT prompts.*
SELECT resources.*Root Cause
In mcpgateway/middleware/rbac.py:
1. get_current_user_with_permissions (line 90)
async def get_current_user_with_permissions(
request: Request,
credentials: Optional[HTTPAuthorizationCredentials] = Depends(security),
jwt_token: Optional[str] = Cookie(default=None),
db: Session = Depends(get_db) # <-- HOLDS SESSION UNTIL RESPONSE SENT
):
...
return {
"db": db, # <-- PASSED TO DOWNSTREAM
...
}2. get_permission_service (line 72)
async def get_permission_service(db: Session = Depends(get_db)) -> PermissionService:
return PermissionService(db)3. Permission decorators use user_context["db"]
- Line 266:
permission_service = PermissionService(user_context["db"]) - Line 411:
permission_service = PermissionService(user_context["db"]) - Line 493:
permission_service = PermissionService(user_context["db"]) - Line 548:
self.permission_service = PermissionService(user_context["db"])
Request Flow
1. Request arrives
2. RBAC middleware: get_current_user_with_permissions() opens DB session
3. Permission decorators use this session
4. Endpoint handler runs (may use fresh_db_session() - but RBAC session still open)
5. Response sent
6. RBAC session finally closed
Under high load with slow responses (1000-2000ms), sessions accumulate and age.
Scope
Files using get_current_user_with_permissions:
164 - admin.py
74 - main.py
14 - routers/rbac.py
14 - routers/llm_config_router.py
14 - routers/llm_admin_router.py
12 - routers/tokens.py
9 - routers/observability.py
8 - routers/sso.py
... and more (346 total usages)
Proposed Fix
Option A: Use fresh_db_session() in RBAC (Recommended)
from mcpgateway.db import fresh_db_session
async def get_current_user_with_permissions(
request: Request,
credentials: Optional[HTTPAuthorizationCredentials] = Depends(security),
jwt_token: Optional[str] = Cookie(default=None),
# REMOVE: db: Session = Depends(get_db)
):
# ... authentication logic ...
# Don't return db in the dict - callers should use fresh_db_session()
return {
"email": user.email,
"full_name": user.full_name,
"is_admin": user.is_admin,
# REMOVE: "db": db,
...
}Update permission decorators:
# Before:
permission_service = PermissionService(user_context["db"])
# After:
with fresh_db_session() as db:
permission_service = PermissionService(db)
granted = await permission_service.check_permission(...)Option B: Lazy Session Factory
def get_db_factory():
"""Return a context manager that creates fresh sessions on demand."""
return fresh_db_session
# In get_current_user_with_permissions:
return {
"get_db": get_db_factory, # Callers use: with user["get_db"]() as db:
...
}Files to Modify
-
mcpgateway/middleware/rbac.py:get_permission_service()(line 72)get_current_user_with_permissions()(line 90)require_permission()decorator (line 266)require_admin_permission()decorator (line 411)require_any_permission()decorator (line 493)PermissionCheckerclass (line 548)- Update user_context checks at lines 258, 403, 485 (remove "db" check)
-
mcpgateway/admin.py:- Line 11383:
user["db"] if isinstance(user, dict) else db(only place that usesuser["db"])
- Line 11383:
Acceptance Criteria
- Remove
Depends(get_db)fromget_current_user_with_permissions - Remove
"db": dbfrom return dicts - Update all permission decorators to use
fresh_db_session() - Update
PermissionCheckerclass to usefresh_db_session() - Update admin.py to not rely on
user["db"] - Update user_context detection to not require "db" key
- Add/update tests for RBAC with fresh sessions
- Load test to verify sessions release immediately
Related Issues
- [BUG]: Apply fresh_db_session() to remaining 271 endpoints using Depends(get_db) #2334 - Parent issue for endpoint handler session lifetime
- [BUG]: Apply fresh_db_session() to admin.py endpoints (135 usages) #2335 - admin.py session lifetime (135 endpoints)
- [BUG]: Apply fresh_db_session() to remaining 52 REST endpoints in main.py #2336 - main.py REST session lifetime (52 endpoints)
Testing
After fix, under 4000 concurrent users:
- idle_in_tx should stay < 50 with max age < 5s
- RPS should remain stable over time (not degrade)
- No session accumulation pattern
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
MUSTP1: Non-negotiable, critical requirements without which the product is non-functional or unsafeP1: Non-negotiable, critical requirements without which the product is non-functional or unsafebugSomething isn't workingSomething isn't workingdatabaseperformancePerformance related itemsPerformance related itemspythonPython / backend development (FastAPI)Python / backend development (FastAPI)rbacRole-based Access ControlRole-based Access Control