Skip to content

Fix/issue 2340 - eliminate RBAC middleware session accumulation under high load#2549

Merged
crivetimihai merged 1 commit intomainfrom
fix/issue-2340
Feb 2, 2026
Merged

Fix/issue 2340 - eliminate RBAC middleware session accumulation under high load#2549
crivetimihai merged 1 commit intomainfrom
fix/issue-2340

Conversation

@Lang-Akshay
Copy link
Copy Markdown
Collaborator

@Lang-Akshay Lang-Akshay commented Jan 28, 2026

Problem

The RBAC middleware held database sessions open for the entire request duration via Depends(get_db), causing critical performance degradation under high load:

Load test results (4000 concurrent users, 10 workers):

Time RPS idle_in_tx (max age) Notes
22:51:43 2180 54 (3s) System starts healthy
22:52:17 462 54 (37s) RPS drops 79%, sessions aging
22:52:51 251 51 (72s) RPS drops 88%, sessions stuck
22:53:01 302 51 (81s) System severely degraded

Root cause:
get_current_user_with_permissions() opened a session at request start and kept it open until response completion, even though database access was only needed briefly for user lookup. Under high load, sessions accumulated faster than they could be released.

Impact:
All 346 authenticated endpoints were affected.


Solution

Remove long-lived database sessions from RBAC middleware and use fresh_db_session() context manager for short-lived, on-demand database access:

Before:

async def get_current_user_with_permissions(
    request: Request,
    credentials: Optional[HTTPAuthorizationCredentials] = Depends(security),
    jwt_token: Optional[str] = Cookie(default=None),
    db: Session = Depends(get_db)  # ← Held for entire request
):
    user = db.execute(select(EmailUser).where(...))  # Brief DB use
    db.commit()
    db.close()  # Manual cleanup (ineffective with Depends)
    return {"email": user.email, "db": db, ...}  # Passes session downstream

After:

async def get_current_user_with_permissions(
    request: Request,
    credentials: Optional[HTTPAuthorizationCredentials] = Depends(security),
    jwt_token: Optional[str] = Cookie(default=None)
    # ← No db parameter
):
    with fresh_db_session() as db:  # ← Short-lived, auto-cleanup
        user = db.execute(select(EmailUser).where(...))
    return {"email": user.email, ...}  # ← No db in context

Changes Made

  1. mcpgateway/middleware/rbac.py

    • ✅ Removed db: Session = Depends(get_db) parameter from get_current_user_with_permissions()
    • ✅ Replaced with fresh_db_session() context manager for EmailUser lookup
    • ✅ Removed "db": None from all user context return dictionaries
    • ✅ Removed manual db.commit() and db.close() calls (context manager handles this)
    • ✅ Updated all permission decorators (require_permission, require_admin_permission, require_any_permission) to use fresh_db_session() when no endpoint session exists
    • ✅ Updated PermissionChecker class to use fresh_db_session() pattern
    • ✅ Added deprecation warnings to get_db() and get_permission_service() functions
  2. Test Files Updated

    • Updated all test mocks to match new function signature (removed db parameter):
      • tests/unit/mcpgateway/middleware/test_rbac.py
      • tests/unit/mcpgateway/middleware/test_auth_method_propagation.py
      • tests/unit/mcpgateway/utils/test_proxy_auth.py (8 test methods)
      • tests/unit/mcpgateway/test_main.py
      • tests/unit/mcpgateway/test_main_extended.py
      • tests/unit/mcpgateway/test_admin_catalog_htmx.py

Benefits

  1. No session accumulation: Sessions created only when needed, closed immediately after use
  2. Better performance under load: Prevents idle-in-transaction bottleneck
  3. Reduced connection pool pressure: Sessions released 100-1000x faster (milliseconds vs seconds)
  4. Backward compatible: Existing endpoints continue to work; decorators fall back to fresh_db_session() automatically
  5. Cleaner code: Context manager pattern is more explicit and less error-prone than manual cleanup

Expected Impact

After this fix, under 4000 concurrent users:

  • ✅ idle_in_tx count should stay < 50 with max age < 5s
  • ✅ RPS should remain stable over time (no degradation)
  • ✅ No session accumulation pattern
  • ✅ Query execution times stay consistent

Related Issues


Testing

Run the test suite to verify all RBAC tests pass with new session management:

make test

@Lang-Akshay Lang-Akshay marked this pull request as draft January 28, 2026 18:26
@Lang-Akshay Lang-Akshay marked this pull request as ready for review January 28, 2026 18:28
@crivetimihai crivetimihai changed the title Fix/issue 2340 Fix/issue 2340 - Replace Depends(get_db) with fresh_db_session() in RBAC middleware to prevent database session accumulation Jan 28, 2026
@Lang-Akshay Lang-Akshay marked this pull request as draft January 28, 2026 20:38
@Lang-Akshay Lang-Akshay force-pushed the fix/issue-2340 branch 2 times, most recently from 17dc9c3 to 0ae8b60 Compare January 29, 2026 13:33
@Lang-Akshay Lang-Akshay changed the title Fix/issue 2340 - Replace Depends(get_db) with fresh_db_session() in RBAC middleware to prevent database session accumulation Fix/issue 2340 - eliminate RBAC middleware session accumulation under high load Jan 30, 2026
@Lang-Akshay Lang-Akshay reopened this Jan 30, 2026
@Lang-Akshay Lang-Akshay marked this pull request as ready for review January 30, 2026 17:33
@crivetimihai
Copy link
Copy Markdown
Member

Excellent work on this PR, @Lang-Akshay! The analysis is thorough - the load test data clearly demonstrates the session accumulation problem, and the fix using fresh_db_session() context manager is the right approach.

The code looks good:

  • Clean replacement of Depends(get_db) with short-lived context manager sessions
  • Proper deprecation warnings on the old patterns
  • Comprehensive test updates across 6 test files
  • Good simplification in db.py by reusing get_db as a context manager

One minor observation: The new fresh_db_session = contextmanager(get_db) line replaces a more explicit implementation. This is cleaner, but worth noting that it now inherits get_db's commit-on-success behavior (which is correct).

To merge: The DCO check is failing. Please sign off your commits:

git rebase HEAD~N --signoff  # where N is number of commits
# or for a single commit:
git commit --amend --signoff
git push --force-with-lease

Once that's done, this should be ready to merge. Great contribution!

@crivetimihai crivetimihai added this to the Release 1.0.0-RC1 milestone Feb 1, 2026
@crivetimihai crivetimihai self-assigned this Feb 1, 2026
@crivetimihai crivetimihai added the performance Performance related items label Feb 1, 2026
Replace long-lived database sessions in RBAC middleware with
fresh_db_session() context manager to prevent session accumulation
under high concurrent load.

Changes:
- Remove db parameter from get_current_user_with_permissions()
- Use fresh_db_session() context manager for short-lived DB access
- Keep "db": None in user context for backward compatibility
- Add deprecation warnings to get_db() and get_permission_service()
- Update all permission decorators to use fresh_db_session() fallback
- Update PermissionChecker to use fresh_db_session() pattern
- Simplify db.py by reusing get_db() generator for fresh_db_session

Security fixes:
- Use named kwargs (user, _user, current_user, current_user_ctx) for
  user context extraction instead of scanning all dicts for "email"
  to prevent request body injection attacks

Performance fixes:
- PermissionChecker.has_any_permission now uses single session for
  all permission checks instead of opening N sessions

This prevents idle-in-transaction bottlenecks where sessions were
held for entire request duration instead of milliseconds.

Closes #2340

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai
Copy link
Copy Markdown
Member

Review Changes

Rebased onto main and applied the following fixes during code review:

Security Fix

  • User context extraction hardened: Changed from scanning all kwargs for any dict with "email" key to explicitly checking named parameters (user, _user, current_user, current_user_ctx). This prevents potential request body injection attacks where an attacker could craft a payload like {"email": "admin@..."} to bypass permission checks.

Backward Compatibility Fix

  • Restored "db": None in user context: The original PR removed the "db" key entirely, which would cause KeyError in endpoints accessing current_user_ctx["db"]. Restored "db": None to maintain the same behavior as main (returns None instead of raising KeyError).

Performance Fix

  • Single session for has_any_permission: PermissionChecker.has_any_permission() was opening N fresh DB sessions for N permission checks. Now uses a single session for all checks.

Test Updates

  • Updated tests/unit/mcpgateway/test_main_error_handlers.py mock signature
  • Updated tests/utils/rbac_mocks.py to return "db": None instead of MagicMock()

Pre-existing Issue Identified

During review, identified that 22 endpoints across teams.py, email_auth.py, and llm_config_router.py access current_user_ctx["db"] which has been None since commit 93102499e (2026-01-27). This is a pre-existing bug on main, not introduced by this PR.

Tracked separately in: #2641

@crivetimihai crivetimihai merged commit de9c75c into main Feb 2, 2026
51 checks passed
@crivetimihai crivetimihai deleted the fix/issue-2340 branch February 2, 2026 00:14
hughhennelly pushed a commit to hughhennelly/mcp-context-forge that referenced this pull request Feb 8, 2026
…BM#2549)

Replace long-lived database sessions in RBAC middleware with
fresh_db_session() context manager to prevent session accumulation
under high concurrent load.

Changes:
- Remove db parameter from get_current_user_with_permissions()
- Use fresh_db_session() context manager for short-lived DB access
- Keep "db": None in user context for backward compatibility
- Add deprecation warnings to get_db() and get_permission_service()
- Update all permission decorators to use fresh_db_session() fallback
- Update PermissionChecker to use fresh_db_session() pattern
- Simplify db.py by reusing get_db() generator for fresh_db_session

Security fixes:
- Use named kwargs (user, _user, current_user, current_user_ctx) for
  user context extraction instead of scanning all dicts for "email"
  to prevent request body injection attacks

Performance fixes:
- PermissionChecker.has_any_permission now uses single session for
  all permission checks instead of opening N sessions

This prevents idle-in-transaction bottlenecks where sessions were
held for entire request duration instead of milliseconds.

Closes IBM#2340

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: hughhennnelly <hughhennelly06@gmail.com>
kcostell06 pushed a commit to kcostell06/mcp-context-forge that referenced this pull request Feb 24, 2026
…BM#2549)

Replace long-lived database sessions in RBAC middleware with
fresh_db_session() context manager to prevent session accumulation
under high concurrent load.

Changes:
- Remove db parameter from get_current_user_with_permissions()
- Use fresh_db_session() context manager for short-lived DB access
- Keep "db": None in user context for backward compatibility
- Add deprecation warnings to get_db() and get_permission_service()
- Update all permission decorators to use fresh_db_session() fallback
- Update PermissionChecker to use fresh_db_session() pattern
- Simplify db.py by reusing get_db() generator for fresh_db_session

Security fixes:
- Use named kwargs (user, _user, current_user, current_user_ctx) for
  user context extraction instead of scanning all dicts for "email"
  to prevent request body injection attacks

Performance fixes:
- PermissionChecker.has_any_permission now uses single session for
  all permission checks instead of opening N sessions

This prevents idle-in-transaction bottlenecks where sessions were
held for entire request duration instead of milliseconds.

Closes IBM#2340

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Performance related items

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: RBAC middleware holds database sessions for entire request duration

2 participants