-
Notifications
You must be signed in to change notification settings - Fork 614
[BUG]: Apply fresh_db_session() to admin.py endpoints (135 usages) #2335
Description
Summary
The admin UI (mcpgateway/admin.py) holds database sessions for entire request lifecycles via Depends(get_db), causing:
- 245+ idle-in-transaction connections during load testing
- OOM kills when memory pressure from held sessions exceeds container limits
idle transaction timeouterrors after 300 seconds- Cascading failures that crash the gateway within ~2 minutes under load
Root Cause
The Problem with Depends(get_db)
FastAPI's dependency injection holds sessions until after the response is fully sent:
Request arrives
→ Depends(get_db) creates session
→ Endpoint runs queries
→ Template rendering begins (100ms - 5s)
→ Response streaming to client
→ ONLY NOW: dependency cleanup runs, session closes
For admin UI endpoints that render HTML templates, sessions are held for:
- Query execution time
- Template rendering time (~100ms for simple, 1-5s for complex)
- Response transmission time (depends on client speed)
The Scale of the Problem
| Metric | Value |
|---|---|
Total Depends(get_db) usages |
135 |
| Admin.py file size | 18,299 lines |
| GET endpoints | 116 |
| POST endpoints | 62 |
| HTMX partial endpoints | ~20 |
Endpoints with db.commit() workaround |
47 |
High-Risk Endpoints (HTMX Partials)
These are called repeatedly for pagination, infinite scroll, and UI updates:
| Line | Function | db.commit() workaround? |
|---|---|---|
| 836 | get_overview_partial |
❌ No |
| 1531 | admin_servers_partial_html |
✅ Yes |
| 2908 | admin_ui (main page) |
❌ No |
| 4302 | admin_teams_partial_html |
❌ No |
| 6224 | admin_users_partial_html |
✅ Yes |
| 7150 | admin_tools_partial_html |
✅ Yes |
| 7643 | admin_prompts_partial_html |
✅ Yes |
| 7836 | admin_gateways_partial_html |
✅ Yes |
| 8308 | admin_resources_partial_html |
✅ Yes |
| 8887 | admin_a2a_partial_html |
✅ Yes |
| 12297 | admin_metrics_partial_html |
❌ No |
Why db.commit() Workaround Isn't Enough
Some endpoints call db.commit() before template rendering:
# End the read-only transaction before template rendering
db.commit()
return templates.TemplateResponse(...)This releases the PostgreSQL transaction but the SQLAlchemy Session object remains in memory until Depends(get_db)'s cleanup runs after the response is sent. Under load:
- Session objects consume memory
- Connection pool slots remain checked out
- PgBouncer can't reassign the backend connection
Evidence from Load Testing
Connection Buildup
state | count | max_age_s
--------------------+-------+-----------
idle in transaction | 245 | 40
idle | 56 | 1
active | 6 | 3
Stuck Queries (All from admin endpoints)
SELECT email_teams.* FROM email_teams WHERE is_active ORDER BY name -- 87 connections
SELECT servers_1.*, tools.* ... -- 19 connections
SELECT servers_1.*, a2a_agents.* ... -- 18 connectionsOOM Kills (from dmesg)
Memory cgroup out of memory: Killed process 1171829 (mcpgateway work)
total-vm:6606356kB, anon-rss:729012kB
Proposed Fix
Pattern: Replace Depends(get_db) with fresh_db_session()
Before:
@admin_router.get("/overview/partial")
async def get_overview_partial(
request: Request,
db: Session = Depends(get_db), # Held until response sent
user=Depends(get_current_user_with_permissions),
) -> HTMLResponse:
servers_total = db.query(func.count(DbServer.id)).scalar()
tools_total = db.query(func.count(DbTool.id)).scalar()
# ... more queries ...
return templates.TemplateResponse("overview.html", context)After:
@admin_router.get("/overview/partial")
async def get_overview_partial(
request: Request,
user=Depends(get_current_user_with_permissions),
) -> HTMLResponse:
with fresh_db_session() as db: # Released immediately after block
servers_total = db.query(func.count(DbServer.id)).scalar()
tools_total = db.query(func.count(DbTool.id)).scalar()
# ... more queries ...
context = {"servers_total": servers_total, ...}
# Session is now CLOSED
return templates.TemplateResponse("overview.html", context)Migration Priority
-
Phase 1 (Critical): HTMX partial endpoints - called most frequently
get_overview_partialadmin_*_partial_html(12 endpoints)
-
Phase 2 (High): Main pages and CRUD operations
admin_ui(main dashboard)admin_add_*,admin_delete_*,admin_update_*
-
Phase 3 (Medium): State changes and utility endpoints
admin_set_*_state- Import/export endpoints
-
Phase 4 (Low): Remaining endpoints
- Login, password change
- Plugin management
Considerations
-
Services that take
dbparameter: Most services likeTeamManagementService(db)store the session. Pass fresh session each time:with fresh_db_session() as db: team_service = TeamManagementService(db) teams = await team_service.list_teams(...)
-
Multiple service calls: Group all DB work in one
withblock when possible:with fresh_db_session() as db: servers = await server_service.list_servers(db, ...) tools = await tool_service.list_tools(db, ...) context = {"servers": servers, "tools": tools} return templates.TemplateResponse("admin.html", context)
-
Write operations: Ensure proper commit/rollback handling (already done by
fresh_db_session)
Related Issues
- [BUG][PERFORMANCE][DB]: Endpoint handlers hold DB sessions during slow MCP backend calls #2323 - Session lifetime issues causing idle transaction timeouts
- [BUG]: Apply fresh_db_session() to remaining 271 endpoints using Depends(get_db) #2334 - Apply fresh_db_session() to remaining 271 endpoints (includes admin.py)
Acceptance Criteria
- All 135
Depends(get_db)usages in admin.py replaced withfresh_db_session() - No
idle in transactionconnections during admin UI usage - No OOM kills during load testing with admin UI traffic
- All existing admin UI tests pass
- Load test shows stable memory under admin traffic (target: <3GB per gateway)