[BUG]: Apply fresh_db_session() to remaining 271 endpoints using Depends(get_db) #2334
Open
Labels
- SHOULD - P2: Important but not vital; high-value items that are not crucial for the immediate release
- bug - Something isn't working
- database
- python - Python / backend development (FastAPI)
Summary
Load testing with 4000 Locust users shows that 271 endpoints still use `Depends(get_db)`, which holds a database session for the entire request lifecycle. This causes:
- OOM kills - Gateway workers exceed 8GB container limit under load
- Idle-in-transaction buildup - 120+ connections stuck for 60-290 seconds
- Memory pressure - Held sessions + pending responses consume RAM
Evidence
From profiling RCA (todo/claude-rca-2026-01-22-profiling.md):
Memory cgroup out of memory: Killed process 1172024 (mcpgateway work)
total-vm:7470296kB, anon-rss:1102856kB
Stuck queries in pg_stat_activity:
- `SELECT tools.*` - stuck 64-221 seconds
- `SELECT email_teams.*` - stuck 69-215 seconds
Endpoints to Fix (Priority Order)
High Priority (main.py - 52 endpoints)
```bash
# Count the lines in main.py that still use Depends(get_db)
grep -n "Depends(get_db)" mcpgateway/main.py | wc -l
# Output: 52
```
Key endpoints:
- `/tools`, `/tools/{id}` - CRUD operations
- `/servers`, `/servers/{id}` - CRUD operations
- `/resources`, `/resources/{id}` - CRUD operations
- `/prompts`, `/prompts/{id}` - CRUD operations
- `/health`, `/ready` - Health checks
Medium Priority (admin.py)
- `/admin/` - Heavy HTML template rendering (5-7s response times)
- `/admin/tools/partial` - HTMX partials
- `/admin/resources/partial` - HTMX partials
- `/admin/prompts/partial` - HTMX partials
Lower Priority (routers/)
- `mcpgateway/routers/tokens.py` - 10 endpoints
- `mcpgateway/routers/sso.py` - 9 endpoints
- `mcpgateway/routers/oauth_router.py` - 7 endpoints
- Other routers
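To track progress across all three priority groups, the per-file counts can be collected in one pass. The following is a minimal sketch, not part of the codebase: `count_usages` is a hypothetical helper, and it counts matching lines (like `grep | wc -l`), so a line with two occurrences is counted once.

```python
from collections import Counter
from pathlib import Path

PATTERN = "Depends(get_db)"


def count_usages(root: str, pattern: str = PATTERN) -> Counter:
    """Map each .py file under root to the number of lines containing pattern.

    Hypothetical helper for tracking the migration; files with no
    matching lines are left out of the result.
    """
    counts: Counter = Counter()
    root_path = Path(root)
    for path in sorted(root_path.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        # Count lines, not raw occurrences, to mirror `grep ... | wc -l`.
        hits = sum(1 for line in text.splitlines() if pattern in line)
        if hits:
            counts[path.relative_to(root_path).as_posix()] = hits
    return counts


# Example usage (run from the repository root):
#   for filename, hits in count_usages("mcpgateway").most_common():
#       print(f"{hits:4d}  {filename}")
```

Sorting by `most_common()` puts the files with the most remaining usages first, which should roughly match the priority order above.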
Pattern to Apply
Replace:
```python
@router.get("/tools")
async def list_tools(db: Session = Depends(get_db)):
    # The session from get_db is held for the entire request lifecycle
    tools = tool_service.list_tools(db)
    return tools
```
With:
```python
@router.get("/tools")
async def list_tools():
    # The session exists only for the duration of this block
    with fresh_db_session() as db:
        tools = tool_service.list_tools(db)
        # Serialize inside the context, while the session is still open
        return [t.model_dump() for t in tools]
```
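For readers unfamiliar with the pattern, a short-lived session context manager can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual `fresh_db_session()`: the name `short_lived_session` and the `session_factory` parameter are hypothetical, and the real helper may handle commits and errors differently.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def short_lived_session(session_factory: Callable[[], Any]) -> Iterator[Any]:
    """Yield a session that is always closed when the block exits.

    Hypothetical sketch: session_factory is any callable returning an
    object with commit(), rollback() and close(), for example a
    SQLAlchemy sessionmaker instance.
    """
    db = session_factory()
    try:
        yield db
        db.commit()  # Commit only if the block finished without errors
    except Exception:
        db.rollback()  # Undo partial work, then let the error propagate
        raise
    finally:
        db.close()  # Return the connection to the pool in every case
```

The point of the design is that the connection goes back to the pool as soon as the `with` block ends, rather than when the response has been sent, which is why results are serialized before leaving the block.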
Acceptance Criteria
- All 271 `Depends(get_db)` usages replaced with `fresh_db_session()`
- Unit tests pass
- Load test (4000 users) runs for 10+ minutes without OOM kills
- No `idle in transaction` connections older than 60 seconds
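The last criterion can be checked during a load test by polling `pg_stat_activity`. The sketch below is one possible way to do it, assuming the rows are fetched as dictionaries by whatever database driver is in use; `IDLE_IN_TX_SQL` and `stale_connections` are hypothetical names, and the 60-second default mirrors the threshold above.

```python
from typing import Any, Iterable, List, Mapping

# Query against the standard PostgreSQL pg_stat_activity view.
# state_change is the time the connection last changed state, so
# now() - state_change is how long it has been idle in transaction.
IDLE_IN_TX_SQL = """
SELECT pid,
       EXTRACT(EPOCH FROM (now() - state_change)) AS idle_seconds,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY idle_seconds DESC
"""


def stale_connections(
    rows: Iterable[Mapping[str, Any]], max_idle_seconds: float = 60.0
) -> List[Mapping[str, Any]]:
    """Return the rows whose idle_seconds exceeds max_idle_seconds.

    Hypothetical helper: rows are expected to come from running
    IDLE_IN_TX_SQL with a cursor that returns dict-like rows.
    """
    return [row for row in rows if float(row["idle_seconds"]) > max_idle_seconds]
```

An empty result from `stale_connections` throughout the run would indicate that the criterion holds.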
Related Issues
- [BUG][PERFORMANCE][DB]: Endpoint handlers hold DB sessions during slow MCP backend calls #2323 - Database session lifetime performance issues
- [BUG][PERFORMANCE]: TokenScopingMiddleware causes connection pool exhaustion under load #2330 - TokenScopingMiddleware connection pool exhaustion
References
- `todo/claude-rca-2026-01-22-profiling.md` - Full profiling report
- `todo/rca-part-2.md` - Initial RCA
- `docs/docs/development/profiling.md` - Profiling guide