[BUG]: Apply fresh_db_session() to remaining 52 REST endpoints in main.py #2336
Open
Labels
- SHOULD
- P2: Important but not vital; high-value items that are not crucial for the immediate release
- bug: Something isn't working
- database
- python: Python / backend development (FastAPI)
Description
Summary
The load test crashes even without admin traffic because 52 REST endpoints in `main.py` still use `Depends(get_db)`, which holds a database session for the entire request lifecycle.
Root Cause Analysis
Session Pattern Status
| Pattern | Count | Status |
|---|---|---|
| `with fresh_db_session() as db:` | 22 | ✅ Fixed |
| `db: Session = Depends(get_db)` | 52 | ❌ Broken |
Evidence from Load Test
Without admin traffic (4000 users, ~300 RPS):
Connection States:
idle in transaction | 227 | max_age: 219s
Stuck Queries (approaching 300s timeout):
SELECT servers_1.*, a2a_agents.* | 114 connections
SELECT email_teams.* | 38 connections
SELECT resources.* | 36 connections
SELECT prompts.* | 24 connections
These queries come from:
- `/servers/{id}/tools` - uses `Depends(get_db)`
- `/tools/{id}` - uses `Depends(get_db)`
- Team membership checks in middleware
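A connection-state snapshot like the one above can be gathered from PostgreSQL's `pg_stat_activity` view. The sketch below is an assumption about how such numbers might be collected, not the query actually used for this issue: the SQL relies only on standard `pg_stat_activity` columns, and `summarize` is a hypothetical helper that aggregates `(state, age_seconds)` rows however they were fetched.

```python
# Query one might run during the load test; the issue does not say which
# query produced its numbers, so this is only an illustrative assumption.
CONNECTION_STATE_SQL = """
SELECT state,
       count(*) AS connections,
       max(extract(epoch FROM now() - state_change)) AS max_age_seconds
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state
ORDER BY connections DESC;
"""


def summarize(rows):
    """Aggregate (state, age_seconds) rows into {state: (count, max_age_seconds)}."""
    summary = {}
    for state, age in rows:
        count, max_age = summary.get(state, (0, 0.0))
        summary[state] = (count + 1, max(max_age, age))
    return summary


# Made-up sample rows, only to show the shape of the result.
sample_rows = [
    ("idle in transaction", 219.0),
    ("idle in transaction", 12.5),
    ("active", 0.3),
]
print(summarize(sample_rows))
```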
Broken Endpoints (52 total)
Get Single Item (6 endpoints)
- `get_server(server_id)` → `/servers/{server_id}`
- `get_tool(tool_id)` → `/tools/{tool_id}`
- `get_resource_info(resource_id)` → `/resources/{resource_id}`
- `get_prompt_no_args(prompt_id)` → `/prompts/{prompt_id}`
- `get_gateway(gateway_id)` → `/gateways/{gateway_id}`
- `get_a2a_agent(agent_id)` → `/a2a/{agent_id}`
Server Sub-Resources (3 endpoints) - HIGH PRIORITY
These are called by Locust with high frequency:
- `server_get_tools(server_id)` → `/servers/{server_id}/tools`
- `server_get_resources(server_id)` → `/servers/{server_id}/resources`
- `server_get_prompts(server_id)` → `/servers/{server_id}/prompts`
State Change Operations (12 endpoints)
- `set_server_state`, `toggle_server_status`
- `set_tool_state`, `toggle_tool_status`
- `set_resource_state`, `toggle_resource_status`
- `set_prompt_state`, `toggle_prompt_status`
- `set_gateway_state`, `toggle_gateway_status`
- `set_a2a_agent_state`, `toggle_a2a_agent_status`
CRUD Operations (12+ endpoints)
- `create_server`, `update_server`, `delete_server`
- `create_tool`, `update_tool`, `delete_tool`
- `create_resource`, `delete_resource`
- `create_prompt`, `delete_prompt`
- `register_gateway`, `delete_gateway`
- `create_a2a_agent`, `update_a2a_agent`, `delete_a2a_agent`, `invoke_a2a_agent`
Protocol Handlers (2 endpoints)
- `handle_completion` → `/completion/complete`
- `handle_sampling` → `/sampling/createMessage`
Other Endpoints
- `read_resource` → `/resources/{resource_id}/read`
- `list_resource_templates`
- `get_metrics`, `reset_metrics`
- Export/import endpoints
Why This Causes Crashes
- High-frequency endpoints like `/servers/{id}/tools` use `Depends(get_db)`
- Sessions are held during response serialization and network transmission
- 227+ sessions accumulate as `idle in transaction`
- After 219 seconds, sessions approach PostgreSQL's 300s `idle_in_transaction_session_timeout`
- Memory builds up from held sessions → OOM kills
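As a back-of-envelope check on why hold time matters, Little's law relates the average number of sessions in flight to request rate times average hold time. The sketch below applies it to the figures reported above (~300 RPS, 227 sessions). The 0.05 s hold time is a made-up illustrative value, and the steady-state average hides the long tail visible in the 219 s maximum age, so this is only a rough model.

```python
def sessions_in_flight(requests_per_second, hold_seconds):
    """Little's law: average concurrent sessions = arrival rate * average hold time."""
    return requests_per_second * hold_seconds


def implied_hold_seconds(sessions, requests_per_second):
    """Invert Little's law to estimate the average hold time per session."""
    return sessions / requests_per_second


# Figures from the load test above: ~300 RPS and 227 idle-in-transaction sessions.
print(round(implied_hold_seconds(227, 300), 3))

# Hypothetical: if each session were held for only 0.05 s (an assumed value).
print(round(sessions_in_flight(300, 0.05), 1))
```

Under this simplified model, shortening the time each request holds its session reduces the number of sessions open at once in direct proportion.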
Proposed Fix
Apply the same pattern used for list endpoints:
Before:

```python
@server_router.get("/{server_id}/tools")
async def server_get_tools(
    server_id: str,
    db: Session = Depends(get_db),  # Held until response sent
    user=Depends(get_current_user_with_permissions),
):
    tools = await tool_service.list_server_tools(db, server_id)
    return [tool.model_dump(by_alias=True) for tool in tools]
```

After:

```python
@server_router.get("/{server_id}/tools")
async def server_get_tools(
    server_id: str,
    user=Depends(get_current_user_with_permissions),
):
    with fresh_db_session() as db:  # Released immediately after block
        tools = await tool_service.list_server_tools(db, server_id)
        result = [tool.model_dump(by_alias=True) for tool in tools]
    return result  # Response sent after session closed
```

Priority Order
- Critical: Server sub-resources (`/servers/{id}/tools`, etc.) - highest load
- High: Get single item endpoints (`/tools/{id}`, `/servers/{id}`, etc.)
- Medium: Protocol handlers, state changes
- Lower: CRUD operations (less frequent)
Related Issues
- #2323 - [BUG][PERFORMANCE][DB]: Endpoint handlers hold DB sessions during slow MCP backend calls (session lifetime issues causing idle transaction timeouts)
- #2334 - [BUG]: Apply fresh_db_session() to remaining 271 endpoints using Depends(get_db) (parent issue)
- #2335 - [BUG]: Apply fresh_db_session() to admin.py endpoints (135 usages)
Acceptance Criteria
- All 52 `Depends(get_db)` usages in `main.py` converted to `fresh_db_session()`
- No `idle in transaction` buildup during load testing without admin traffic
- Load test runs stable for 10+ minutes with 4000 users
- All existing tests pass
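The first criterion could be checked mechanically. The following is a minimal sketch, assuming the dependency always appears as a parameter default written literally as `Depends(get_db)`; `find_get_db_dependencies` and `SAMPLE` are hypothetical names introduced here, and aliased imports or other indirect forms would not be detected.

```python
import ast


def find_get_db_dependencies(source):
    """Return (function_name, line_number) for each parameter default Depends(get_db)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Collect defaults of positional and keyword-only parameters.
        defaults = list(node.args.defaults)
        defaults += [d for d in node.args.kw_defaults if d is not None]
        for default in defaults:
            if (
                isinstance(default, ast.Call)
                and isinstance(default.func, ast.Name)
                and default.func.id == "Depends"
                and default.args
                and isinstance(default.args[0], ast.Name)
                and default.args[0].id == "get_db"
            ):
                hits.append((node.name, default.lineno))
    return hits


# Small illustrative input; the source is only parsed, never executed.
SAMPLE = '''
async def server_get_tools(
    server_id: str,
    db: Session = Depends(get_db),
    user=Depends(get_current_user_with_permissions),
):
    pass

async def list_tools(user=Depends(get_current_user_with_permissions)):
    pass
'''

remaining = find_get_db_dependencies(SAMPLE)
print([name for name, _ in remaining])
```

Running the same function over the contents of `main.py` and asserting that the result is empty would give a quick regression check once the conversion is done.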