Skip to content

[BUG][PERFORMANCE][DB]: Endpoint handlers hold DB sessions during slow MCP backend calls #2323

@crivetimihai

Description

@crivetimihai

Summary

Endpoint handlers in mcpgateway/routers/ acquire database sessions via Depends(get_db) at request start, fetch data, then hold the session while making slow MCP backend calls (1-30+ seconds). This causes "idle in transaction" connections to accumulate, eventually hitting PostgreSQL's idle_in_transaction_session_timeout and causing ProtocolViolation errors.

Related

This is a follow-up to #2318 which fixed the same pattern in RBAC middleware. That fix is working - no permission queries appear in long-running transactions anymore. However, endpoint handlers still exhibit the same anti-pattern.

Evidence from Load Testing (4000 users)

After RBAC fix (#2319), remaining long-running transactions are all from endpoint handlers:

  source   | count | avg_age | max_age 
-----------+-------+---------+---------
 teams     |    20 |      41 |      52s
 tools     |    10 |      36 |      50s
 resources |     8 |      43 |      53s
 prompts   |     7 |      45 |      52s
 servers   |     5 |      42 |      52s

Example queries held during slow MCP calls:

SELECT resources.id, resources.uri, resources.name, ... -- held 63s
SELECT email_teams.id, email_teams.name, ...            -- held 62s
SELECT prompts.id, prompts.original_name, ...           -- held 62s
SELECT tools.id, tools.original_name, ...               -- held 60s

Root Cause

Endpoint pattern that holds sessions:

# Current pattern in routers/*.py
@router.get("/tools/{tool_id}/call")
async def call_tool(
    tool_id: str,
    db: Session = Depends(get_db),  # Session acquired here
    user: dict = Depends(get_current_user_with_permissions),
):
    tool = db.query(Tool).filter(Tool.id == tool_id).first()  # Quick query
    
    # Session still held during slow MCP call (1-30+ seconds)
    result = await mcp_client.call_tool(tool.gateway_url, tool.name, arguments)
    
    return result  # Session finally released

Proposed Solution

Apply same pattern as RBAC fix - use fresh_db_session() for data fetching, close before slow operations:

@router.get("/tools/{tool_id}/call")
async def call_tool(
    tool_id: str,
    user: dict = Depends(get_current_user_with_permissions),
):
    # Fetch data with short-lived session
    with fresh_db_session() as db:
        tool = db.query(Tool).filter(Tool.id == tool_id).first()
        if not tool:
            raise HTTPException(404, "Tool not found")
        # Extract needed data before session closes
        gateway_url = tool.gateway_url
        tool_name = tool.name
    # Session closed here - BEFORE slow MCP call
    
    # Now make slow MCP call without holding DB connection
    result = await mcp_client.call_tool(gateway_url, tool_name, arguments)
    
    return result

Files to Modify

File Endpoints
mcpgateway/routers/tools.py call_tool, list_tools, tool operations
mcpgateway/routers/prompts.py get_prompt, list_prompts, prompt operations
mcpgateway/routers/resources.py read_resource, list_resources, resource operations
mcpgateway/routers/teams.py Team CRUD operations
mcpgateway/routers/servers.py Server operations with MCP calls
mcpgateway/routers/gateways.py Gateway operations with backend calls

Expected Impact

Metric Current (after RBAC fix) Expected (after this fix)
Idle-in-txn count 50 < 20
Max transaction age 50-100s < 5s
ProtocolViolation errors Still occurring None
Stability at 4000 users Degrades over time Stable

Acceptance Criteria

  • No endpoint handlers hold sessions during MCP backend calls
  • Max transaction age stays < 10s under load
  • No ProtocolViolation errors during 15-min load test at 4000 users
  • All existing tests pass

References

Metadata

Metadata

Assignees

Labels

MUSTP1: Non-negotiable, critical requirements without which the product is non-functional or unsafebugSomething isn't workingdatabaseperformancePerformance related itemspythonPython / backend development (FastAPI)readyValidated, ready-to-work-on items

Type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions