Skip to content

[PERFORMANCE]: Fix PostgreSQL 'Idle in Transaction' Connection Issue #1862

@crivetimihai

Description

@crivetimihai

Summary

Under load, 50-60+ database connections are stuck in "idle in transaction" state, holding transactions open unnecessarily. This wastes connection pool capacity and can cause lock contention.

Problem

Observed during high-concurrency load testing:

SELECT state, count(*) FROM pg_stat_activity WHERE datname = 'mcp' GROUP BY state;

        state        | count
---------------------+-------
 idle in transaction |    62   -- ⚠️ Holding transactions open
 idle                |    44
 active              |     4

Wait event: ClientRead - Waiting for Python to send query or close transaction

Affected queries (all SELECTs):

  • SELECT email_team_members...
  • SELECT a2a_agents...
  • SELECT gateways...
  • SELECT prompts...

Root Cause

SQLAlchemy is configured with autocommit=False:

# mcpgateway/db.py:415
SessionLocal = sessionmaker(
    autocommit=False,      # ← Every query starts a transaction
    autoflush=False,
    expire_on_commit=False,
    bind=engine
)

When a request handler executes a SELECT query, an implicit BEGIN starts. The transaction stays open until the request completes.

Impact

Issue Impact
Connection pool waste 62 of 100 connections tied up
Lock contention Long transactions hold row locks
Replication lag Open transactions prevent WAL cleanup

Proposed Solutions

Option 1: Use AUTOCOMMIT for Read-Only Operations (Recommended)

read_engine = create_engine(
    settings.database_url,
    isolation_level="AUTOCOMMIT",  # No transactions for SELECTs
)

def get_read_db() -> Generator[Session, Any, None]:
    """Read-only session with AUTOCOMMIT."""
    db = ReadOnlySessionLocal()
    try:
        yield db
    finally:
        db.close()

Option 2: Set PostgreSQL idle_in_transaction_session_timeout

postgres:
  command: >
    postgres
    -c idle_in_transaction_session_timeout=30000

Recommended Implementation

  1. Immediate: Add idle_in_transaction_session_timeout=30000 to PostgreSQL
  2. Short-term: Create get_read_db() dependency for read-only endpoints
  3. Audit: Migrate read-only endpoints to use get_read_db()

Acceptance Criteria

  • idle in transaction connections reduced from 60+ to <5
  • No increase in query errors
  • PostgreSQL idle_in_transaction_session_timeout configured
  • get_read_db() dependency created for read-only operations
  • Major read endpoints migrated to use get_read_db()

Files to Modify

File Change
docker-compose.yml Add PostgreSQL timeout setting
mcpgateway/db.py Add read-only session factory
mcpgateway/routers/*.py Migrate GET endpoints to get_read_db()

Performance Impact

Metric Before After (Expected)
Idle in transaction 60+ <5
Available connections ~40 ~95
Query throughput Limited +20-30%

Related

Metadata

Metadata

Assignees

Labels

SHOULDP2: Important but not vital; high-value items that are not crucial for the immediate releasedatabaseperformancePerformance related items

Type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions