Skip to content

[PERFORMANCE]: N+1 query pattern in EmailTeam.get_member_count() #1892

@crivetimihai

Description

@crivetimihai

Problem Summary

The get_member_count() method on EmailTeam executes a separate COUNT query for each team. When listing teams, this creates an N+1 query pattern causing:

  • 11,429+ individual COUNT queries during load tests
  • 60+ idle-in-transaction connections under load
  • 7.8 second average response time for /admin/ endpoint

Current Implementation

Location: mcpgateway/db.py:1453-1475

def get_member_count(self) -> int:
    """Get the current number of team members."""
    from sqlalchemy.orm import object_session

    session = object_session(self)
    if session is None:
        return len([m for m in self.members if m.is_active])

    count = session.query(func.count(EmailTeamMember.id)).filter(
        EmailTeamMember.team_id == self.id,
        EmailTeamMember.is_active.is_(True)
    ).scalar()
    return count or 0

Generated SQL (per team):

SELECT count(email_team_members.id) AS count_1
FROM email_team_members
WHERE email_team_members.team_id = :team_id
  AND email_team_members.is_active = true

Call Sites Analysis

N+1 Problem Locations (requires batching)

File Line Function Loop Context
mcpgateway/admin.py 2454 admin_ui() for team in raw_teams
mcpgateway/admin.py 3318 _render_unified_teams_view() for team in user_teams
mcpgateway/admin.py 3326 _render_unified_teams_view() for team in public_teams
mcpgateway/admin.py 3510 admin_list_teams() for team in teams
mcpgateway/routers/teams.py 162 list_teams() for team in teams
mcpgateway/routers/teams.py 678 discover_teams() loop over joinable teams

Single-Call Locations (OK as-is)

File Line Function Context
mcpgateway/routers/teams.py 99 create_team() Single team after creation
mcpgateway/routers/teams.py 214 get_team() Single team by ID
mcpgateway/routers/teams.py 266 update_team() Single team after update

Existing Indexes

From migration n8h9i0j1k2l3_add_database_indexes.py:

Index Name Columns Notes
idx_email_team_members_team_id team_id Basic FK index
idx_email_team_members_user_email user_email Basic FK index
idx_email_team_members_user_team_active user_email, team_id, is_active Composite
idx_email_team_members_team_role_active team_id, role, is_active Composite (includes role for role-based queries)

Gap: No partial index optimized for the count query pattern WHERE is_active = true.


Proposed Solution

Phase 1: Add Batch Query Helpers

Add three new methods to TeamManagementService that batch-fetch data for multiple teams in single queries.

File: mcpgateway/services/team_management_service.py

Required import additions:

# Add to existing imports at top of file
from typing import Any, Dict, List, Optional, Tuple  # Add Dict, Any
from sqlalchemy import func, select  # Add func

1.1 Batch Member Counts

def get_member_counts_batch(self, team_ids: List[str]) -> Dict[str, int]:
    """Get member counts for multiple teams in a single query.

    This is a synchronous method following the existing service pattern.
    Note: Like other sync SQLAlchemy calls, this will block the event
    loop in async contexts. For typical team counts this is acceptable.

    Args:
        team_ids: List of team UUIDs

    Returns:
        Dict mapping team_id to member count

    Raises:
        Exception: Re-raises any database errors after rollback

    Examples:
        >>> from unittest.mock import Mock
        >>> service = TeamManagementService(Mock())
        >>> service.get_member_counts_batch([])
        {}
    """
    if not team_ids:
        return {}

    try:
        # Single query for all teams
        results = self.db.query(
            EmailTeamMember.team_id,
            func.count(EmailTeamMember.id).label("count")
        ).filter(
            EmailTeamMember.team_id.in_(team_ids),
            EmailTeamMember.is_active.is_(True)
        ).group_by(EmailTeamMember.team_id).all()

        self.db.commit()  # Release transaction to avoid idle-in-transaction

        # Build result dict, defaulting to 0 for teams with no members
        counts = {str(row.team_id): row.count for row in results}
        return {tid: counts.get(tid, 0) for tid in team_ids}
    except Exception as e:
        self.db.rollback()
        logger.error(f"Failed to get member counts for teams: {e}")
        raise

Generated SQL (single query for N teams):

SELECT email_team_members.team_id, count(email_team_members.id) AS count
FROM email_team_members
WHERE email_team_members.team_id IN (:id_1, :id_2, ..., :id_n)
  AND email_team_members.is_active = true
GROUP BY email_team_members.team_id

1.2 Batch User Roles

def get_user_roles_batch(self, user_email: str, team_ids: List[str]) -> Dict[str, Optional[str]]:
    """Get a user's role in multiple teams in a single query.

    Args:
        user_email: Email of the user
        team_ids: List of team UUIDs

    Returns:
        Dict mapping team_id to role (or None if not a member)

    Raises:
        Exception: Re-raises any database errors after rollback
    """
    if not team_ids:
        return {}

    try:
        # Single query for all teams
        results = self.db.query(
            EmailTeamMember.team_id,
            EmailTeamMember.role
        ).filter(
            EmailTeamMember.user_email == user_email,
            EmailTeamMember.team_id.in_(team_ids),
            EmailTeamMember.is_active.is_(True)
        ).all()

        self.db.commit()  # Release transaction to avoid idle-in-transaction

        # Build result dict - teams with no membership return None
        roles = {str(row.team_id): row.role for row in results}
        return {tid: roles.get(tid) for tid in team_ids}
    except Exception as e:
        self.db.rollback()
        logger.error(f"Failed to get user roles for {user_email}: {e}")
        raise

Generated SQL:

SELECT email_team_members.team_id, email_team_members.role
FROM email_team_members
WHERE email_team_members.user_email = :user_email
  AND email_team_members.team_id IN (:id_1, :id_2, ..., :id_n)
  AND email_team_members.is_active = true

1.3 Batch Pending Join Requests

def get_pending_join_requests_batch(self, user_email: str, team_ids: List[str]) -> Dict[str, Optional[Any]]:
    """Get pending join requests for a user across multiple teams in a single query.

    Args:
        user_email: Email of the user
        team_ids: List of team UUIDs to check

    Returns:
        Dict mapping team_id to pending EmailTeamJoinRequest (or None if no pending request)

    Raises:
        Exception: Re-raises any database errors after rollback
    """
    if not team_ids:
        return {}

    try:
        # Single query for all pending requests across teams
        results = self.db.query(EmailTeamJoinRequest).filter(
            EmailTeamJoinRequest.user_email == user_email,
            EmailTeamJoinRequest.team_id.in_(team_ids),
            EmailTeamJoinRequest.status == "pending"
        ).all()

        self.db.commit()  # Release transaction to avoid idle-in-transaction

        # Build result dict - only one pending request per team expected
        requests = {str(req.team_id): req for req in results}
        return {tid: requests.get(tid) for tid in team_ids}
    except Exception as e:
        self.db.rollback()
        logger.error(f"Failed to get pending join requests for {user_email}: {e}")
        raise

Generated SQL:

SELECT * FROM email_team_join_requests
WHERE user_email = :user_email
  AND team_id IN (:id_1, :id_2, ..., :id_n)
  AND status = 'pending'

Phase 2: Update Call Sites

2.1 Update admin.py:admin_ui() (lines 2443-2460)

Before:

raw_teams = await team_service.get_user_teams(user_email)
user_teams = []
for team in raw_teams:
    user_role = await team_service.get_user_role_in_team(user_email, team.id)
    team_dict = {
        "id": str(team.id),
        "name": str(team.name),
        "member_count": team.get_member_count(),  # N+1 HERE
        "role": user_role or "member",
    }
    user_teams.append(team_dict)

After:

raw_teams = await team_service.get_user_teams(user_email)

# Batch fetch all data in 2 queries instead of 2N queries
team_ids = [str(team.id) for team in raw_teams]
member_counts = await team_service.get_member_counts_batch_cached(team_ids)
user_roles = team_service.get_user_roles_batch(user_email, team_ids)

user_teams = []
for team in raw_teams:
    try:
        team_id = str(team.id)
        team_dict = {
            "id": team_id,
            "name": str(team.name) if team.name else "",
            "type": str(getattr(team, "type", "organization")),
            "is_personal": bool(getattr(team, "is_personal", False)),
            "member_count": member_counts.get(team_id, 0),
            "role": user_roles.get(team_id) or "member",
        }
        user_teams.append(team_dict)
    except Exception as team_error:
        LOGGER.warning(f"Failed to serialize team {getattr(team, 'id', 'unknown')}: {team_error}")
        continue

2.2 Update admin.py:_render_unified_teams_view() (lines 3305-3327)

Before:

user_teams = await team_service.get_user_teams(current_user.email)
public_teams = await team_service.discover_public_teams(current_user.email)

all_teams = []
for team in user_teams:
    user_role = await team_service.get_user_role_in_team(current_user.email, team.id)
    all_teams.append({
        "team": team,
        "relationship": "owner" if user_role == "owner" else "member",
        "member_count": team.get_member_count()  # N+1 HERE
    })

for team in public_teams:
    all_teams.append({
        "team": team,
        "member_count": team.get_member_count()  # N+1 HERE
    })

After:

user_teams = await team_service.get_user_teams(current_user.email)
public_teams = await team_service.discover_public_teams(current_user.email)

# Batch fetch ALL data upfront - 3 queries instead of 3N queries
user_team_ids = [str(t.id) for t in user_teams]
public_team_ids = [str(t.id) for t in public_teams]
all_team_ids = user_team_ids + public_team_ids

member_counts = await team_service.get_member_counts_batch_cached(all_team_ids)
user_roles = team_service.get_user_roles_batch(current_user.email, user_team_ids)
pending_requests = team_service.get_pending_join_requests_batch(current_user.email, public_team_ids)

all_teams = []

# Add user's teams (owned and member)
for team in user_teams:
    team_id = str(team.id)
    user_role = user_roles.get(team_id)
    relationship = "owner" if user_role == "owner" else "member"
    all_teams.append({
        "team": team,
        "relationship": relationship,
        "member_count": member_counts.get(team_id, 0)
    })

# Add public teams user can join
for team in public_teams:
    team_id = str(team.id)
    all_teams.append({
        "team": team,
        "relationship": "join",
        "member_count": member_counts.get(team_id, 0),
        "pending_request": pending_requests.get(team_id)  # None if no pending request
    })

2.3 Update admin.py:admin_list_teams() (lines 3503-3542)

Before:

if current_user.is_admin:
    teams, _ = await team_service.list_teams()
else:
    teams = await team_service.get_user_teams(current_user.email)

teams_html = ""
for team in teams:
    member_count = team.get_member_count()  # N+1 HERE
    teams_html += f"<div>...{member_count}...</div>"

After:

if current_user.is_admin:
    teams, _ = await team_service.list_teams()
else:
    teams = await team_service.get_user_teams(current_user.email)

# Batch fetch member counts with caching
team_ids = [str(team.id) for team in teams]
member_counts = await team_service.get_member_counts_batch_cached(team_ids)

teams_html = ""
for team in teams:
    member_count = member_counts.get(str(team.id), 0)
    teams_html += f"<div>...{member_count}...</div>"

2.4 Update routers/teams.py:list_teams() (lines 152-168)

Before:

team_responses = [
    TeamResponse(
        id=team.id,
        name=team.name,
        member_count=team.get_member_count(),  # N+1 HERE
        ...
    )
    for team in teams
]

After:

# Batch fetch member counts with caching
team_ids = [str(team.id) for team in teams]
member_counts = await service.get_member_counts_batch_cached(team_ids)

team_responses = [
    TeamResponse(
        id=team.id,
        name=team.name,
        member_count=member_counts.get(str(team.id), 0),
        ...
    )
    for team in teams
]

2.5 Update routers/teams.py:discover_teams() (lines 670-681)

Before:

team_responses = [
    DiscoverableTeamResponse(
        id=team.id,
        name=team.name,
        member_count=team.get_member_count(),  # N+1 HERE
        ...
    )
    for team in teams
]

After:

# Batch fetch member counts with caching
team_ids = [str(team.id) for team in teams]
member_counts = await service.get_member_counts_batch_cached(team_ids)

team_responses = [
    DiscoverableTeamResponse(
        id=team.id,
        name=team.name,
        member_count=member_counts.get(str(team.id), 0),
        ...
    )
    for team in teams
]

Phase 3: Add Partial Index

File: New migration mcpgateway/alembic/versions/xxx_add_team_member_count_index.py

Note: PostgreSQL CREATE INDEX CONCURRENTLY cannot run inside a transaction.
Use a blocking index build (acceptable for this small table) or run the
concurrent version manually outside Alembic.

"""Add partial index for team member count queries

Revision ID: xxx
Revises: <previous>
Create Date: 2026-01-04
"""
from typing import Sequence, Union

from alembic import op
from sqlalchemy import inspect

revision: str = "xxx"
down_revision: Union[str, Sequence[str], None] = "<previous>"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def _index_exists(table_name: str, index_name: str) -> bool:
    """Check if an index already exists."""
    conn = op.get_bind()
    inspector = inspect(conn)
    try:
        existing = inspector.get_indexes(table_name)
        return any(idx["name"] == index_name for idx in existing)
    except Exception:
        return False


def upgrade() -> None:
    """Add partial index for team member count queries."""
    conn = op.get_bind()
    dialect = conn.dialect.name

    if dialect == "postgresql":
        # PostgreSQL: Use partial index (most efficient for WHERE is_active = true)
        # Note: NOT using CONCURRENTLY since that requires running outside transaction
        # For production with large tables, consider running CONCURRENTLY manually:
        #   CREATE INDEX CONCURRENTLY idx_email_team_members_team_active_partial
        #   ON email_team_members(team_id) WHERE is_active = true;
        if not _index_exists("email_team_members", "idx_email_team_members_team_active_partial"):
            op.execute("""
                CREATE INDEX idx_email_team_members_team_active_partial
                ON email_team_members(team_id)
                WHERE is_active = true
            """)
            print("✓ Created partial index idx_email_team_members_team_active_partial")
    else:
        # SQLite/MySQL: Regular composite index (no partial index support)
        if not _index_exists("email_team_members", "idx_email_team_members_team_active_count"):
            op.create_index(
                "idx_email_team_members_team_active_count",
                "email_team_members",
                ["team_id", "is_active"]
            )
            print("✓ Created composite index idx_email_team_members_team_active_count")


def downgrade() -> None:
    """Remove the partial index."""
    conn = op.get_bind()
    dialect = conn.dialect.name

    if dialect == "postgresql":
        op.execute("DROP INDEX IF EXISTS idx_email_team_members_team_active_partial")
    else:
        try:
            op.drop_index("idx_email_team_members_team_active_count", "email_team_members")
        except Exception:
            pass  # Index may not exist

Phase 4: Add Redis Caching

Add Redis caching layer for member counts with configurable TTL. This requires:

  1. Add configuration settings - Enable flag + TTL in config.py
  2. Add Redis client access - Use existing AuthCache pattern or add helper
  3. Implement cache-through pattern - Check cache, fallback to DB
  4. Add cache invalidation - Clear on membership add/remove/update

4.1 Add Configuration Settings

File: mcpgateway/config.py

Add to the caching settings section (near line 1021-1046):

# Team member count cache settings
team_member_count_cache_enabled: bool = Field(
    default=True,
    description="Enable Redis caching for team member counts"
)
team_member_count_cache_ttl: int = Field(
    default=300,
    ge=30,
    le=3600,
    description="TTL in seconds for team member count cache (default: 5 minutes)"
)

File: .env.example

Add documentation:

# Team member count cache (reduces N+1 queries in admin UI)
# TEAM_MEMBER_COUNT_CACHE_ENABLED=true
# TEAM_MEMBER_COUNT_CACHE_TTL=300

File: docker-compose.yml

Add to gateway service environment (near line 147, after ADMIN_STATS_CACHE_PERFORMANCE_TTL):

      # Team member count cache (reduces N+1 queries)
      - TEAM_MEMBER_COUNT_CACHE_ENABLED=true
      - TEAM_MEMBER_COUNT_CACHE_TTL=300

File: charts/mcp-stack/values.yaml

Add to config section (near line 230, after ADMIN_STATS_CACHE_PERFORMANCE_TTL):

    # Team member count cache (reduces N+1 queries in admin UI)
    TEAM_MEMBER_COUNT_CACHE_ENABLED: "true"  # Enable team member count caching
    TEAM_MEMBER_COUNT_CACHE_TTL: "300"       # Cache TTL in seconds (30-3600)

File: charts/mcp-stack/values.schema.json

Add to properties (if schema validation is used):

"TEAM_MEMBER_COUNT_CACHE_ENABLED": {
  "type": "string",
  "description": "Enable Redis caching for team member counts"
},
"TEAM_MEMBER_COUNT_CACHE_TTL": {
  "type": "string",
  "description": "Team member count cache TTL in seconds (30-3600)"
}

File: docs/docs/manage/configuration.md

Add new section after "Admin Stats Cache" (around line 510):

### Team Member Count Cache

```bash
# Team member count cache (reduces N+1 queries in admin UI)
TEAM_MEMBER_COUNT_CACHE_ENABLED=true  # Enable team member count caching
TEAM_MEMBER_COUNT_CACHE_TTL=300       # Cache TTL in seconds (30-3600)

When TEAM_MEMBER_COUNT_CACHE_ENABLED=true (default), team member counts are cached in Redis:

  • Member counts: Cached for TEAM_MEMBER_COUNT_CACHE_TTL seconds (default: 300)

Cache is automatically invalidated when team members are added, removed, or their is_active status changes.

Performance Note: This cache eliminates N+1 query patterns in the admin UI team listings, reducing /admin/ P95 latency from ~14s to <500ms under load.


#### 4.2 Add Cached Member Count Method

**File:** `mcpgateway/services/team_management_service.py`

```python
from mcpgateway.utils.redis_client import get_redis_client
from mcpgateway.config import get_settings


def _get_member_count_cache_key(self, team_id: str) -> str:
    """Build cache key using settings.cache_prefix for consistency."""
    settings = get_settings()
    prefix = getattr(settings, "cache_prefix", "mcpgw:")
    return f"{prefix}team:member_count:{team_id}"


async def get_member_counts_batch_cached(self, team_ids: List[str]) -> Dict[str, int]:
    """Get member counts for multiple teams, using Redis cache with DB fallback.

    Caching behavior is controlled by settings:
    - team_member_count_cache_enabled: Enable/disable caching (default: True)
    - team_member_count_cache_ttl: Cache TTL in seconds (default: 300)

    Args:
        team_ids: List of team UUIDs

    Returns:
        Dict mapping team_id to member count
    """
    if not team_ids:
        return {}

    settings = get_settings()
    cache_enabled = settings.team_member_count_cache_enabled
    cache_ttl = settings.team_member_count_cache_ttl

    # If caching disabled, go straight to batch DB query
    if not cache_enabled:
        return self.get_member_counts_batch(team_ids)

    redis_client = await get_redis_client()
    result: Dict[str, int] = {}
    cache_misses: List[str] = []

    # Step 1: Check Redis cache for all team IDs
    if redis_client:
        try:
            cache_keys = [self._get_member_count_cache_key(tid) for tid in team_ids]
            cached_values = await redis_client.mget(cache_keys)

            for tid, cached in zip(team_ids, cached_values):
                if cached is not None:
                    result[tid] = int(cached)
                else:
                    cache_misses.append(tid)
        except Exception as e:
            logger.warning(f"Redis cache read failed, falling back to DB: {e}")
            cache_misses = list(team_ids)
    else:
        # No Redis available, fall back to DB
        cache_misses = list(team_ids)

    # Step 2: Query database for cache misses
    if cache_misses:
        try:
            db_results = self.db.query(
                EmailTeamMember.team_id,
                func.count(EmailTeamMember.id).label("count")
            ).filter(
                EmailTeamMember.team_id.in_(cache_misses),
                EmailTeamMember.is_active.is_(True)
            ).group_by(EmailTeamMember.team_id).all()

            self.db.commit()

            db_counts = {str(row.team_id): row.count for row in db_results}

            # Fill in results and cache them
            for tid in cache_misses:
                count = db_counts.get(tid, 0)
                result[tid] = count

                # Step 3: Cache the result with configured TTL
                if redis_client:
                    try:
                        await redis_client.setex(
                            self._get_member_count_cache_key(tid),
                            cache_ttl,
                            str(count)
                        )
                    except Exception as e:
                        logger.warning(f"Redis cache write failed for team {tid}: {e}")

        except Exception as e:
            self.db.rollback()
            logger.error(f"Failed to get member counts for teams: {e}")
            raise

    return result


async def invalidate_team_member_count_cache(self, team_id: str) -> None:
    """Invalidate the cached member count for a team.

    Call this after any membership changes (add/remove/update).
    No-op if caching is disabled or Redis unavailable.

    Args:
        team_id: Team UUID to invalidate
    """
    settings = get_settings()
    if not settings.team_member_count_cache_enabled:
        return

    redis_client = await get_redis_client()
    if redis_client:
        try:
            await redis_client.delete(self._get_member_count_cache_key(team_id))
        except Exception as e:
            logger.warning(f"Failed to invalidate member count cache for team {team_id}: {e}")

4.3 Add Cache Invalidation to Membership Methods

File: mcpgateway/services/team_management_service.py

Add await invalidation calls to all membership-modifying methods:

# In create_team() (line ~204, ~216) - after adding creator as owner:
await self.invalidate_team_member_count_cache(str(team.id))

# In add_member_to_team() (line ~486) - after successful add:
await self.invalidate_team_member_count_cache(str(team_id))

# In remove_member_from_team() (line ~516) - after successful remove:
await self.invalidate_team_member_count_cache(str(team_id))

# In approve_join_request() (line ~1057) - after creating new member:
await self.invalidate_team_member_count_cache(str(join_request.team_id))

# In update_member_role() (line ~588):
# Note: update_member_role only changes role, not is_active, so no invalidation needed

4.4 Update Call Sites to Use Cached Method

Replace get_member_counts_batch() calls with get_member_counts_batch_cached() in:

  • admin.py:admin_ui()
  • admin.py:_render_unified_teams_view()
  • admin.py:admin_list_teams()
  • routers/teams.py:list_teams()
  • routers/teams.py:discover_teams()

Files to Modify

Configuration Files

File Changes
mcpgateway/config.py Add team_member_count_cache_enabled and team_member_count_cache_ttl settings
.env.example Document new cache settings
docker-compose.yml Add cache env vars to gateway service
charts/mcp-stack/values.yaml Add cache settings to config section
charts/mcp-stack/values.schema.json Add schema for new settings
charts/mcp-stack/README.md Document new settings in helm chart docs
docs/docs/manage/configuration.md Add Team Member Count Cache section

Implementation Files

File Changes
mcpgateway/services/team_management_service.py Add 3 batch methods + cached version + invalidation helper
mcpgateway/admin.py Update 4 call sites (lines 2454, 3318, 3326, 3510) to use cached batch methods
mcpgateway/routers/teams.py Update 2 call sites (lines 162, 678) to use cached batch methods
mcpgateway/alembic/versions/ Add partial index migration

Test Files

File Changes
tests/unit/mcpgateway/services/test_team_management_service.py Add tests for batch methods + caching + invalidation
tests/integration/ Add cache invalidation integration tests

Expected Impact

Metric Before After (Batch) After (Batch + Cache) Improvement
Queries per _render_unified_teams_view 3N+2 5 4-5 (cache hits) 3N → 4-5 queries
Queries per admin_ui team section 2N+1 3 2-3 (cache hits) 2N → 2-3 queries
/admin/ P95 latency 14,000 ms <2,000 ms <500 ms 28× faster
Idle-in-transaction 60+ <10 <5 12× reduction
Database connections 100+ ~40 ~20 80% reduction
Redis cache hit rate N/A N/A ~80% (after warmup) Reduced DB load

Query reduction breakdown for _render_unified_teams_view:

  • Before: get_user_teams (1) + discover_public_teams (1) + get_member_count (N) + get_user_role_in_team (N) + get_user_join_requests (N) = 3N+2 queries
  • After (batch): get_user_teams (1) + discover_public_teams (1) + get_member_counts_batch (1) + get_user_roles_batch (1) + get_pending_join_requests_batch (1) = 5 queries
  • After (batch + cache): get_user_teams (1) + discover_public_teams (1) + get_user_roles_batch (1) + get_pending_join_requests_batch (1) + member count (0-1 from cache) = 4-5 queries

Testing Checklist

Phase 1: Batch Methods

  • Unit tests for get_member_counts_batch() with empty list, single team, multiple teams
  • Unit tests for get_user_roles_batch() with empty list, mixed membership
  • Unit tests for get_pending_join_requests_batch() with no requests, pending requests
  • Unit tests verify correct counts/roles/requests returned

Phase 2: Call Site Updates

  • Integration test for /admin/ endpoint performance
  • Integration test for /teams API endpoint
  • Verify all call sites use batch methods

Phase 3: Index Migration

  • Migration runs successfully on PostgreSQL (partial index)
  • Migration runs successfully on SQLite (composite index)
  • Downgrade migration works correctly
  • Query plan shows index usage (EXPLAIN ANALYZE)

Phase 4: Redis Caching

  • Unit test config settings load correctly (enabled flag, TTL)
  • Unit test get_member_counts_batch_cached() with caching enabled
  • Unit test get_member_counts_batch_cached() with caching disabled (bypasses Redis)
  • Unit test get_member_counts_batch_cached() with Redis unavailable (graceful fallback)
  • Unit test cache hit returns correct data without DB query
  • Unit test cache miss queries DB and populates cache with configured TTL
  • Unit test invalidate_team_member_count_cache() removes cache entry
  • Unit test invalidate_team_member_count_cache() is no-op when caching disabled
  • Integration test: add_team_member invalidates cache
  • Integration test: remove_team_member invalidates cache
  • Integration test: update_team_member (is_active change) invalidates cache
  • Integration test: verify TTL is respected (cache expires after configured time)

Configuration & Documentation

  • Settings load correctly from environment variables
  • .env.example includes new settings with comments
  • docker-compose.yml includes new settings
  • charts/mcp-stack/values.yaml includes new settings
  • charts/mcp-stack/values.schema.json validates new settings
  • charts/mcp-stack/README.md documents new settings
  • docs/docs/manage/configuration.md documents new cache section
  • Helm chart deploys successfully with new settings

End-to-End Validation

  • Load test confirms N+1 elimination (check pg_stat_statements)
  • Load test confirms cache hit rate >80% after warmup
  • Monitor idle-in-transaction count during load test (<5)
  • Verify /admin/ P95 latency <500ms under load

Related Issues


Implementation Notes

  1. Backward Compatibility: Keep get_member_count() method on EmailTeam for single-team use cases. Only update loop contexts to use batch method.

  2. Transaction Management: The batch query uses self.db.commit() after the read-only query to release the transaction, matching the existing pattern throughout TeamManagementService (see lines 247, 691, 706, 751, 816, 872, 913, 943). Using commit() instead of rollback() avoids expiring loaded ORM objects that may be reused by callers.

  3. Empty Teams: Teams with no active members won't appear in the GROUP BY result. The solution defaults to 0 for missing team IDs.

  4. All N+1 patterns addressed: This PR fixes all three N+1 patterns in _render_unified_teams_view: member counts, user roles, and pending join requests.

  5. Index Strategy: A blocking index build is acceptable for email_team_members since:

    • Table is typically small (hundreds to thousands of rows)
    • Migration runs during deployment downtime
    • For zero-downtime production deploys with large tables, run CREATE INDEX CONCURRENTLY manually before migration
  6. Redis Client:

    • Uses async get_redis_client() from mcpgateway.utils.redis_client
    • get_member_counts_batch_cached() and invalidate_team_member_count_cache() are async def
    • Call sites must use await (all call sites are already in async functions)
    • Note: get_redis_client_sync() only returns the cached client reference, not a sync wrapper
  7. Redis Caching Strategy:

    • Uses MGET for batch cache lookups (single round-trip for N keys)
    • Gracefully degrades to DB-only when Redis unavailable or caching disabled
    • Cache invalidation is explicit (not TTL-only) to ensure consistency after membership changes
    • Configurable TTL (default 300s) provides fallback invalidation for edge cases (e.g., direct DB updates)
  8. Cache Configuration:

    • TEAM_MEMBER_COUNT_CACHE_ENABLED (default: true) - Master switch to enable/disable caching
    • TEAM_MEMBER_COUNT_CACHE_TTL (default: 300, range: 30-3600) - Cache TTL in seconds
    • When disabled, get_member_counts_batch_cached() delegates directly to get_member_counts_batch()
    • Invalidation methods are no-ops when caching is disabled (avoid unnecessary Redis calls)
  9. Cache Key Design:

    • Uses settings.cache_prefix (default: mcpgw:) for consistency with existing cache modules
    • Full key format: {cache_prefix}team:member_count:{team_id} (e.g., mcpgw:team:member_count:uuid)
    • Follows same pattern as AuthCache, RegistryCache, and AdminStatsCache

Open Questions (Resolved)

Question Decision
Implement all batch methods in same PR? Yes - addresses all N+1 patterns in _render_unified_teams_view
Blocking vs concurrent index build? Blocking - table is small, add note for large deployments
Add Redis caching? Yes - maximum performance with configurable settings
Cache TTL duration? Configurable (default 300s, range 30-3600s) via TEAM_MEMBER_COUNT_CACHE_TTL
Cache enable/disable? Configurable via TEAM_MEMBER_COUNT_CACHE_ENABLED (default: true)
Cache invalidation strategy? Explicit invalidation on membership changes + TTL expiry as fallback
Sync vs async Redis client? Async (await get_redis_client()) - required since Redis client is async
Cache key prefix? Use settings.cache_prefix - consistent with AuthCache, RegistryCache, etc.
Invalidation coverage? All membership paths - create_team, add_member_to_team, remove_member_from_team, approve_join_request

Metadata

Metadata

Assignees

Labels

performancePerformance related itemspythonPython / backend development (FastAPI)

Type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions