-
Notifications
You must be signed in to change notification settings - Fork 614
[PERFORMANCE]: N+1 query pattern in EmailTeam.get_member_count() #1892
Description
Problem Summary
The get_member_count() method on EmailTeam executes a separate COUNT query for each team. When listing teams, this creates an N+1 query pattern causing:
- 11,429+ individual COUNT queries during load tests
- 60+ idle-in-transaction connections under load
- 7.8 second average response time for
/admin/endpoint
Current Implementation
Location: mcpgateway/db.py:1453-1475
def get_member_count(self) -> int:
"""Get the current number of team members."""
from sqlalchemy.orm import object_session
session = object_session(self)
if session is None:
return len([m for m in self.members if m.is_active])
count = session.query(func.count(EmailTeamMember.id)).filter(
EmailTeamMember.team_id == self.id,
EmailTeamMember.is_active.is_(True)
).scalar()
return count or 0Generated SQL (per team):
SELECT count(email_team_members.id) AS count_1
FROM email_team_members
WHERE email_team_members.team_id = :team_id
AND email_team_members.is_active = trueCall Sites Analysis
N+1 Problem Locations (requires batching)
| File | Line | Function | Loop Context |
|---|---|---|---|
mcpgateway/admin.py |
2454 | admin_ui() |
for team in raw_teams |
mcpgateway/admin.py |
3318 | _render_unified_teams_view() |
for team in user_teams |
mcpgateway/admin.py |
3326 | _render_unified_teams_view() |
for team in public_teams |
mcpgateway/admin.py |
3510 | admin_list_teams() |
for team in teams |
mcpgateway/routers/teams.py |
162 | list_teams() |
for team in teams |
mcpgateway/routers/teams.py |
678 | discover_teams() |
loop over joinable teams |
Single-Call Locations (OK as-is)
| File | Line | Function | Context |
|---|---|---|---|
mcpgateway/routers/teams.py |
99 | create_team() |
Single team after creation |
mcpgateway/routers/teams.py |
214 | get_team() |
Single team by ID |
mcpgateway/routers/teams.py |
266 | update_team() |
Single team after update |
Existing Indexes
From migration n8h9i0j1k2l3_add_database_indexes.py:
| Index Name | Columns | Notes |
|---|---|---|
idx_email_team_members_team_id |
team_id |
Basic FK index |
idx_email_team_members_user_email |
user_email |
Basic FK index |
idx_email_team_members_user_team_active |
user_email, team_id, is_active |
Composite |
idx_email_team_members_team_role_active |
team_id, role, is_active |
Composite (includes role for role-based queries) |
Gap: No partial index optimized for the count query pattern WHERE is_active = true.
Proposed Solution
Phase 1: Add Batch Query Helpers
Add three new methods to TeamManagementService that batch-fetch data for multiple teams in single queries.
File: mcpgateway/services/team_management_service.py
Required import additions:
# Add to existing imports at top of file
from typing import Any, Dict, List, Optional, Tuple # Add Dict, Any
from sqlalchemy import func, select # Add func1.1 Batch Member Counts
def get_member_counts_batch(self, team_ids: List[str]) -> Dict[str, int]:
"""Get member counts for multiple teams in a single query.
This is a synchronous method following the existing service pattern.
Note: Like other sync SQLAlchemy calls, this will block the event
loop in async contexts. For typical team counts this is acceptable.
Args:
team_ids: List of team UUIDs
Returns:
Dict mapping team_id to member count
Raises:
Exception: Re-raises any database errors after rollback
Examples:
>>> from unittest.mock import Mock
>>> service = TeamManagementService(Mock())
>>> service.get_member_counts_batch([])
{}
"""
if not team_ids:
return {}
try:
# Single query for all teams
results = self.db.query(
EmailTeamMember.team_id,
func.count(EmailTeamMember.id).label("count")
).filter(
EmailTeamMember.team_id.in_(team_ids),
EmailTeamMember.is_active.is_(True)
).group_by(EmailTeamMember.team_id).all()
self.db.commit() # Release transaction to avoid idle-in-transaction
# Build result dict, defaulting to 0 for teams with no members
counts = {str(row.team_id): row.count for row in results}
return {tid: counts.get(tid, 0) for tid in team_ids}
except Exception as e:
self.db.rollback()
logger.error(f"Failed to get member counts for teams: {e}")
raiseGenerated SQL (single query for N teams):
SELECT email_team_members.team_id, count(email_team_members.id) AS count
FROM email_team_members
WHERE email_team_members.team_id IN (:id_1, :id_2, ..., :id_n)
AND email_team_members.is_active = true
GROUP BY email_team_members.team_id1.2 Batch User Roles
def get_user_roles_batch(self, user_email: str, team_ids: List[str]) -> Dict[str, Optional[str]]:
"""Get a user's role in multiple teams in a single query.
Args:
user_email: Email of the user
team_ids: List of team UUIDs
Returns:
Dict mapping team_id to role (or None if not a member)
Raises:
Exception: Re-raises any database errors after rollback
"""
if not team_ids:
return {}
try:
# Single query for all teams
results = self.db.query(
EmailTeamMember.team_id,
EmailTeamMember.role
).filter(
EmailTeamMember.user_email == user_email,
EmailTeamMember.team_id.in_(team_ids),
EmailTeamMember.is_active.is_(True)
).all()
self.db.commit() # Release transaction to avoid idle-in-transaction
# Build result dict - teams with no membership return None
roles = {str(row.team_id): row.role for row in results}
return {tid: roles.get(tid) for tid in team_ids}
except Exception as e:
self.db.rollback()
logger.error(f"Failed to get user roles for {user_email}: {e}")
raiseGenerated SQL:
SELECT email_team_members.team_id, email_team_members.role
FROM email_team_members
WHERE email_team_members.user_email = :user_email
AND email_team_members.team_id IN (:id_1, :id_2, ..., :id_n)
AND email_team_members.is_active = true1.3 Batch Pending Join Requests
def get_pending_join_requests_batch(self, user_email: str, team_ids: List[str]) -> Dict[str, Optional[Any]]:
"""Get pending join requests for a user across multiple teams in a single query.
Args:
user_email: Email of the user
team_ids: List of team UUIDs to check
Returns:
Dict mapping team_id to pending EmailTeamJoinRequest (or None if no pending request)
Raises:
Exception: Re-raises any database errors after rollback
"""
if not team_ids:
return {}
try:
# Single query for all pending requests across teams
results = self.db.query(EmailTeamJoinRequest).filter(
EmailTeamJoinRequest.user_email == user_email,
EmailTeamJoinRequest.team_id.in_(team_ids),
EmailTeamJoinRequest.status == "pending"
).all()
self.db.commit() # Release transaction to avoid idle-in-transaction
# Build result dict - only one pending request per team expected
requests = {str(req.team_id): req for req in results}
return {tid: requests.get(tid) for tid in team_ids}
except Exception as e:
self.db.rollback()
logger.error(f"Failed to get pending join requests for {user_email}: {e}")
raiseGenerated SQL:
SELECT * FROM email_team_join_requests
WHERE user_email = :user_email
AND team_id IN (:id_1, :id_2, ..., :id_n)
AND status = 'pending'Phase 2: Update Call Sites
2.1 Update admin.py:admin_ui() (lines 2443-2460)
Before:
raw_teams = await team_service.get_user_teams(user_email)
user_teams = []
for team in raw_teams:
user_role = await team_service.get_user_role_in_team(user_email, team.id)
team_dict = {
"id": str(team.id),
"name": str(team.name),
"member_count": team.get_member_count(), # N+1 HERE
"role": user_role or "member",
}
user_teams.append(team_dict)After:
raw_teams = await team_service.get_user_teams(user_email)
# Batch fetch all data in 2 queries instead of 2N queries
team_ids = [str(team.id) for team in raw_teams]
member_counts = await team_service.get_member_counts_batch_cached(team_ids)
user_roles = team_service.get_user_roles_batch(user_email, team_ids)
user_teams = []
for team in raw_teams:
try:
team_id = str(team.id)
team_dict = {
"id": team_id,
"name": str(team.name) if team.name else "",
"type": str(getattr(team, "type", "organization")),
"is_personal": bool(getattr(team, "is_personal", False)),
"member_count": member_counts.get(team_id, 0),
"role": user_roles.get(team_id) or "member",
}
user_teams.append(team_dict)
except Exception as team_error:
LOGGER.warning(f"Failed to serialize team {getattr(team, 'id', 'unknown')}: {team_error}")
continue2.2 Update admin.py:_render_unified_teams_view() (lines 3305-3327)
Before:
user_teams = await team_service.get_user_teams(current_user.email)
public_teams = await team_service.discover_public_teams(current_user.email)
all_teams = []
for team in user_teams:
user_role = await team_service.get_user_role_in_team(current_user.email, team.id)
all_teams.append({
"team": team,
"relationship": "owner" if user_role == "owner" else "member",
"member_count": team.get_member_count() # N+1 HERE
})
for team in public_teams:
all_teams.append({
"team": team,
"member_count": team.get_member_count() # N+1 HERE
})After:
user_teams = await team_service.get_user_teams(current_user.email)
public_teams = await team_service.discover_public_teams(current_user.email)
# Batch fetch ALL data upfront - 3 queries instead of 3N queries
user_team_ids = [str(t.id) for t in user_teams]
public_team_ids = [str(t.id) for t in public_teams]
all_team_ids = user_team_ids + public_team_ids
member_counts = await team_service.get_member_counts_batch_cached(all_team_ids)
user_roles = team_service.get_user_roles_batch(current_user.email, user_team_ids)
pending_requests = team_service.get_pending_join_requests_batch(current_user.email, public_team_ids)
all_teams = []
# Add user's teams (owned and member)
for team in user_teams:
team_id = str(team.id)
user_role = user_roles.get(team_id)
relationship = "owner" if user_role == "owner" else "member"
all_teams.append({
"team": team,
"relationship": relationship,
"member_count": member_counts.get(team_id, 0)
})
# Add public teams user can join
for team in public_teams:
team_id = str(team.id)
all_teams.append({
"team": team,
"relationship": "join",
"member_count": member_counts.get(team_id, 0),
"pending_request": pending_requests.get(team_id) # None if no pending request
})2.3 Update admin.py:admin_list_teams() (lines 3503-3542)
Before:
if current_user.is_admin:
teams, _ = await team_service.list_teams()
else:
teams = await team_service.get_user_teams(current_user.email)
teams_html = ""
for team in teams:
member_count = team.get_member_count() # N+1 HERE
teams_html += f"<div>...{member_count}...</div>"After:
if current_user.is_admin:
teams, _ = await team_service.list_teams()
else:
teams = await team_service.get_user_teams(current_user.email)
# Batch fetch member counts with caching
team_ids = [str(team.id) for team in teams]
member_counts = await team_service.get_member_counts_batch_cached(team_ids)
teams_html = ""
for team in teams:
member_count = member_counts.get(str(team.id), 0)
teams_html += f"<div>...{member_count}...</div>"2.4 Update routers/teams.py:list_teams() (lines 152-168)
Before:
team_responses = [
TeamResponse(
id=team.id,
name=team.name,
member_count=team.get_member_count(), # N+1 HERE
...
)
for team in teams
]After:
# Batch fetch member counts with caching
team_ids = [str(team.id) for team in teams]
member_counts = await service.get_member_counts_batch_cached(team_ids)
team_responses = [
TeamResponse(
id=team.id,
name=team.name,
member_count=member_counts.get(str(team.id), 0),
...
)
for team in teams
]2.5 Update routers/teams.py:discover_teams() (lines 670-681)
Before:
team_responses = [
DiscoverableTeamResponse(
id=team.id,
name=team.name,
member_count=team.get_member_count(), # N+1 HERE
...
)
for team in teams
]After:
# Batch fetch member counts with caching
team_ids = [str(team.id) for team in teams]
member_counts = await service.get_member_counts_batch_cached(team_ids)
team_responses = [
DiscoverableTeamResponse(
id=team.id,
name=team.name,
member_count=member_counts.get(str(team.id), 0),
...
)
for team in teams
]Phase 3: Add Partial Index
File: New migration mcpgateway/alembic/versions/xxx_add_team_member_count_index.py
Note: PostgreSQL
CREATE INDEX CONCURRENTLYcannot run inside a transaction.
Use a blocking index build (acceptable for this small table) or run the
concurrent version manually outside Alembic.
"""Add partial index for team member count queries
Revision ID: xxx
Revises: <previous>
Create Date: 2026-01-04
"""
from typing import Sequence, Union
from alembic import op
from sqlalchemy import inspect
revision: str = "xxx"
down_revision: Union[str, Sequence[str], None] = "<previous>"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def _index_exists(table_name: str, index_name: str) -> bool:
"""Check if an index already exists."""
conn = op.get_bind()
inspector = inspect(conn)
try:
existing = inspector.get_indexes(table_name)
return any(idx["name"] == index_name for idx in existing)
except Exception:
return False
def upgrade() -> None:
"""Add partial index for team member count queries."""
conn = op.get_bind()
dialect = conn.dialect.name
if dialect == "postgresql":
# PostgreSQL: Use partial index (most efficient for WHERE is_active = true)
# Note: NOT using CONCURRENTLY since that requires running outside transaction
# For production with large tables, consider running CONCURRENTLY manually:
# CREATE INDEX CONCURRENTLY idx_email_team_members_team_active_partial
# ON email_team_members(team_id) WHERE is_active = true;
if not _index_exists("email_team_members", "idx_email_team_members_team_active_partial"):
op.execute("""
CREATE INDEX idx_email_team_members_team_active_partial
ON email_team_members(team_id)
WHERE is_active = true
""")
print("✓ Created partial index idx_email_team_members_team_active_partial")
else:
# SQLite/MySQL: Regular composite index (no partial index support)
if not _index_exists("email_team_members", "idx_email_team_members_team_active_count"):
op.create_index(
"idx_email_team_members_team_active_count",
"email_team_members",
["team_id", "is_active"]
)
print("✓ Created composite index idx_email_team_members_team_active_count")
def downgrade() -> None:
"""Remove the partial index."""
conn = op.get_bind()
dialect = conn.dialect.name
if dialect == "postgresql":
op.execute("DROP INDEX IF EXISTS idx_email_team_members_team_active_partial")
else:
try:
op.drop_index("idx_email_team_members_team_active_count", "email_team_members")
except Exception:
pass # Index may not existPhase 4: Add Redis Caching
Add Redis caching layer for member counts with configurable TTL. This requires:
- Add configuration settings - Enable flag + TTL in config.py
- Add Redis client access - Use existing
AuthCachepattern or add helper - Implement cache-through pattern - Check cache, fallback to DB
- Add cache invalidation - Clear on membership add/remove/update
4.1 Add Configuration Settings
File: mcpgateway/config.py
Add to the caching settings section (near line 1021-1046):
# Team member count cache settings
team_member_count_cache_enabled: bool = Field(
default=True,
description="Enable Redis caching for team member counts"
)
team_member_count_cache_ttl: int = Field(
default=300,
ge=30,
le=3600,
description="TTL in seconds for team member count cache (default: 5 minutes)"
)File: .env.example
Add documentation:
# Team member count cache (reduces N+1 queries in admin UI)
# TEAM_MEMBER_COUNT_CACHE_ENABLED=true
# TEAM_MEMBER_COUNT_CACHE_TTL=300File: docker-compose.yml
Add to gateway service environment (near line 147, after ADMIN_STATS_CACHE_PERFORMANCE_TTL):
# Team member count cache (reduces N+1 queries)
- TEAM_MEMBER_COUNT_CACHE_ENABLED=true
- TEAM_MEMBER_COUNT_CACHE_TTL=300File: charts/mcp-stack/values.yaml
Add to config section (near line 230, after ADMIN_STATS_CACHE_PERFORMANCE_TTL):
# Team member count cache (reduces N+1 queries in admin UI)
TEAM_MEMBER_COUNT_CACHE_ENABLED: "true" # Enable team member count caching
TEAM_MEMBER_COUNT_CACHE_TTL: "300" # Cache TTL in seconds (30-3600)File: charts/mcp-stack/values.schema.json
Add to properties (if schema validation is used):
"TEAM_MEMBER_COUNT_CACHE_ENABLED": {
"type": "string",
"description": "Enable Redis caching for team member counts"
},
"TEAM_MEMBER_COUNT_CACHE_TTL": {
"type": "string",
"description": "Team member count cache TTL in seconds (30-3600)"
}File: docs/docs/manage/configuration.md
Add new section after "Admin Stats Cache" (around line 510):
### Team Member Count Cache
```bash
# Team member count cache (reduces N+1 queries in admin UI)
TEAM_MEMBER_COUNT_CACHE_ENABLED=true # Enable team member count caching
TEAM_MEMBER_COUNT_CACHE_TTL=300 # Cache TTL in seconds (30-3600)When TEAM_MEMBER_COUNT_CACHE_ENABLED=true (default), team member counts are cached in Redis:
- Member counts: Cached for
TEAM_MEMBER_COUNT_CACHE_TTLseconds (default: 300)
Cache is automatically invalidated when team members are added, removed, or their is_active status changes.
Performance Note: This cache eliminates N+1 query patterns in the admin UI team listings, reducing /admin/ P95 latency from ~14s to <500ms under load.
#### 4.2 Add Cached Member Count Method
**File:** `mcpgateway/services/team_management_service.py`
```python
from mcpgateway.utils.redis_client import get_redis_client
from mcpgateway.config import get_settings
def _get_member_count_cache_key(self, team_id: str) -> str:
"""Build cache key using settings.cache_prefix for consistency."""
settings = get_settings()
prefix = getattr(settings, "cache_prefix", "mcpgw:")
return f"{prefix}team:member_count:{team_id}"
async def get_member_counts_batch_cached(self, team_ids: List[str]) -> Dict[str, int]:
"""Get member counts for multiple teams, using Redis cache with DB fallback.
Caching behavior is controlled by settings:
- team_member_count_cache_enabled: Enable/disable caching (default: True)
- team_member_count_cache_ttl: Cache TTL in seconds (default: 300)
Args:
team_ids: List of team UUIDs
Returns:
Dict mapping team_id to member count
"""
if not team_ids:
return {}
settings = get_settings()
cache_enabled = settings.team_member_count_cache_enabled
cache_ttl = settings.team_member_count_cache_ttl
# If caching disabled, go straight to batch DB query
if not cache_enabled:
return self.get_member_counts_batch(team_ids)
redis_client = await get_redis_client()
result: Dict[str, int] = {}
cache_misses: List[str] = []
# Step 1: Check Redis cache for all team IDs
if redis_client:
try:
cache_keys = [self._get_member_count_cache_key(tid) for tid in team_ids]
cached_values = await redis_client.mget(cache_keys)
for tid, cached in zip(team_ids, cached_values):
if cached is not None:
result[tid] = int(cached)
else:
cache_misses.append(tid)
except Exception as e:
logger.warning(f"Redis cache read failed, falling back to DB: {e}")
cache_misses = list(team_ids)
else:
# No Redis available, fall back to DB
cache_misses = list(team_ids)
# Step 2: Query database for cache misses
if cache_misses:
try:
db_results = self.db.query(
EmailTeamMember.team_id,
func.count(EmailTeamMember.id).label("count")
).filter(
EmailTeamMember.team_id.in_(cache_misses),
EmailTeamMember.is_active.is_(True)
).group_by(EmailTeamMember.team_id).all()
self.db.commit()
db_counts = {str(row.team_id): row.count for row in db_results}
# Fill in results and cache them
for tid in cache_misses:
count = db_counts.get(tid, 0)
result[tid] = count
# Step 3: Cache the result with configured TTL
if redis_client:
try:
await redis_client.setex(
self._get_member_count_cache_key(tid),
cache_ttl,
str(count)
)
except Exception as e:
logger.warning(f"Redis cache write failed for team {tid}: {e}")
except Exception as e:
self.db.rollback()
logger.error(f"Failed to get member counts for teams: {e}")
raise
return result
async def invalidate_team_member_count_cache(self, team_id: str) -> None:
"""Invalidate the cached member count for a team.
Call this after any membership changes (add/remove/update).
No-op if caching is disabled or Redis unavailable.
Args:
team_id: Team UUID to invalidate
"""
settings = get_settings()
if not settings.team_member_count_cache_enabled:
return
redis_client = await get_redis_client()
if redis_client:
try:
await redis_client.delete(self._get_member_count_cache_key(team_id))
except Exception as e:
logger.warning(f"Failed to invalidate member count cache for team {team_id}: {e}")
4.3 Add Cache Invalidation to Membership Methods
File: mcpgateway/services/team_management_service.py
Add await invalidation calls to all membership-modifying methods:
# In create_team() (line ~204, ~216) - after adding creator as owner:
await self.invalidate_team_member_count_cache(str(team.id))
# In add_member_to_team() (line ~486) - after successful add:
await self.invalidate_team_member_count_cache(str(team_id))
# In remove_member_from_team() (line ~516) - after successful remove:
await self.invalidate_team_member_count_cache(str(team_id))
# In approve_join_request() (line ~1057) - after creating new member:
await self.invalidate_team_member_count_cache(str(join_request.team_id))
# In update_member_role() (line ~588):
# Note: update_member_role only changes role, not is_active, so no invalidation needed4.4 Update Call Sites to Use Cached Method
Replace get_member_counts_batch() calls with get_member_counts_batch_cached() in:
admin.py:admin_ui()admin.py:_render_unified_teams_view()admin.py:admin_list_teams()routers/teams.py:list_teams()routers/teams.py:discover_teams()
Files to Modify
Configuration Files
| File | Changes |
|---|---|
mcpgateway/config.py |
Add team_member_count_cache_enabled and team_member_count_cache_ttl settings |
.env.example |
Document new cache settings |
docker-compose.yml |
Add cache env vars to gateway service |
charts/mcp-stack/values.yaml |
Add cache settings to config section |
charts/mcp-stack/values.schema.json |
Add schema for new settings |
charts/mcp-stack/README.md |
Document new settings in helm chart docs |
docs/docs/manage/configuration.md |
Add Team Member Count Cache section |
Implementation Files
| File | Changes |
|---|---|
mcpgateway/services/team_management_service.py |
Add 3 batch methods + cached version + invalidation helper |
mcpgateway/admin.py |
Update 4 call sites (lines 2454, 3318, 3326, 3510) to use cached batch methods |
mcpgateway/routers/teams.py |
Update 2 call sites (lines 162, 678) to use cached batch methods |
mcpgateway/alembic/versions/ |
Add partial index migration |
Test Files
| File | Changes |
|---|---|
tests/unit/mcpgateway/services/test_team_management_service.py |
Add tests for batch methods + caching + invalidation |
tests/integration/ |
Add cache invalidation integration tests |
Expected Impact
| Metric | Before | After (Batch) | After (Batch + Cache) | Improvement |
|---|---|---|---|---|
Queries per _render_unified_teams_view |
3N+2 | 5 | 4-5 (cache hits) | 3N → 4-5 queries |
Queries per admin_ui team section |
2N+1 | 3 | 2-3 (cache hits) | 2N → 2-3 queries |
/admin/ P95 latency |
14,000 ms | <2,000 ms | <500 ms | 28× faster |
| Idle-in-transaction | 60+ | <10 | <5 | 12× reduction |
| Database connections | 100+ | ~40 | ~20 | 80% reduction |
| Redis cache hit rate | N/A | N/A | ~80% (after warmup) | Reduced DB load |
Query reduction breakdown for _render_unified_teams_view:
- Before:
get_user_teams(1) +discover_public_teams(1) +get_member_count(N) +get_user_role_in_team(N) +get_user_join_requests(N) = 3N+2 queries - After (batch):
get_user_teams(1) +discover_public_teams(1) +get_member_counts_batch(1) +get_user_roles_batch(1) +get_pending_join_requests_batch(1) = 5 queries - After (batch + cache):
get_user_teams(1) +discover_public_teams(1) +get_user_roles_batch(1) +get_pending_join_requests_batch(1) + member count (0-1 from cache) = 4-5 queries
Testing Checklist
Phase 1: Batch Methods
- Unit tests for
get_member_counts_batch()with empty list, single team, multiple teams - Unit tests for
get_user_roles_batch()with empty list, mixed membership - Unit tests for
get_pending_join_requests_batch()with no requests, pending requests - Unit tests verify correct counts/roles/requests returned
Phase 2: Call Site Updates
- Integration test for
/admin/endpoint performance - Integration test for
/teamsAPI endpoint - Verify all call sites use batch methods
Phase 3: Index Migration
- Migration runs successfully on PostgreSQL (partial index)
- Migration runs successfully on SQLite (composite index)
- Downgrade migration works correctly
- Query plan shows index usage (
EXPLAIN ANALYZE)
Phase 4: Redis Caching
- Unit test config settings load correctly (enabled flag, TTL)
- Unit test
get_member_counts_batch_cached()with caching enabled - Unit test
get_member_counts_batch_cached()with caching disabled (bypasses Redis) - Unit test
get_member_counts_batch_cached()with Redis unavailable (graceful fallback) - Unit test cache hit returns correct data without DB query
- Unit test cache miss queries DB and populates cache with configured TTL
- Unit test
invalidate_team_member_count_cache()removes cache entry - Unit test
invalidate_team_member_count_cache()is no-op when caching disabled - Integration test: add_team_member invalidates cache
- Integration test: remove_team_member invalidates cache
- Integration test: update_team_member (is_active change) invalidates cache
- Integration test: verify TTL is respected (cache expires after configured time)
Configuration & Documentation
- Settings load correctly from environment variables
-
.env.exampleincludes new settings with comments -
docker-compose.ymlincludes new settings -
charts/mcp-stack/values.yamlincludes new settings -
charts/mcp-stack/values.schema.jsonvalidates new settings -
charts/mcp-stack/README.mddocuments new settings -
docs/docs/manage/configuration.mddocuments new cache section - Helm chart deploys successfully with new settings
End-to-End Validation
- Load test confirms N+1 elimination (check
pg_stat_statements) - Load test confirms cache hit rate >80% after warmup
- Monitor idle-in-transaction count during load test (<5)
- Verify
/admin/P95 latency <500ms under load
Related Issues
- [PERFORMANCE]: Add partial index for team member count queries #1893 - Add partial index for team member count queries
- [PERFORMANCE]: Admin UI endpoints have high tail latency (5-10s p95) #1894 - Admin UI endpoints have high tail latency
- [PERFORMANCE]: Fix PostgreSQL 'Idle in Transaction' Connection Issue #1862 - Fix PostgreSQL 'Idle in Transaction' Connection Issue
Implementation Notes
-
Backward Compatibility: Keep
get_member_count()method onEmailTeamfor single-team use cases. Only update loop contexts to use batch method. -
Transaction Management: The batch query uses
self.db.commit()after the read-only query to release the transaction, matching the existing pattern throughoutTeamManagementService(see lines 247, 691, 706, 751, 816, 872, 913, 943). Usingcommit()instead ofrollback()avoids expiring loaded ORM objects that may be reused by callers. -
Empty Teams: Teams with no active members won't appear in the GROUP BY result. The solution defaults to 0 for missing team IDs.
-
All N+1 patterns addressed: This PR fixes all three N+1 patterns in
_render_unified_teams_view: member counts, user roles, and pending join requests. -
Index Strategy: A blocking index build is acceptable for
email_team_memberssince:- Table is typically small (hundreds to thousands of rows)
- Migration runs during deployment downtime
- For zero-downtime production deploys with large tables, run
CREATE INDEX CONCURRENTLYmanually before migration
-
Redis Client:
- Uses async
get_redis_client()frommcpgateway.utils.redis_client get_member_counts_batch_cached()andinvalidate_team_member_count_cache()areasync def- Call sites must use
await(all call sites are already in async functions) - Note:
get_redis_client_sync()only returns the cached client reference, not a sync wrapper
- Uses async
-
Redis Caching Strategy:
- Uses
MGETfor batch cache lookups (single round-trip for N keys) - Gracefully degrades to DB-only when Redis unavailable or caching disabled
- Cache invalidation is explicit (not TTL-only) to ensure consistency after membership changes
- Configurable TTL (default 300s) provides fallback invalidation for edge cases (e.g., direct DB updates)
- Uses
-
Cache Configuration:
TEAM_MEMBER_COUNT_CACHE_ENABLED(default:true) - Master switch to enable/disable cachingTEAM_MEMBER_COUNT_CACHE_TTL(default:300, range: 30-3600) - Cache TTL in seconds- When disabled,
get_member_counts_batch_cached()delegates directly toget_member_counts_batch() - Invalidation methods are no-ops when caching is disabled (avoid unnecessary Redis calls)
-
Cache Key Design:
- Uses
settings.cache_prefix(default:mcpgw:) for consistency with existing cache modules - Full key format:
{cache_prefix}team:member_count:{team_id}(e.g.,mcpgw:team:member_count:uuid) - Follows same pattern as
AuthCache,RegistryCache, andAdminStatsCache
- Uses
Open Questions (Resolved)
| Question | Decision |
|---|---|
| Implement all batch methods in same PR? | Yes - addresses all N+1 patterns in _render_unified_teams_view |
| Blocking vs concurrent index build? | Blocking - table is small, add note for large deployments |
| Add Redis caching? | Yes - maximum performance with configurable settings |
| Cache TTL duration? | Configurable (default 300s, range 30-3600s) via TEAM_MEMBER_COUNT_CACHE_TTL |
| Cache enable/disable? | Configurable via TEAM_MEMBER_COUNT_CACHE_ENABLED (default: true) |
| Cache invalidation strategy? | Explicit invalidation on membership changes + TTL expiry as fallback |
| Sync vs async Redis client? | Async (await get_redis_client()) - required since Redis client is async |
| Cache key prefix? | Use settings.cache_prefix - consistent with AuthCache, RegistryCache, etc. |
| Invalidation coverage? | All membership paths - create_team, add_member_to_team, remove_member_from_team, approve_join_request |