
[PERFORMANCE]: Admin UI endpoints have high tail latency (5-10s p95) #1894

@crivetimihai

Description


Summary

Under load, Admin UI endpoints show significantly higher latency than API endpoints.

Note: During load testing, /admin/* traffic appeared in the Locust stats even when the AdminUIUser and RealisticUser classes were expected to be disabled. Verify the Locust user-class picker settings before attributing baseline latency to these endpoints.

Observed Metrics

Endpoint        Avg latency   P95 latency
/admin/         ~5283 ms      ~10000 ms
/admin/tools    ~1800 ms      ~3700 ms

Root Cause

Multiple contributing factors identified via py-spy profiling:

  1. N+1 pattern: get_member_count() is called once per team, issuing a separate query each time (see [PERFORMANCE]: N+1 query pattern in EmailTeam.get_member_count() #1892)
  2. Heavy aggregation queries for dashboard stats
  3. Pydantic model_validate() overhead for large result sets
  4. ORM query building overhead
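To illustrate factor 1, here is a minimal sketch using Python's built-in sqlite3 module. The `teams`/`team_members` schema is hypothetical and is not the actual mcpgateway model. The N+1 version issues one COUNT query per team, as a loop around get_member_count() does; the batched version returns the same counts in a single LEFT JOIN + GROUP BY query.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema; the real EmailTeam / membership models may differ.
SCHEMA = """
CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE team_members (
    team_id INTEGER NOT NULL REFERENCES teams(id),
    user_email TEXT NOT NULL
);
"""


def make_demo_db() -> sqlite3.Connection:
    """In-memory database with three teams; team 3 has no members."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO teams (id, name) VALUES (?, ?)",
        [(1, "alpha"), (2, "beta"), (3, "empty")],
    )
    conn.executemany(
        "INSERT INTO team_members (team_id, user_email) VALUES (?, ?)",
        [(1, "a@example.com"), (1, "b@example.com"), (2, "c@example.com")],
    )
    return conn


def member_counts_n_plus_one(conn: sqlite3.Connection) -> dict[int, int]:
    """N+1: one query for the team list, then one COUNT query per team."""
    team_ids = [row[0] for row in conn.execute("SELECT id FROM teams ORDER BY id")]
    counts = {}
    for team_id in team_ids:  # one extra round trip per team
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM team_members WHERE team_id = ?", (team_id,)
        ).fetchone()
        counts[team_id] = n
    return counts


def member_counts_batched(conn: sqlite3.Connection) -> dict[int, int]:
    """Single query: LEFT JOIN + GROUP BY; COUNT(m.team_id) yields 0 for empty teams."""
    rows = conn.execute(
        """
        SELECT t.id, COUNT(m.team_id)
        FROM teams AS t
        LEFT JOIN team_members AS m ON m.team_id = t.id
        GROUP BY t.id
        ORDER BY t.id
        """
    ).fetchall()
    return {team_id: n for team_id, n in rows}


if __name__ == "__main__":
    conn = make_demo_db()
    print(member_counts_batched(conn))  # {1: 2, 2: 1, 3: 0}
```

Counting the joined column (rather than COUNT(*)) keeps teams without members at 0, since the LEFT JOIN produces a NULL member row for them. An inner join would drop those teams entirely.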

py-spy sample:

Thread 21 (active+gil): "MainThread"
    __eq__ (sqlalchemy/sql/operators.py:584)
    operate (sqlalchemy/orm/attributes.py:453)
    list_tools (mcpgateway/services/tool_service.py:1527)
    _call_list_with_team_support (mcpgateway/admin.py:2528)
    admin_ui (mcpgateway/admin.py:2649)

Affected Code

  • mcpgateway/admin.py:3315-3327 - loops calling get_member_count()
  • mcpgateway/admin.py:2454, 2528, 2649 - admin UI endpoints
  • mcpgateway/services/tool_service.py:1527 - list_tools

Proposed Fixes

  1. Pagination: Limit items per page, use cursor-based pagination for large lists

  2. Lazy loading: Load detailed stats only when requested (HTMX partial loads)

  3. Caching: Cache dashboard aggregates in Redis with short TTL

  4. Eager loading: Use joinedload() to prevent N+1 patterns

  5. Fix N+1 in get_member_count(): See [PERFORMANCE]: N+1 query pattern in EmailTeam.get_member_count() #1892

  6. Parallelize service calls: Use asyncio.gather() for the 7+ sequential service calls at admin.py:3220-3344 (from [PERFORMANCE]: Admin UI endpoint /admin/ has high latency under load #1907 analysis):

    # Current: 7+ independent service calls awaited one after another
    # Proposed: run them concurrently with asyncio.gather() (~5-10% latency reduction estimated in #1907)

  7. Template size: The admin template is now 695KB (14,293 lines); consider splitting it.
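Proposed fix 6 can be sketched as follows. The service names and delays are hypothetical stand-ins for the independent calls made in admin_ui(), and asyncio.sleep() only simulates waiting on I/O; this is a sketch of the pattern, not the actual admin.py code.

```python
from __future__ import annotations

import asyncio
import time

# Hypothetical names and delays standing in for the 7+ independent service
# calls in admin_ui(); asyncio.sleep() simulates waiting on I/O.
SERVICE_CALLS = [
    ("tools", 0.05),
    ("servers", 0.05),
    ("resources", 0.05),
    ("prompts", 0.05),
    ("gateways", 0.05),
    ("teams", 0.05),
    ("stats", 0.05),
]


async def fake_service_call(name: str, delay_s: float) -> list[str]:
    await asyncio.sleep(delay_s)
    return [f"{name}-item"]


async def load_sequential() -> dict[str, list[str]]:
    """Current pattern: each call starts only after the previous one finishes."""
    results = {}
    for name, delay_s in SERVICE_CALLS:
        results[name] = await fake_service_call(name, delay_s)
    return results


async def load_concurrent() -> dict[str, list[str]]:
    """Proposed pattern: start all calls at once; gather() preserves input order."""
    values = await asyncio.gather(
        *(fake_service_call(name, delay_s) for name, delay_s in SERVICE_CALLS)
    )
    return {name: value for (name, _), value in zip(SERVICE_CALLS, values)}


async def timed(load) -> tuple[dict[str, list[str]], float]:
    """Run a loader coroutine function and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = await load()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_s = asyncio.run(timed(load_sequential))
    _, con_s = asyncio.run(timed(load_concurrent))
    print(f"sequential: {seq_s:.2f}s, concurrent: {con_s:.2f}s")
```

The speedup in this toy example is idealized. asyncio.gather() only overlaps calls that actually await I/O; synchronous, blocking database work would still run one call at a time. A single SQLAlchemy session is also not safe to share between concurrently running tasks, so each call would likely need its own session. These constraints may partly explain why the #1907 analysis estimates only a ~5-10% reduction.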

Related Issues

Acceptance Criteria

  • /admin/ p95 latency < 2000ms under load
  • /admin/tools p95 latency < 1000ms under load
  • Dashboard loads progressively (skeleton + async data)
  • Large lists are paginated
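Proposed fix 1 and the pagination criterion could be met with keyset (cursor-based) pagination. Below is a minimal sketch with sqlite3, assuming a hypothetical `tools` table with a unique integer `id` used as the sort key. Each page filters on `id > cursor` instead of using OFFSET, so the database can seek through the primary-key index rather than scanning and discarding all earlier rows.

```python
from __future__ import annotations

import sqlite3
from typing import Iterator


def make_tools_db(n: int) -> sqlite3.Connection:
    """Hypothetical `tools` table with integer ids 1..n."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tools (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO tools (id, name) VALUES (?, ?)",
        [(i, f"tool-{i}") for i in range(1, n + 1)],
    )
    return conn


def fetch_tools_page(
    conn: sqlite3.Connection, after_id: int | None, page_size: int = 50
) -> tuple[list[tuple[int, str]], int | None]:
    """Return (rows, next_cursor); next_cursor is None on the last page."""
    # Fetch one extra row to learn whether another page exists (no COUNT query).
    if after_id is None:
        rows = conn.execute(
            "SELECT id, name FROM tools ORDER BY id LIMIT ?", (page_size + 1,)
        ).fetchall()
    else:
        # Keyset predicate: seek past the cursor instead of OFFSET-scanning.
        rows = conn.execute(
            "SELECT id, name FROM tools WHERE id > ? ORDER BY id LIMIT ?",
            (after_id, page_size + 1),
        ).fetchall()
    has_more = len(rows) > page_size
    page = rows[:page_size]
    return page, (page[-1][0] if has_more else None)


def iter_pages(
    conn: sqlite3.Connection, page_size: int
) -> Iterator[list[tuple[int, str]]]:
    """Walk every page by feeding each page's cursor into the next request."""
    cursor_id = None
    while True:
        page, cursor_id = fetch_tools_page(conn, cursor_id, page_size)
        yield page
        if cursor_id is None:
            break


if __name__ == "__main__":
    conn = make_tools_db(120)
    print([len(page) for page in iter_pages(conn, 50)])  # [50, 50, 20]
```

The cursor returned to the client is just the last `id` on the page, which an HTMX "load more" request could pass back. If the real primary key is not a meaningful sort order, the cursor could instead be a composite key such as (created_at, id); that choice is an assumption, not the current schema.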

Metadata


Assignees

Labels

SHOULD, P2 (Important but not vital; high-value items that are not crucial for the immediate release), performance (Performance related items)

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests
