[PERFORMANCE]: Admin metrics rollups empty during benchmark window (raw scans only) #1938

@crivetimihai

Description

Summary

Admin metrics rollup tables stay empty during the benchmark window (load tests < 1 hour), so admin metrics queries fall back to raw scans only. This increases DB load and admin metrics latency.

Evidence

Rollup table row counts during the benchmark:

  • tool_metrics_hourly: 0
  • resource_metrics_hourly: 0
  • prompt_metrics_hourly: 0
  • server_metrics_hourly: 0
  • a2a_agent_metrics_hourly: 0
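
The counts above can be gathered with a small helper. This is a minimal sketch, not the project's own tooling: the function name `rollup_row_counts` is hypothetical, and the demo uses an in-memory SQLite database as a stand-in for the real one. Only the five table names come from this issue.

```python
import sqlite3

# Table names as listed in this issue.
ROLLUP_TABLES = (
    "tool_metrics_hourly",
    "resource_metrics_hourly",
    "prompt_metrics_hourly",
    "server_metrics_hourly",
    "a2a_agent_metrics_hourly",
)


def rollup_row_counts(conn, tables=ROLLUP_TABLES):
    """Return {table_name: row_count} for each rollup table.

    `conn` is any DB-API 2.0 connection. Table names are checked against
    the allowlist above because identifiers cannot be bound as parameters.
    """
    counts = {}
    for table in tables:
        if table not in ROLLUP_TABLES:
            raise ValueError(f"unexpected table name: {table!r}")
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cur.fetchone()[0]
    return counts


# Demo only: empty stand-in tables (hypothetical single-column schema).
conn = sqlite3.connect(":memory:")
for name in ROLLUP_TABLES:
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY)")

for name, count in rollup_row_counts(conn).items():
    print(f"{name}: {count}")
```

Running the same helper before and after a manual rollup gives the two snapshots compared in this section.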

After manual rollup (hours_back=24):

  • tool_metrics_hourly: 5
  • resource_metrics_hourly: 4
  • prompt_metrics_hourly: 3
  • server_metrics_hourly: 0
  • a2a_agent_metrics_hourly: 0

Rollup summary:

  • total_records_aggregated=1,569,843
  • rollups_updated=12 (created=0)
  • duration_seconds=13.12

Raw data window observed:

  • tool_metrics min_ts=2026-01-06 17:19:56
  • tool_metrics max_ts=2026-01-06 18:06:12

Note: the current load test runs for less than 1 hour (about 46 minutes in the raw data window above), so rollups do not cover the window and combined raw+rollup queries still scan raw metrics.
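
To illustrate the note, here is a minimal sketch under one stated assumption: an hourly rollup row can only replace raw data for an hour bucket that lies entirely inside the queried window. That is an assumption about the query logic, not something confirmed in this issue, and `full_hour_buckets` is a hypothetical helper. The timestamps are the observed tool_metrics min_ts and max_ts.

```python
from datetime import datetime, timedelta


def full_hour_buckets(start, end):
    """Return the start times of hourly buckets fully contained in [start, end]."""
    # Round the window start up to the next hour boundary.
    bucket = start.replace(minute=0, second=0, microsecond=0)
    if bucket < start:
        bucket += timedelta(hours=1)

    buckets = []
    # A bucket counts only if its whole hour ends at or before the window end.
    while bucket + timedelta(hours=1) <= end:
        buckets.append(bucket)
        bucket += timedelta(hours=1)
    return buckets


window_start = datetime(2026, 1, 6, 17, 19, 56)  # tool_metrics min_ts
window_end = datetime(2026, 1, 6, 18, 6, 12)     # tool_metrics max_ts

print(full_hour_buckets(window_start, window_end))  # → []
```

The window crosses the 18:00 boundary but contains no complete hour, so under this assumption every query over it has to read raw rows.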

Impact

  • Admin metrics pages trigger expensive raw scans.
  • Increased DB CPU and higher /admin* latency under load.

Potential fixes

  1. Ensure a rollup job runs on hour boundaries (or more frequently) during tests.
  2. Run a manual rollup before benchmarking if the test window is <1 hour.
  3. Increase metrics cache TTLs so admin UI does not recompute heavy metrics every request.
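
Fix 1 could look roughly like the sketch below. It is not the existing rollup job: `run_rollup` is a hypothetical coroutine standing in for whatever performs the aggregation, and both function names are made up for illustration. The delay calculation assumes `interval_minutes` divides 60 evenly.

```python
import asyncio
from datetime import datetime, timezone


def seconds_until_next_boundary(now, interval_minutes=60):
    """Seconds from `now` until the next interval boundary within the hour.

    Assumes interval_minutes divides 60 (e.g. 5, 15, 30, 60). A call made
    exactly on a boundary returns the full interval, not zero.
    """
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    elapsed = (now - top_of_hour).total_seconds()
    step = interval_minutes * 60
    next_offset = (int(elapsed // step) + 1) * step
    return next_offset - elapsed


async def rollup_loop(run_rollup, interval_minutes=60):
    """Sleep until each boundary, then await the (hypothetical) rollup coroutine."""
    while True:
        delay = seconds_until_next_boundary(datetime.now(timezone.utc), interval_minutes)
        await asyncio.sleep(delay)
        await run_rollup()


# Delay computed at the observed max_ts of the benchmark window.
print(seconds_until_next_boundary(datetime(2026, 1, 6, 18, 6, 12)))      # → 3228.0
print(seconds_until_next_boundary(datetime(2026, 1, 6, 18, 6, 12), 15))  # → 528.0
```

With a 60-minute interval, a test that ends at 18:06:12 waits almost 54 minutes for the next run. A 15-minute interval cuts the wait to under 9 minutes, but it only helps if the rollup job can aggregate the current, still-open hour.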

Validation

  • Rollup tables contain data for the benchmark window after an hour boundary.
  • Admin metrics queries use rollups and show reduced total time in pg_stat_statements.
  • /admin metrics endpoints show lower p95 latency under load.
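
The pg_stat_statements check could be done by taking a snapshot before and after the change and comparing totals per query. This is a sketch under assumptions: the column names are those of PostgreSQL 13 and later (older versions expose total_time instead of total_exec_time), the `%s` placeholder follows psycopg-style parameter binding, and `exec_time_deltas` is a hypothetical helper. The sample numbers are invented for illustration and are not measurements from this benchmark.

```python
# Query to snapshot statements touching a given table; run it with a
# pattern such as '%tool_metrics%' bound to the %s placeholder.
PG_STAT_SQL = """
SELECT queryid, query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
WHERE query ILIKE %s
ORDER BY total_exec_time DESC
LIMIT 20
"""


def exec_time_deltas(before, after):
    """Compare two snapshots keyed by queryid.

    Each snapshot maps queryid -> (calls, total_exec_time_ms).
    Returns queryid -> (extra_calls, extra_exec_time_ms) for every
    query present in `after`; queries new in `after` start from zero.
    """
    deltas = {}
    for queryid, (calls_after, ms_after) in after.items():
        calls_before, ms_before = before.get(queryid, (0, 0.0))
        deltas[queryid] = (calls_after - calls_before, ms_after - ms_before)
    return deltas


# Illustrative snapshots only (made-up values).
before = {101: (10, 500.0)}
after = {101: (30, 1700.0), 202: (5, 12.5)}

print(exec_time_deltas(before, after))
# → {101: (20, 1200.0), 202: (5, 12.5)}
```

If rollups are in use, the delta for raw-table scans should shrink relative to a run where the rollup tables are empty.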

Metadata

Labels

  • SHOULD: P2: Important but not vital; high-value items that are not crucial for the immediate release
  • performance: Performance related items
  • ui: User Interface
