Scheduled CI: scale-test trend tracking and comparison #550

@danbar2

What would you like to be added?

A scheduled CI job that runs the scale test, compares the new result against the previously stored run, and persists the result to shared storage for trend tracking.

Concrete scope:

  1. Scheduled trigger. A GitHub Actions workflow (e.g. .github/workflows/scale-test.yaml) that runs all existing scale tests on a cron schedule (plus workflow_dispatch for manual runs). Each run is tagged with its commit SHA and timestamp.

  2. Shared result storage. Each completed run uploads its scale-test-results.json (and pprof artifacts when available) to a shared backing store, indexed by commit SHA + timestamp. Open question for discussion: which backend — a dedicated branch in this repo (scale-test-history), an S3/GCS bucket, or a gh-pages site. Whichever we pick should be cheap, append-only, and queryable from CI without extra credentials beyond what GHA already has.

  3. Compare against latest. After a run finishes, the job pulls the most recent stored result for the same test and emits a side-by-side report:

    • Per-test, per-milestone wall-clock (e.g. pods-created, pods-ready, pcs-available, pcs-deleted, total) with absolute + percent delta vs. previous.
    • Per-test, per-phase totals (e.g. deploy / steady-state / delete).
    • When pprof is captured, top-N CPU samples per controller Reconcile with deltas.
    • Output in markdown so it can be posted as a job summary and (optionally) commented on the triggering commit / PR.

  4. Trend tracking. Because every run is appended to shared storage, generating a "last N runs" trend chart for each milestone becomes a separate, cheap step (a small static page or a script that emits CSV/SVG from the stored JSONs).

Suggested implementation outline:

  • Extend operator/e2e/measurement/exporter with a CompareExporter that takes two TrackerResults and emits the markdown report + a structured diff.
  • Small CLI under operator/hack/ (scale-compare) that loads two JSON files and runs the exporter — usable both in CI and by developers locally.
  • New workflow under .github/workflows/ that wires up: scale-cluster-up → run-scale-test → fetch previous result from storage → run scale-compare → upload new result → post job summary.
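
The "fetch previous result from storage" step in the outline above needs some way to pick the latest stored run for a given test. The sketch below assumes a JSON index of stored runs keyed by test name, commit SHA, and timestamp. That index, the `StoredRun` type, its field names, and the `scale-1k` test name are all hypothetical; the storage backend is still an open question in this issue, so the lookup may end up looking quite different.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// StoredRun is a hypothetical index entry describing one persisted result.
type StoredRun struct {
	Test      string    `json:"test"`
	CommitSHA string    `json:"commit_sha"`
	Timestamp time.Time `json:"timestamp"` // RFC 3339 in the JSON index
}

// ParseIndex decodes a JSON array of StoredRun entries.
func ParseIndex(data []byte) ([]StoredRun, error) {
	var runs []StoredRun
	if err := json.Unmarshal(data, &runs); err != nil {
		return nil, fmt.Errorf("parse run index: %w", err)
	}
	return runs, nil
}

// LatestFor returns the most recent stored run for the given test.
// The boolean is false when no run for that test exists yet.
func LatestFor(runs []StoredRun, test string) (StoredRun, bool) {
	var best StoredRun
	found := false
	for _, r := range runs {
		if r.Test != test {
			continue
		}
		if !found || r.Timestamp.After(best.Timestamp) {
			best, found = r, true
		}
	}
	return best, found
}

func main() {
	// Placeholder index contents, for illustration only.
	index := []byte(`[
		{"test": "scale-1k", "commit_sha": "aaa111", "timestamp": "2025-01-01T00:00:00Z"},
		{"test": "scale-1k", "commit_sha": "bbb222", "timestamp": "2025-01-02T00:00:00Z"}
	]`)
	runs, err := ParseIndex(index)
	if err != nil {
		panic(err)
	}
	if prev, ok := LatestFor(runs, "scale-1k"); ok {
		fmt.Println("compare against", prev.CommitSHA)
	} else {
		fmt.Println("no previous run; skipping comparison")
	}
}
```

Handling the "no previous run" case explicitly matters for the first scheduled run and for newly added tests, where the workflow should upload the result and skip the comparison instead of failing.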

Why is this needed?

Operator perf work (the recent reconcile-CPU optimizations on the 1k-pod scenario, and the upcoming 5k-pod delete QPS work) keeps running into the same problem: there is no continuous signal on scale-test performance. We only know whether a change moved the numbers when someone manually runs baseline-vs-optimized and hand-builds a comparison table from scale-test-results.json.

A scheduled CI job that runs the scale test, persists results, and compares against the previous run gives us:

  • Trend visibility. We see drift over time across many small changes, not just the deltas around a single perf-focused PR. Quiet regressions that nobody attributed to any single PR become visible.
  • A canonical history. Reviewers and design docs can link to the stored run for a given commit instead of relying on /tmp/... dirs on a developer's laptop.
  • Lower cost-per-comparison. The current manual workflow (run twice, open two JSONs, transcribe into a table) is slow enough that we don't do it for most PRs. Automating it means perf comparison becomes the default, not a special effort.

Labels

enhancement (New feature or request)
