Scheduled CI: scale-test trend tracking and comparison #550

## What you would like to be added?

A scheduled CI job that runs the scale test, compares the new result against the previously stored run, and persists the result to shared storage for trend tracking.

Concrete scope:

1. **Scheduled trigger.** A GitHub Actions workflow (e.g. `.github/workflows/scale-test.yaml`) that runs all existing scale tests on a cron schedule, plus `workflow_dispatch` for manual runs. Each run is tagged with the commit SHA and a timestamp.

2. **Shared result storage.** Each completed run uploads its `scale-test-results.json` (and pprof artifacts when available) to a shared backing store, indexed by commit SHA + timestamp. Open question for discussion: which backend should we use? Candidates are a dedicated branch in this repo (`scale-test-history`), an S3/GCS bucket, or a gh-pages site. Whichever we pick should be cheap, append-only, and queryable from CI without extra credentials beyond what GHA already has.

3. **Compare against latest.** After a run finishes, the job pulls the most recent stored result for the same test and emits a side-by-side report:
   - Per-test, per-milestone wall-clock (e.g. `pods-created`, `pods-ready`, `pcs-available`, `pcs-deleted`, total) with absolute + percent delta vs. previous.
   - Per-test, per-phase totals (e.g. deploy / steady-state / delete).
   - When pprof is captured, top-N CPU samples per controller `Reconcile` with deltas.
   - Output in markdown so it can be posted as a job summary and (optionally) commented on the triggering commit / PR.

4. **Trend tracking.** Because every run is appended to shared storage, generating a "last N runs" trend chart for each milestone becomes a separate, cheap step (a small static page or a script that emits CSV/SVG from the stored JSONs).

Suggested implementation outline:
- Extend `operator/e2e/measurement/exporter` with a `CompareExporter` that takes two `TrackerResult`s and emits the markdown report + a structured diff.
- A small CLI under `operator/hack/` (`scale-compare`) that loads two JSON files and runs the exporter, usable both in CI and by developers locally.
- A new workflow under `.github/workflows/` that wires up: scale-cluster-up → run-scale-test → fetch previous result from storage → run `scale-compare` → upload new result → post job summary.

## Why is this needed?

Operator perf work (the recent reconcile-CPU optimizations on the 1k-pod scenario, and the upcoming 5k-pod delete QPS work) keeps running into the same problem: there is no continuous signal on scale-test performance. We only know whether a change moved the numbers when someone manually runs baseline-vs-optimized and hand-builds a comparison table from `scale-test-results.json`.

A scheduled CI job that runs the scale test, persists results, and compares against the previous run gives us:

- **Trend visibility.** We see drift over time across many small changes, not just the deltas around a single perf-focused PR. Quiet regressions that nobody attributed to any single PR become visible.
- **A canonical history.** Reviewers and design docs can link to the stored run for a given commit instead of relying on `/tmp/...` dirs on a developer's laptop.
- **Lower cost-per-comparison.** The current manual workflow (run twice, open two JSONs, transcribe into a table) is slow enough that we don't do it for most PRs. Automating it means perf comparison becomes the default, not a special effort.

