What you would like to be added?
A scheduled CI job that runs the scale test, compares the new result against the previously stored run, and persists the result to shared storage for trend tracking.
Concrete scope:
- Scheduled trigger. A GitHub Actions workflow (e.g. .github/workflows/scale-test.yaml) that runs all existing scale tests on a cron schedule (plus workflow_dispatch for manual runs). Each run is tagged with its commit SHA and timestamp.
- Shared result storage. Each completed run uploads its scale-test-results.json (and pprof artifacts when available) to a shared backing store, indexed by commit SHA + timestamp. Open question for discussion: which backend to use. Candidates are a dedicated branch in this repo (scale-test-history), an S3/GCS bucket, or a gh-pages site. Whichever we pick should be cheap, append-only, and queryable from CI without extra credentials beyond what GHA already has.
- Compare against latest. After a run finishes, the job pulls the most recent stored result for the same test and emits a side-by-side report:
  - Per-test, per-milestone wall-clock (e.g. pods-created, pods-ready, pcs-available, pcs-deleted, total) with absolute + percent delta vs. previous.
  - Per-test, per-phase totals (e.g. deploy / steady-state / delete).
  - When pprof is captured, top-N CPU samples per controller Reconcile, with deltas.
  - Output in markdown so it can be posted as a job summary and (optionally) commented on the triggering commit / PR.
- Trend tracking. Because every run is appended to shared storage, generating a "last N runs" trend chart for each milestone becomes a separate, cheap step (a small static page or a script that emits CSV/SVG from the stored JSONs).
Suggested implementation outline:
- Extend operator/e2e/measurement/exporter with a CompareExporter that takes two TrackerResults and emits the markdown report + a structured diff.
- Add a small CLI under operator/hack/ (scale-compare) that loads two JSON files and runs the exporter, usable both in CI and by developers locally.
- Add a new workflow under .github/workflows/ that wires up: scale-cluster-up → run-scale-test → fetch previous result from storage → run scale-compare → upload new result → post job summary.
Why is this needed?
Operator perf work (the recent reconcile-CPU optimizations on the 1k-pod scenario, and the upcoming 5k-pod delete QPS work) keeps running into the same problem: there is no continuous signal on scale-test performance. We only know whether a change moved the numbers when someone manually runs baseline-vs-optimized and hand-builds a comparison table from scale-test-results.json.
A scheduled CI job that runs the scale test, persists results, and compares against the previous run gives us:
- Trend visibility. We see drift over time across many small changes, not just the deltas around a single perf-focused PR. Quiet regressions that nobody attributed to any single PR become visible.
- A canonical history. Reviewers and design docs can link to the stored run for a given commit instead of relying on /tmp/... dirs on a developer's laptop.
- Lower cost-per-comparison. The current manual workflow (run twice, open two JSONs, transcribe into a table) is slow enough that we don't do it for most PRs. Automating it means perf comparison becomes the default, not a special effort.