perf: reduce reconcile CPU on the 1000-pod scale test #545
Merged
shayasoolin reviewed Apr 27, 2026
shayasoolin previously approved these changes Apr 27, 2026
shmuel-runai (Contributor) left a comment
Great work
Fix it, Ship it
Measured on the 500-replica / 1000-pod scale test (k3d + 100 KWOK nodes,
operator concurrency 20). Wall-clock −10% to −13% on a fresh run, deploy CPU
−21% on prior runs, GetPodCliqueSet −97%, GetPCLQPods −66%. Per-call
cost in the PodClique update path roughly halved by the cache + label
changes.
Changes:
* Parallel component sync in PCS reconcileSpec: three dependency-ordered
groups (RBAC+infra, PodClique, PCSG+PodGang) instead of twelve
sequential Syncs. Benefits the delete phase where component cleanups
can run in parallel.
* GetPCLQPods narrowed to MatchingLabels{LabelPodClique} — that label is
unique cluster-wide, so the parent managed-by / part-of labels only
added per-pod labels.Set.Lookup work for no filtering benefit.
Ownership is still validated by metav1.IsControlledBy.
* GetPCLQPods and GetPodCliqueSet ctx-memoized for the duration of one
reconcile (WithPCLQPodsCache, WithPodCliqueSetCache). reconcileSpec
and reconcileStatus share one informer list; GetPodCliqueSet drops
from four calls per PodClique reconcile (and three per PCSG reconcile)
to one.
* No-op status skip in podclique/pcs/pcsg reconcileStatus: snapshot the
status before mutate* calls, skip the API round-trip when
equality.Semantic.DeepEqual says nothing changed.
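The narrowed GetPCLQPods filter in the second bullet amounts to one label comparison per pod followed by an ownership check. In the operator, the label match is expressed as a client.MatchingLabels list option and ownership is checked with metav1.IsControlledBy; the sketch below models both with plain Go types so it runs without Kubernetes dependencies. Pod, labelPodClique (a placeholder key, not the real label name), and podsForClique are hypothetical.

```go
package main

// Pod is a trimmed-down, hypothetical stand-in for corev1.Pod: only the
// labels and the controller owner UID that the filter below needs.
type Pod struct {
	Name          string
	Labels        map[string]string
	ControllerUID string // UID of the owner reference with Controller=true, "" if none
}

// labelPodClique is a placeholder key; the operator's real label constant
// is not reproduced here.
const labelPodClique = "example.io/podclique"

// podsForClique mimics the narrowed list: a single-label match on the
// PodClique name, then an ownership check (the role metav1.IsControlledBy
// plays in the operator).
func podsForClique(all []Pod, cliqueName, cliqueUID string) []Pod {
	var out []Pod
	for _, p := range all {
		// One label lookup per pod instead of one per selector label.
		if v, ok := p.Labels[labelPodClique]; !ok || v != cliqueName {
			continue
		}
		// The label matched, but skip pods controlled by another object.
		if p.ControllerUID != cliqueUID {
			continue
		}
		out = append(out, p)
	}
	return out
}
```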
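The no-op status skip in the fourth bullet is a snapshot, mutate, compare sequence. The sketch below uses reflect.DeepEqual as a stand-in for equality.Semantic.DeepEqual (the Kubernetes helper additionally compares types such as resource.Quantity by value rather than by internal representation); Status, reconcileStatus, and the write callback are hypothetical. The snapshot has to be a deep copy: otherwise a mutator that edits a slice in place would change the snapshot too, and the change would never be written.

```go
package main

import "reflect"

// Status is a hypothetical stand-in for a PodClique/PCS/PCSG status.
type Status struct {
	Replicas      int32
	ReadyReplicas int32
	Conditions    []string
}

// reconcileStatus snapshots the status, applies the mutators, and calls
// write only when something actually changed. It reports whether a write
// was attempted.
func reconcileStatus(status *Status, mutators []func(*Status), write func(Status) error) (bool, error) {
	before := deepCopy(*status)
	for _, m := range mutators {
		m(status)
	}
	if reflect.DeepEqual(before, *status) {
		return false, nil // no-op: skip the API round-trip
	}
	return true, write(*status)
}

// deepCopy copies the Conditions slice so in-place edits by mutators cannot
// alias the snapshot. A nil slice stays nil so DeepEqual does not report a
// spurious nil-vs-empty difference.
func deepCopy(s Status) Status {
	if s.Conditions != nil {
		s.Conditions = append(make([]string, 0, len(s.Conditions)), s.Conditions...)
	}
	return s
}
```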
Test additions:
* TimerCondition in operator/e2e/measurement/condition/timer.go —
holds the measurement window open for a fixed duration after a trigger
phase fires.
* WorkloadManager.TriggerPCSReconcile — bumps grove.io/reconcile-trigger
annotation to force a no-op reconcile cycle without changing the spec.
* steady-state-reconcile phase in operator/e2e/tests/scale/scale_test.go
exercises the cache-hit path so the reconcile cost can be measured in
isolation.
Scale test (1 PCS × 500 replicas × 2 pods = 1000 pods):
                      Baseline   This branch   Change
  total               128.0s     110.9s        −13%
  pcs-deleted         50.0s      37.9s         −24%
  deploy CPU total    14.98s     ≈10s          −30%
  GetPCLQPods         4.57s      1.54s         −66%
  GetPodCliqueSet     ≈1–2s      60ms          −97%
  PCS PodClique.doCreateOrUpdate (steady-state): 260ms → 130ms (−50%)
Earlier revisions of this branch added a per-PodClique / per-PCSG
spec-hash short-circuit guarded by a grove.io/spec-hash annotation. That
optimization was dropped after code-review feedback flagged the hash as fragile
(MNNVL exclusion, drift risk if buildResource changes silently). The
remaining changes alone capture roughly the same wall-clock improvement
on this benchmark; the steady-state CPU regresses ~10ms/sec versus the
short-circuited variant but stays ~50% under baseline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
shmuel-runai approved these changes Apr 28, 2026
shayasoolin approved these changes Apr 28, 2026
Summary
Solves #408
Six optimizations in the PCS / PCSG / PodClique reconcile paths. Measured on the 500-replica / 1000-pod scale test (k3d + 100 KWOK nodes, operator concurrency=20).
Wall-clock
CPU: deploy phase (full pprof, 500 PodCliques + 500 PodGangs being created)
Measured functions: GetPCLQPods (list + label-match), reconcileSpec, reconcileStatus, syncPCLQResources, GetPodCliqueSet.
CPU: steady-state phase (30s window, single no-op reconcile triggered by annotation patch)
The common production case: a watch event fires but nothing actually changed.
Measured functions: controllerutil.CreateOrPatch (PCS path), doCreateOrUpdate, Reconcile.
A note on wall-clock vs. CPU
The wall-clock reduction is somewhat smaller than the CPU reduction because this test is not CPU-bound: the operator spends most of its wall-clock time waiting on kube-apiserver latency, informer cache syncs, and KWOK's own pod-scheduling delays. Those don't shrink just because our client-side code got faster. Where the CPU savings pay off in production:
All unit tests pass. There are no semantic changes to reconcile behavior: a status write is skipped only when equality.Semantic.DeepEqual reports no change, ownership is still validated by metav1.IsControlledBy, and so on.
What's in the PR
* Parallel component sync: three dependency-ordered groups in PCS reconcilespec.go (RBAC+infra, PodClique, PCSG+PodGang) instead of 12 sequential Syncs. This mostly benefits the delete phase (delete wall-clock −24%), where cleanup tasks overlap.
* GetPCLQPods narrowed selector: MatchingLabels{LabelPodClique} only. That label is unique cluster-wide, so the parent managed-by/part-of labels only added per-pod labels.Set.Lookup work for no filtering benefit. Ownership is still validated by metav1.IsControlledBy.
* ctx-scoped memoization: WithPCLQPodsCache and WithPodCliqueSetCache in internal/controller/common/component/utils/. PodClique's reconcileSpec and reconcileStatus share one pod list; GetPodCliqueSet drops from 4 calls per PodClique reconcile (and 3 per PCSG reconcile) to 1.
* No-op status skip: in podclique/pcs/pcsg reconcileStatus, snapshot the status before mutations and skip the API round-trip when equality.Semantic.DeepEqual says nothing changed.
Test additions
To make the steady-state CPU measurable in a scale test:
* operator/e2e/measurement/condition/timer.go: TimerCondition holds the measurement window open for a fixed duration after a trigger phase fires.
* WorkloadManager.TriggerPCSReconcile in operator/e2e/grove/workload/workload.go: bumps the grove.io/reconcile-trigger annotation to force a no-op reconcile cycle without touching the spec.
* steady-state-reconcile phase in operator/e2e/tests/scale/scale_test.go: patches the annotation and waits 30s; the pprof profile captured during that window isolates the cache-hit reconcile cost.
🤖 Generated with Claude Code