Skip to content

make linear complexity instead of embedded loops#579

Merged
oleg-kushniriov merged 1 commit into
ai-dynamo:mainfrom
oleg-kushniriov:perf/linearize-reconcile-loops
May 11, 2026
Merged

make linear complexity instead of embedded loops#579
oleg-kushniriov merged 1 commit into
ai-dynamo:mainfrom
oleg-kushniriov:perf/linearize-reconcile-loops

Conversation

@oleg-kushniriov

Copy link
Copy Markdown
Contributor

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Linearizes nine O(N×M) loops in the operator's reconcile path by replacing per-element slices.Contains / slices.IndexFunc / lo.Filter-with-inner-Contains lookups with map[string]struct{} (or map[string]T) sets/lookups built once per sync context. Same shape across all
nine sites; no behavior change.

Affected call sites (each was O(M·E), is now O(M+E)):

Hotspot File Function
1 controller/podcliqueset/components/podgang/syncflow.go initializeAssignedAndUnassignedPodsForPCS
2 controller/podclique/components/pod/syncflow.go checkAndRemovePodSchedulingGates
3 controller/podclique/components/pod/rollingupdate.go getPodNamesPendingUpdate
4 controller/podcliquescalinggroup/components/podclique/sync.go createExpectedPCLQs, createOrUpdatePCLQs
5 controller/podcliqueset/components/podclique/podclique.go createOrUpdatePCLQs
6 controller/podcliqueset/components/podcliquescalinggroup/podcliquescalinggroup.go Sync (existence check + excess filter)
nine sites; no behavior change.

Affected call sites (each was O(M·E), is now O(M+E)):

Hotspot File Function
1 controller/podcliqueset/components/podgang/syncflow.go initializeAssignedAndUnassignedPodsForPCS
2 controller/podclique/components/pod/syncflow.go checkAndRemovePodSchedulingGates
3 controller/podclique/components/pod/rollingupdate.go getPodNamesPendingUpdate
4 controller/podcliquescalinggroup/components/podclique/sync.go createExpectedPCLQs, createOrUpdatePCLQs
5 controller/podcliqueset/components/podclique/podclique.go createOrUpdatePCLQs
6 controller/podcliqueset/components/podcliquescalinggroup/podcliquescalinggroup.go Sync (existence check + excess filter)
7 controller/podcliqueset/components/podgang/syncflow.go getPodsForPodCliquesPendingCreation
8 controller/podcliqueset/components/podgang/syncflow.go getPodGangNamesPendingCreation, getExcessPodGangNames
9 controller/podcliqueset/components/podgang/syncflow.go getPodCliques, determinePodCliqueReplicas, determinePCSGReplicas

Introduces a small generic utility at internal/controller/common/component/utils/:

  • sets.goSet[T] / NewSet / NewSetBy / Has for membership-only lookups; MapBy returning map[K]V for value-retrieval lookups.
  • lookups.go — typed convenience helpers (PodCliqueByName, PodCliqueNameSet, PCSGByName, PodGangByName) over Grove's common resource slices, since these views are needed in multiple controllers.

The fix matters at modest scale, not just at the upper bound: at ~1 000 Pods (well below Grove's ~10 000-Pod target) the PCLQ-heavy shape produces low-millions of comparisons per reconcile plus per-element slice allocations on every hot path. CPU and GC pressure don't show in
dev/CI because dev clusters typically run at N≈10. At N=10 000 these patterns produce 10⁸-class per-reconcile work — the regime where reconcile latencies climb to seconds and watch-event storms cascade.

Which issue(s) this PR fixes:

Fixes #577

Special notes for your reviewer:

  • Mechanical refactor. Each hotspot is the same pattern: hoist a map[string]struct{} (or map[string]T) out of the inner loop, do a single map lookup per element. No semantic changes.
  • Eager init, no lazy fallback. The *ByName / *NameSet fields on syncContext are populated up front in prepareSyncFlow (or prepareSyncContext). I deliberately did not add lazy-init getters — they would either be dead code (callers always go through
    prepareSyncFlow) or a latent data race (if syncContext is ever shared across goroutines). Two syncflow_test.go fixtures were updated to populate the new fields explicitly using the hoisted helpers — this is exactly the kind of mismatch the eager-only design is meant to
    surface.
  • Set[T] keeps .Has, map[K]V doesn't get a wrapper. Set[T] uses a sentinel struct{} value, so the Has method is genuinely useful. For map[K]V lookups, the natural _, ok := m[k] idiom is used everywhere — no Lookup[K, V] alias.
  • Pointer aliasing. initializeAssignedAndUnassignedPodsForPCS relies on expectedPodGangs []*podGangInfo being a slice of pointers — the expectedPodGangByName map values alias the same structs, so pgi.refreshAssociatedPCLQPods(...) mutates back into the slice. There's an
    inline comment documenting this; a future refactor that switches to value-typed entries would silently break it.
  • Companion scale test. Test_ScaleTest_1000_PodHeavy (1 PCS × 1 PCLQ × 1 000 pods) added to e2e/tests/scale/. Companion to the existing Test_ScaleTest_1000 (500 PCLQs × 2 pods, PCLQ-heavy). The two together cover both hotspot axes: PCLQ-heavy hits hotspots 1, 4, 5, 7, 8,
    9; Pod-heavy hits 2, 3. Run with make run-scale-test TEST_PATTERN=Test_ScaleTest_1000_PodHeavy against make scale-cluster-up. Diff scale-test-results.json and the steady-state-reconcile pprof captures between baseline and this branch for the perf evidence.
  • Tests. sets_test.go covers NewSet (empty/populated/dups), NewSetBy (keyFunc + dedup), Set.Has (incl. zero-value), MapBy (empty/populated/last-write-wins). All existing controller tests pass. Full unit-test suite + go vet (default + -tags=e2e) + make lint
    clean.
  • Suggested review focus. internal/controller/common/component/utils/sets.go, lookups.go (the new abstraction), and controller/podcliqueset/components/podgang/syncflow.go (largest delta — 4 hotspots fixed in one file, syncContext shape changed). The rest are 5–10-line
    mechanical applications of the same pattern.

Does this PR introduce a API change?

NONE

Additional documentation e.g., enhancement proposals, usage docs, etc.:

@copy-pr-bot

copy-pr-bot Bot commented May 4, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@oleg-kushniriov oleg-kushniriov self-assigned this May 4, 2026
@oleg-kushniriov oleg-kushniriov force-pushed the perf/linearize-reconcile-loops branch from 85f522c to 73748d8 Compare May 4, 2026 13:36
@oleg-kushniriov oleg-kushniriov changed the title make linear coplexity instead of embedded loops make linear complexity instead of embedded loops May 5, 2026
@oleg-kushniriov oleg-kushniriov marked this pull request as ready for review May 5, 2026 09:51
@oleg-kushniriov oleg-kushniriov force-pushed the perf/linearize-reconcile-loops branch from 73748d8 to 5b185a7 Compare May 10, 2026 11:49
@oleg-kushniriov oleg-kushniriov force-pushed the perf/linearize-reconcile-loops branch from 5b185a7 to 6fbc871 Compare May 11, 2026 08:21
@oleg-kushniriov oleg-kushniriov merged commit c101a06 into ai-dynamo:main May 11, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Eliminate O(N×M) reconcile hotspots across the operator

3 participants