Eliminate O(N×M) reconcile hotspots across the operator

### What you would like to be added?

Convert nine known O(N×M) loops in the Grove operator's reconcile path to O(N+M) by replacing per-element `slices.Contains` / `slices.IndexFunc` / `lo.Filter` lookups with `map[string]struct{}` (or `map[string]T`) sets built once per sync context. The patterns are all the same shape: an inner linear scan over a slice that could be a single map lookup.

The nine call sites, ordered by likely impact (Pod-cardinality first, since Pods have the highest cardinality at scale):

| # | File | Function | Current complexity | Target complexity |
|---|---|---|---|---|
| 1 | `operator/internal/controller/podcliqueset/components/podgang/syncflow.go` | `initializeAssignedAndUnassignedPodsForPCS` | `O(totalPods × expectedPodGangs)` | `O(totalPods + expectedPodGangs)` |
| 2 | `operator/internal/controller/podclique/components/pod/syncflow.go` | `checkAndRemovePodSchedulingGates` | `O(podsInPCLQ × podRefsInPodGangForPCLQ)` | `O(podsInPCLQ + podRefsInPodGangForPCLQ)` |
| 3 | `operator/internal/controller/podclique/components/pod/rollingupdate.go` | `getPodNamesPendingUpdate` | `O(oldPods × deletionExpectedPodUIDs)` | `O(oldPods + deletionExpectedPodUIDs)` |
| 4 | `operator/internal/controller/podcliquescalinggroup/components/podclique/sync.go` | `createExpectedPCLQs`, `createOrUpdatePCLQs` | `O(expectedPCLQs × existingPCLQs)` | `O(expectedPCLQs + existingPCLQs)` |
| 5 | `operator/internal/controller/podcliqueset/components/podclique/podclique.go` | `createOrUpdatePCLQs` | `O(expectedStandalonePCLQs × existingStandalonePCLQs)` | `O(expectedStandalonePCLQs + existingStandalonePCLQs)` |
| 6 | `operator/internal/controller/podcliqueset/components/podcliquescalinggroup/podcliquescalinggroup.go` | `syncPodCliqueScalingGroups` | `O(expectedPCSGs × existingPCSGs)` | `O(expectedPCSGs + existingPCSGs)` |
| 7 | `operator/internal/controller/podcliqueset/components/podgang/syncflow.go` | `getPodsForPodCliquesPendingCreation` | `O(podGangPCLQs × existingPCLQs)` | `O(podGangPCLQs + existingPCLQs)` |
| 8 | `operator/internal/controller/podcliqueset/components/podgang/syncflow.go` | `getPodGangNamesPendingCreation`, `getExcessPodGangNames` | `O(expectedPodGangs × existingPodGangs)` | `O(expectedPodGangs + existingPodGangs)` |
| 9 | `operator/internal/controller/podcliqueset/components/podgang/syncflow.go` | `getPodCliques` | `O(podGangPCLQs × existingPCLQs)` | `O(podGangPCLQs + existingPCLQs)` |

The fix shape is identical for all of them: hoist the lookup target into a `map[string]struct{}` (or `map[string]T` where the value is needed) once at the start of the sync context, then replace each `slices.Contains` / `slices.IndexFunc` with an O(1) map lookup. Where the slice is mutable, prefer `slices.DeleteFunc` over `lo.Filter` — it is in-place and stdlib.

A representative example of the same pattern, fixed recently as part of #567, lives in `operator/internal/controller/podcliqueset/reconcilestatus.go` (`computeAvailableAndUpdatedReplicas`): see `flattenNamesToSet` + `slices.DeleteFunc` for a working template.


### Why is this needed?

Grove aims to support **almost any Pods amount per PodCliqueSet**, distributable in any shape (one PodClique with 10 000 Pods, 5 000 PodCliques with 2 Pods, anything in between). The point of this issue is **not** that the operator falls over at 10 000 — it is that the *quadratic* shape of these loops makes the cost balloon long before the upper bound, and the wins from linearizing them are already meaningful at modest scale (~1 000 Pods).

### Per-reconcile comparison counts

For two representative fleet shapes, here is the worst per-PCS-reconcile work for the affected hotspots:

| Shape | N=1 000 Pods | N=2 000 Pods | N=10 000 Pods |
|---|---|---|---|
| **Pod-heavy** (1 PCLQ × N Pods) — hits hotspots 2, 3 | `~10⁶` per PCLQ sync | `~4×10⁶` | `~10⁸` |
| **PCLQ-heavy** (N/2 PCLQs × 2 Pods) — hits hotspots 1, 4, 5, 7, 8, 9 | each ~`2.5×10⁵`; aggregate ~`1.5×10⁶` | each ~`10⁶`; aggregate ~`6×10⁶` | each ~`2.5×10⁷`; aggregate ~`1.5×10⁸` |

A few observations:

- **Even at N=1 000** — well below the upper target — the PCLQ-heavy shape produces low-millions of comparisons per reconcile, plus per-element slice allocations (`lo.Map`, `lo.FilterMap`, lambda captures) on every hot path. With multi-second requeue cadences across multiple PCSes, this is already a measurable source of CPU and GC pressure; it just doesn't appear in CI/dev because dev clusters typically run at N≈10.
- **N=10 000 is where it gets ugly.** The same patterns produce 10⁸-class per-reconcile work — that is the regime where reconcile latencies climb into seconds and watch-event storms cascade across observers.
- **The fix gets cheaper as scale grows.** Each O(N²) loop becomes an O(N+M) map lookup, taking total work from `~10⁸` to `~10⁴` at N=10 000 — four orders of magnitude — for a one-shot `map` build per sync context.


#	File	Function	Current complexity	Target complexity
1	`operator/internal/controller/podcliqueset/components/podgang/syncflow.go`	`initializeAssignedAndUnassignedPodsForPCS`	`O(totalPods × expectedPodGangs)`	`O(totalPods + expectedPodGangs)`
2	`operator/internal/controller/podclique/components/pod/syncflow.go`	`checkAndRemovePodSchedulingGates`	`O(podsInPCLQ × podRefsInPodGangForPCLQ)`	`O(podsInPCLQ + podRefsInPodGangForPCLQ)`
3	`operator/internal/controller/podclique/components/pod/rollingupdate.go`	`getPodNamesPendingUpdate`	`O(oldPods × deletionExpectedPodUIDs)`	`O(oldPods + deletionExpectedPodUIDs)`
4	`operator/internal/controller/podcliquescalinggroup/components/podclique/sync.go`	`createExpectedPCLQs`, `createOrUpdatePCLQs`	`O(expectedPCLQs × existingPCLQs)`	`O(expectedPCLQs + existingPCLQs)`
5	`operator/internal/controller/podcliqueset/components/podclique/podclique.go`	`createOrUpdatePCLQs`	`O(expectedStandalonePCLQs × existingStandalonePCLQs)`	`O(expectedStandalonePCLQs + existingStandalonePCLQs)`
6	`operator/internal/controller/podcliqueset/components/podcliquescalinggroup/podcliquescalinggroup.go`	`syncPodCliqueScalingGroups`	`O(expectedPCSGs × existingPCSGs)`	`O(expectedPCSGs + existingPCSGs)`
7	`operator/internal/controller/podcliqueset/components/podgang/syncflow.go`	`getPodsForPodCliquesPendingCreation`	`O(podGangPCLQs × existingPCLQs)`	`O(podGangPCLQs + existingPCLQs)`
8	`operator/internal/controller/podcliqueset/components/podgang/syncflow.go`	`getPodGangNamesPendingCreation`, `getExcessPodGangNames`	`O(expectedPodGangs × existingPodGangs)`	`O(expectedPodGangs + existingPodGangs)`
9	`operator/internal/controller/podcliqueset/components/podgang/syncflow.go`	`getPodCliques`	`O(podGangPCLQs × existingPCLQs)`	`O(podGangPCLQs + existingPCLQs)`

Shape	N=1 000 Pods	N=2 000 Pods	N=10 000 Pods
Pod-heavy (1 PCLQ × N Pods) — hits hotspots 2, 3	`~10⁶` per PCLQ sync	`~4×10⁶`	`~10⁸`
PCLQ-heavy (N/2 PCLQs × 2 Pods) — hits hotspots 1, 4, 5, 7, 8, 9	each ~`2.5×10⁵`; aggregate ~`1.5×10⁶`	each ~`10⁶`; aggregate ~`6×10⁶`	each ~`2.5×10⁷`; aggregate ~`1.5×10⁸`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Eliminate O(N×M) reconcile hotspots across the operator #577

What you would like to be added?

Why is this needed?

Per-reconcile comparison counts

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Eliminate O(N×M) reconcile hotspots across the operator #577

Description

What you would like to be added?

Why is this needed?

Per-reconcile comparison counts

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions