Skip to content

fix: propagate PodCliqueSet annotations to PodGang#573

Merged
danbar2 merged 1 commit into
ai-dynamo:mainfrom
danbar2:podgang-merge-metadata
May 10, 2026
Merged

fix: propagate PodCliqueSet annotations to PodGang#573
danbar2 merged 1 commit into
ai-dynamo:mainfrom
danbar2:podgang-merge-metadata

Conversation

@danbar2

@danbar2 danbar2 commented May 3, 2026

Copy link
Copy Markdown
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

Rebased and review-addressed continuation of #500 (original author: @w1t, with feedback from @Ronkahn21 and @gflarity). #500 went stale after main moved on; opening a fresh PR from a feature branch since the original was filed from the author's fork's main, which prevented an in-place rebase.

Problem

When a PodCliqueSet carries scheduler annotations such as nvidia.com/kai-scheduler-queue, the resulting PodGang is created without them. kai-scheduler reads queue assignment from the PodGang, so it raises QueueDoesNotExist even though the queue exists and was validated by the operator.

Additionally, buildResource was overwriting pg.Labels and pg.Annotations on every reconcile. Because buildResource runs inside controllerutil.CreateOrPatch, on update paths the PodGang arrives with existing metadata (e.g. set by admission webhooks or other controllers), and that metadata was being wiped.

Fix

In buildResource:

  • pg.Labels = lo.Assign(pg.Labels, getLabels(pcs.Name)) — preserves existing labels while adding operator-managed labels.
  • pg.Annotations = lo.Assign(pg.Annotations, pcs.Annotations, topologyAnnotations) — preserves existing annotations, propagates PCS annotations, and lets reconciliation-derived topology annotations take precedence on key collisions.

Tests

TestBuildResource covers four cases with strict map equality (so stray entries are caught):

  • create path, single PCS annotation
  • create path, multiple PCS annotations
  • update path, TAS disabled — pre-existing labels and annotations on the PodGang survive
  • update path, TAS enabled — pre-existing topology annotation is overridden by the reconciliation-derived value

Which issue(s) this PR fixes:

Fixes #490

Supersedes #500.

Does this PR introduce an API change?

NONE

@copy-pr-bot

copy-pr-bot Bot commented May 3, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@danbar2 danbar2 force-pushed the podgang-merge-metadata branch 3 times, most recently from ebffd7d to 15f5f1e Compare May 3, 2026 12:02

athreesh commented May 3, 2026

Copy link
Copy Markdown

this fixes the KAI queue issue, but are we intentionally making “all PCS annotations propagate to PodGang” part of the API contract?

for scheduler queue annotations that seems right. my only concern is the broader blast radius: PCS annotations may include user/tooling metadata that another controller or admission webhook did not expect on PodGang. should this be narrowed to known scheduler/backend annotations, or should the PR explicitly document that PodGang inherits PCS annotations and topology-derived annotations win on collisions?

@danbar2

danbar2 commented May 4, 2026

Copy link
Copy Markdown
Contributor Author

this fixes the KAI queue issue, but are we intentionally making “all PCS annotations propagate to PodGang” part of the API contract?

for scheduler queue annotations that seems right. my only concern is the broader blast radius: PCS annotations may include user/tooling metadata that another controller or admission webhook did not expect on PodGang. should this be narrowed to known scheduler/backend annotations, or should the PR explicitly document that PodGang inherits PCS annotations and topology-derived annotations win on collisions?

IMO it should inherit the PCS annotations, cherry-picking annotations/label could never end

Comment thread operator/internal/controller/podcliqueset/components/podgang/podgang.go Outdated
@gflarity

gflarity commented May 4, 2026

Copy link
Copy Markdown
Contributor

this fixes the KAI queue issue, but are we intentionally making “all PCS annotations propagate to PodGang” part of the API contract?
for scheduler queue annotations that seems right. my only concern is the broader blast radius: PCS annotations may include user/tooling metadata that another controller or admission webhook did not expect on PodGang. should this be narrowed to known scheduler/backend annotations, or should the PR explicitly document that PodGang inherits PCS annotations and topology-derived annotations win on collisions?

IMO it should inherit the PCS annotations, cherry-picking annotations/label could never end

See review comment, I found an issue with lo.Assign, I think mirror is ok but we should protect grove.io/* annotations.

@danbar2 danbar2 force-pushed the podgang-merge-metadata branch from 15f5f1e to 4e99dd4 Compare May 5, 2026 09:01
@danbar2 danbar2 requested a review from gflarity May 5, 2026 12:51
gflarity
gflarity previously approved these changes May 9, 2026
Comment thread operator/internal/controller/podcliqueset/components/podgang/podgang.go Outdated
Comment thread operator/internal/controller/podcliqueset/components/podgang/podgang.go Outdated
Comment thread operator/internal/controller/podcliqueset/components/podgang/podgang_test.go Outdated
buildResource was overwriting PodGang labels and annotations rather
than merging into them. Because buildResource runs inside
controllerutil.CreateOrPatch, on update paths the PodGang arrives with
existing metadata (e.g. set by admission webhooks or other
controllers); the previous code wiped these.

Use lo.Assign to merge:
- pg.Labels gets existing labels merged with the operator-managed
  labels from getLabels(pcs.Name).
- pg.Annotations gets existing annotations merged with pcs.Annotations
  and the topology annotation, with reconciliation-derived values
  taking precedence on key collisions.

This propagates scheduler-queue annotations such as
nvidia.com/kai-scheduler-queue from the PodCliqueSet to the PodGang
so kai-scheduler can resolve the queue.

Tests cover both create and update paths, with TAS enabled and
disabled, and assert exact map equality so stray entries are caught.

Fixes ai-dynamo#490

Co-authored-by: Ron Kahn <122778260+Ronkahn21@users.noreply.github.com>
Co-authored-by: Gerald Flarity <gflarity@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Dan Bar <dan.bar@run.ai>
@danbar2 danbar2 merged commit e7f2ea4 into ai-dynamo:main May 10, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Can not find queue from kai-scheduler when creating an instance via dynamo

6 participants