Skip to content

Fixes for topology aware scheduling validation webhook#324

Merged
unmarshall merged 1 commit into
ai-dynamo:mainfrom
unmarshall:tas-validation-fix
Jan 16, 2026
Merged

Fixes for topology aware scheduling validation webhook#324
unmarshall merged 1 commit into
ai-dynamo:mainfrom
unmarshall:tas-validation-fix

Conversation

@unmarshall

@unmarshall unmarshall commented Jan 15, 2026

Copy link
Copy Markdown
Collaborator

What type of PR is this?

/kind bug

What this PR does / why we need it:

When TAS is disabled then it validates the PCS to ensure that it does not have any TopologyConstraint set. This should only be done when a new PCS is created. Consider an existing PCS with TopologyConstraint set at one or more levels and it has been deployed when TAS was enabled. Now if the admin disabled TAS and restarts Grove, and if the PCS has some changes to its spec then validating webhook disallows it because it runs the validations to ensure that TopologyConstraints should not be defined for a PCS when TAS is disabled. While this is fine for newly create PCS it should not be skipped for PCS that is already running.

This PR fixes this behavior.

Additionally we also expose ErrorMatcher that was introduced in #317 and use it in webhook/admission/pcs/validation/podcliqueset_test.go since the return type was changed for pcsValidator.validate to return field.ErrorList.

Which issue(s) this PR fixes:

Fixes #325

Special notes for your reviewer:

Does this PR introduce a API change?

NONE

Additional documentation e.g., enhancement proposals, usage docs, etc.:


renormalize
renormalize previously approved these changes Jan 15, 2026

@renormalize renormalize left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for addressing #317 (comment).

@unmarshall unmarshall force-pushed the tas-validation-fix branch 2 times, most recently from 783f271 to 23274ff Compare January 15, 2026 16:24
@renormalize renormalize changed the title Fixes for topology aware scheduling validation webhoook Fixes for topology aware scheduling validation webhook Jan 15, 2026
Comment thread operator/internal/webhook/admission/pcs/validation/podcliqueset.go Outdated
Comment thread operator/internal/webhook/admission/pcs/validation/podcliqueset.go Outdated
@gflarity

Copy link
Copy Markdown
Contributor

Didn't request changes to keep things unblocked, but looks like there's a couple bugs in the validation code.

Comment thread operator/internal/webhook/admission/pcs/validation/podcliqueset.go Outdated
* Expose `ErrorMatcher` in `testutils`.
* Adapted PCS validation unit tests to use the new error matcher.
* Added count=1 for test-e2e make target to ensure that no tests are cached

Signed-off-by: Madhav Bhargava <madhav.bhargava@sap.com>
@unmarshall unmarshall merged commit 7b92d6b into ai-dynamo:main Jan 16, 2026
10 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Disabling TAS unintentionally disallows changes to existing PodCliqueSets.

4 participants