Skip to content

Conversation

@sohankunkerkar
Copy link
Member

@sohankunkerkar sohankunkerkar commented Dec 10, 2025

This is a manual cherry-pick of #8043

/assign @sohankunkerkar

TAS: extend the information in condition messages and events about nodes excluded from calculating the
assignment due to various recognized reasons like: taints, node affinity, node resource constraints.

Copilot AI review requested due to automatic review settings December 10, 2025 14:45
@k8s-ci-robot k8s-ci-robot added this to the v0.14 milestone Dec 10, 2025
@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Dec 10, 2025
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 10, 2025
@netlify
Copy link

netlify bot commented Dec 10, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit e393ef1
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/69398b4813270b0007ba5ea8

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances TAS (Topology Aware Scheduling) failure messages by adding detailed exclusion statistics showing why nodes were excluded during scheduling. It's a manual cherry-pick of #8043 to the release-0.14 branch.

Key changes:

  • Adds ExclusionStats struct to track why nodes are excluded (taints, nodeSelector, affinity, topology domain, resources)
  • Extends CountInWithLimitingResource function to identify which resource is most constraining when pods can't fit
  • Updates TAS failure messages to include total node count and breakdown of exclusion reasons

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/cache/scheduler/tas_flavor_snapshot.go Adds ExclusionStats tracking, updates fillInCounts to record exclusions, enhances notFitMessage to include exclusion details
pkg/resources/requests.go Adds CountInWithLimitingResource function returning both count and limiting resource with alphabetical tie-breaking; refactors CountIn to use it
pkg/resources/requests_test.go Adds comprehensive tests for CountInWithLimitingResource covering tie-breaking, missing resources, and various limiting scenarios
pkg/scheduler/scheduler_tas_test.go Updates test expectations to include exclusion stats in TAS failure messages and admission event messages
pkg/cache/scheduler/tas_cache_test.go Updates test expectations for failure messages and adds new tests for exclusion stats formatting with multiple exclusion reasons

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

count = int32(cap / rValue)
}
if result == nil || count < *result {
// Tie-break between CPU and memory counts to ensure deterministic results.
Copy link

Copilot AI Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The comment mentions "CPU and memory" specifically, but the tie-breaking logic applies to all resource types, not just CPU and memory. Consider updating the comment to be more general, e.g., "Tie-break between resources with equal counts to ensure deterministic results."

Suggested change
// Tie-break between CPU and memory counts to ensure deterministic results.
// Tie-break between resources with equal counts to ensure deterministic results.

Copilot uses AI. Check for mistakes.
}

func (s *TASFlavorSnapshot) findLevelWithFitDomains(levelIdx int, required bool, podSetSize int32, leaderPodSetSize int32, sliceSize int32, unconstrained bool) (int, []*domain, string) {
func (s *TASFlavorSnapshot) findLevelWithFitDomains(levelIdx int, required bool, podSetSize int32, leaderPodSetSize int32, sliceSize int32, unconstrained bool, stats *ExclusionStats) (int, []*domain, string) {
Copy link

Copilot AI Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function signature has been updated to include a stats *ExclusionStats parameter, but the caller of this function at line 799 has not been updated to pass this parameter. This will cause a compilation error. The call at line 799 should be updated to:

fitLevelIdx, currFitDomain, reason := s.findLevelWithFitDomains(levelIdx, required, count, leaderCount, sliceSize, unconstrained, stats)

Copilot uses AI. Check for mistakes.
Include detailed node exclusion reasons (taints, nodeSelector, affinity,
resources) in TAS scheduling failure messages to improve debuggability.

Fixes: kubernetes-sigs#7854

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
@mimowo
Copy link
Contributor

mimowo commented Dec 10, 2025

/release-note-edit

TAS: extend the information in condition messages and events about nodes excluded from calculating the
assignment due to various recognized reasons like: taints, node affinity, node resource constraints.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Dec 10, 2025
@mimowo
Copy link
Contributor

mimowo commented Dec 10, 2025

/kind feature
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 10, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 10, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

DetailsGit tree hash: d53215f73b3b0d645761062efdce26beb3ec2aa1

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, sohankunkerkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 10, 2025
@k8s-ci-robot k8s-ci-robot merged commit d308491 into kubernetes-sigs:release-0.14 Dec 10, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants