Skip to content

DRA: Prioritized Alternatives in Device Requests, II#130622

Merged
k8s-ci-robot merged 2 commits intokubernetes:masterfrom
pohly:DRAPrioritizedList
Mar 10, 2025
Merged

DRA: Prioritized Alternatives in Device Requests, II#130622
k8s-ci-robot merged 2 commits intokubernetes:masterfrom
pohly:DRAPrioritizedList

Conversation

@pohly
Copy link
Copy Markdown
Contributor

@pohly pohly commented Mar 6, 2025

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This is a follow-up to address some review comments in #128586:

Which issue(s) this PR fixes:

Related-to: kubernetes/enhancements#4816

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE because this is not a functional changed compared to 1.32. Listing the feature once is enough.

NONE

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/code-generation area/test kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels Mar 6, 2025
@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Mar 6, 2025
@github-project-automation github-project-automation Bot moved this to Needs Triage in SIG Apps Mar 6, 2025
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. wg/device-management Categorizes an issue or PR as relevant to WG Device Management. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 6, 2025
@k8s-ci-robot k8s-ci-robot requested review from AxeZhan and bart0sh March 6, 2025 18:59
…and used

This was previously caught during Filter by the allocator check. Doing it
sooner avoids wasting resources on a pod which ultimately cannot get scheduled.

While at it, be a bit more clear about which feature is disabled. The user
might not know that.
@pohly pohly force-pushed the DRAPrioritizedList branch from 5b511b9 to 225d8bb Compare March 7, 2025 07:45
return nil, status
}
} else {
if !pl.enablePrioritizedList {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we reject pods with FirstAvailable at PreEnqueue additionally when PrioritizedList is disabled? which would reduce unnecessary scheduling cycles consumed for those pods.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that being said, I just remembered the discussion at #129698.
If we reject pods at PreEnqueue, no error message would be shown up for users. Hmm, so looks like this is not a great idea, at least for now, until we've got a solution for that problem.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be even better. Let me check that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, there was a reason (hadn't seen your second message when writing my initial reply).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it would be nice to be in PreEnqueue performance-wise though. Anyways, we can revisit all those PreEnqueue stuff later.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This situation is going to be extremely rare (enable feature, create claim, disable feature) and temporary (will disappear as the feature matures, hopefully in a few releases). It's good to catch it when it's just one if check in some existing code, but I am not sure whether it would be worth more complex changes in PreEnqueue.

I was about to write that when I saw that you had already retracted the suggestion. 😄

Copy link
Copy Markdown
Member

@sanposhiho sanposhiho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

Leave /lgtm for others. @macsko @dom4ha

@sanposhiho
Copy link
Copy Markdown
Member

/retest

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 9, 2025
@sanposhiho
Copy link
Copy Markdown
Member

/remove-kind api-change
I guess?

@k8s-ci-robot k8s-ci-robot removed the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Mar 9, 2025
@pohly pohly force-pushed the DRAPrioritizedList branch from 225d8bb to ae4543f Compare March 10, 2025 07:22
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 10, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 10, 2025
@pohly
Copy link
Copy Markdown
Contributor Author

pohly commented Mar 10, 2025

/remove-kind api-change
I guess?

Yes. That was a leftover from having included the full implementation earlier before rebasing.

@dom4ha LGTM?

Comment thread test/integration/dra/dra_test.go Outdated
claim := claimPrioritizedList.DeepCopy()
claim.Namespace = namespace
claim, err = tCtx.Client().ResourceV1beta1().ResourceClaims(namespace).Create(tCtx, claim, metav1.CreateOptions{})
if enabled {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe reduce the indentation by flipping the logic:

Suggested change
if enabled {
if !enabled {
require.Error(tCtx, err, "claim should have become invalid after dropping FirstAvailable")
return
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Fixed, and also the unit test failure in the allocation that was caused by the modified error text.

This adds dedicated integration tests for the feature to the general
test/integration/dra for the API and some minimal testing with the scheduler.

It also adds non-performance test cases for scheduler_perf because that is a
better place for running through the complete flow (for example, can reuse
infrastructure for setting up nodes).
@pohly pohly force-pushed the DRAPrioritizedList branch from ae4543f to 89440b1 Compare March 10, 2025 10:38
Copy link
Copy Markdown
Member

@dom4ha dom4ha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 10, 2025
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

LGTM label has been added.

DetailsGit tree hash: e0070d417ffb22b670c09ae3109164e3248c7e74

@macsko
Copy link
Copy Markdown
Member

macsko commented Mar 10, 2025

/lgtm

@pohly
Copy link
Copy Markdown
Contributor Author

pohly commented Mar 10, 2025

/test pull-kubernetes-e2e-kind

Pod scheduling timeout.

@k8s-ci-robot k8s-ci-robot merged commit 4806519 into kubernetes:master Mar 10, 2025
@github-project-automation github-project-automation Bot moved this from Needs Triage to Done in SIG Apps Mar 10, 2025
@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Mar 10, 2025
@github-project-automation github-project-automation Bot moved this from Triage to Done in SIG Node CI/Test Board Mar 10, 2025
@aaron-prindle
Copy link
Copy Markdown
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 13, 2025
@pohly pohly moved this from 🆕 New to ✅ Done in Dynamic Resource Allocation Mar 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/code-generation area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.

Projects

Status: ✅ Done
Archived in project
Archived in project

Development

Successfully merging this pull request may close these issues.

8 participants