
KEP-3633: graduate matchLabelKeys/mismatchLabelKeys to beta#4450

Merged
k8s-ci-robot merged 3 commits into kubernetes:master from sanposhiho:matchlabelkeys
Feb 8, 2024

Conversation

@sanposhiho
Member

  • One-line PR description: fill in the sections needed to graduate matchLabelKeys/mismatchLabelKeys to beta.
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 28, 2024
@k8s-ci-robot k8s-ci-robot requested a review from ahg-g January 28, 2024 04:12
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Jan 28, 2024
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 28, 2024
Comment thread keps/prod-readiness/sig-scheduling/3633.yaml
@sanposhiho
Member Author

@wojtek-t wojtek-t self-assigned this Jan 31, 2024
Member

@alculquicondor alculquicondor left a comment


LGTM overall, just a few nits.

Comment thread keps/sig-scheduling/3633-matchlabelkeys-to-podaffinity/README.md
Comment thread keps/sig-scheduling/3633-matchlabelkeys-to-podaffinity/README.md
Comment thread keps/sig-scheduling/3633-matchlabelkeys-to-podaffinity/README.md Outdated
@sanposhiho
Member Author

@alculquicondor Fixed, thanks.

Comment on lines -500 to -502
Also, we should make sure this feature brings no significant performance degradation in both Filter and Score.

- `k8s.io/kubernetes/test/integration/scheduler_perf/scheduler_perf_test.go`: https://storage.googleapis.com/k8s-triage/index.html?test=BenchmarkPerfScheduling
Member Author

After we changed the implementation approach, this feature no longer has an impact on scheduling latency, since it only modifies the labelSelector and doesn't change anything in the scheduling flow. So we don't implement any new test case in scheduler_perf for this feature.
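The mechanism described here can be sketched in Go. This is a hypothetical illustration, not the actual kube-apiserver code: at pod creation, each key listed in matchLabelKeys is looked up in the pod's own labels and appended to the affinity term's labelSelector as an `In` requirement, so the scheduler only ever sees an ordinary labelSelector.

```go
package main

import "fmt"

// LabelSelectorRequirement loosely mirrors the shape of the Kubernetes
// API type of the same name (illustrative stand-in, not the real type).
type LabelSelectorRequirement struct {
	Key      string
	Operator string // "In" for requirements derived from matchLabelKeys
	Values   []string
}

// mergeMatchLabelKeys sketches the one-time mutation at pod creation:
// for each key in matchLabelKeys, read the incoming pod's label value
// and append an "In" requirement to the affinity term's selector.
// Keys absent from the pod's labels are skipped.
func mergeMatchLabelKeys(podLabels map[string]string, matchLabelKeys []string, existing []LabelSelectorRequirement) []LabelSelectorRequirement {
	out := append([]LabelSelectorRequirement{}, existing...)
	for _, k := range matchLabelKeys {
		if v, ok := podLabels[k]; ok {
			out = append(out, LabelSelectorRequirement{Key: k, Operator: "In", Values: []string{v}})
		}
	}
	return out
}

func main() {
	reqs := mergeMatchLabelKeys(
		map[string]string{"pod-template-hash": "12345"},
		[]string{"pod-template-hash"},
		nil,
	)
	fmt.Println(reqs[0].Key, reqs[0].Operator, reqs[0].Values)
}
```

Because the merge happens once in kube-apiserver, the scheduling flow itself is untouched, which is the basis for skipping a dedicated scheduler_perf test case.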

Member

Just to clarify for the PRR: this is already implemented with no impact to scheduling.

Member

@alculquicondor alculquicondor left a comment


/approve
/lgtm
ping @wojtek-t

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 5, 2024
Member

@wojtek-t wojtek-t left a comment


Just some minor comments from the PRR pov - PTAL

will rollout across nodes.
-->

It shouldn't impact already running workloads. It's an opt-in feature,
Member

Can't comment on unchanged lines, so commenting here:

Are there any tests for feature enablement/disablement?

Were those added?
In particular the comment from the template is relevant here.

Member Author

No... I'll add one within this release; fortunately it's not too late, unlike minDomains 😓


Member Author

@wojtek-t By the way, do we also need a manual test ("Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?") then? Is there any difference in expectations between the automated enablement/disablement test and the manual test?

Member

The manual test should be performed as well. You can put the intent of the test in the KEP now and send an update later indicating that you executed it.

- A spike on metric `schedule_attempts_total{result="error|unschedulable"}` when pods using this feature are added.

No need to check latency of the scheduler because the scheduler doesn't get changed at all for this feature.
The only possibility is the bug in the Pod creation process in kube-apiserver and it results in some unintended scheduling.
Member

Is this really true? The computations on scheduler side might actually be slightly more expensive no? [e.g. a bit larger selectors to compute against pods...]
@alculquicondor - thoughts?

Member

Maybe... In some scenarios the users might be migrating from manually setting labels to using the new automated way.

Member Author

@sanposhiho sanposhiho Feb 6, 2024

> The computations on scheduler side might actually be slightly more expensive no? a bit larger selectors to compute against pods

This is correct, but is it a realistic enough concern to be worth adding here? That is: a labelSelector is added by matchLabelKeys → the time the scheduler takes to evaluate that labelSelector becomes too big?
If that happens, it's likely not a problem with this feature, but a problem in PodAffinity as a whole, i.e., the labelSelector calculation itself being buggy.

Member

Maybe it's not worth adding, but also please don't mislead the reader that "latency of scheduler doesn't get changed" - it is implicitly changed by using its features more.

Member Author

I added a mention of the scheduler's latency, but with a note that it is less of a concern.

12. restart kube-apiserver with this feature enabled. (**Second enablement**)
13. No change in Pods created before.
14. create one Pod with `matchLabelKeys` in PodAffinity.
15. `labelSelector` in PodAffinity should be changed based on `matchLabelKeys` and label value in the Pod because the feature is enabled.
Member

This scenario is fine - please report back to the KEP after running it [it doesn't block the KEP merge, but please block the actual graduation on it]
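The enablement/disablement scenario quoted above can be sketched in Go. This is a hypothetical helper with a boolean standing in for the feature gate, not the real kube-apiserver code: with the gate off, matchLabelKeys is simply ignored at pod creation, and pods created earlier are never rewritten in either direction, which is why the steps expect "no change in Pods created before".

```go
package main

import "fmt"

// applyMatchLabelKeys sketches the feature-gated mutation performed once
// at pod creation: with the gate off, matchLabelKeys is ignored and the
// selector is returned unchanged; already-created pods are never revisited.
// For simplicity the selector is modeled as a plain key/value map here.
func applyMatchLabelKeys(gateEnabled bool, matchLabelKeys []string, podLabels, selector map[string]string) map[string]string {
	if !gateEnabled || len(matchLabelKeys) == 0 {
		return selector
	}
	merged := make(map[string]string, len(selector)+len(matchLabelKeys))
	for k, v := range selector {
		merged[k] = v
	}
	for _, key := range matchLabelKeys {
		if v, ok := podLabels[key]; ok {
			merged[key] = v
		}
	}
	return merged
}

func main() {
	labels := map[string]string{"app": "web"}
	// Gate off (steps with the feature disabled): selector stays as given.
	fmt.Println(applyMatchLabelKeys(false, []string{"app"}, labels, map[string]string{}))
	// Gate on (second enablement): the pod's own label value is merged in.
	fmt.Println(applyMatchLabelKeys(true, []string{"app"}, labels, map[string]string{}))
}
```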

@wojtek-t
Member

wojtek-t commented Feb 8, 2024

@sanposhiho - can you please address remaining comments? the feature freeze is <24h away

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2024
@sanposhiho
Member Author

sanposhiho commented Feb 8, 2024

@wojtek-t Sorry for the delay, I didn't have time earlier.
I modified the sections based on the discussion.

@wojtek-t
Member

wojtek-t commented Feb 8, 2024

/lgtm
/approve PRR

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, sanposhiho, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 8, 2024
@k8s-ci-robot k8s-ci-robot merged commit 57f8ebd into kubernetes:master Feb 8, 2024
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Feb 8, 2024