KEP-2625: Promote the CPUManager Policy Option full-pcpus-only to GA #5108
Conversation
/sig node
cosmetic changes, no changes in content besides the bare minimum needed to adjust to the new template. Signed-off-by: Francesco Romani <fromani@redhat.com>
catch up with the updates since beta graduation Signed-off-by: Francesco Romani <fromani@redhat.com>
5bde288 to eb7974c
/retitle KEP-2625: Promote CPUManager Policy Options to GA
| ##### e2e tests |
| For all these reasons we postponed this work to a later date. |
| TBD |
/hold
We should probably have this filled out before merging.
| The beta-quality options are visible by default, and the feature gate allows a positive acknowledgement that non stable features are being used, and also allows to optionally turn them off. |
| Based on the graduation criteria described below, a policy option will graduate from a group to the other (alpha to beta). |
| We plan to removete the `CPUManagerPolicyAlphaOptions` and `CPUManagerPolicyBetaOptions` after all options graduated to stable, after a feature cycle passes without new planned options, and not before 1.28, to give ample time to the work in progress option to graduate at least to beta. |
| We plan to remove the `CPUManagerPolicyAlphaOptions` and `CPUManagerPolicyBetaOptions` after all options graduated to stable, after a feature cycle passes without new planned options, and not before 1.28, to give ample time to the work in progress option to graduate at least to beta. |
It feels like we won't ever remove CPUManagerPolicyAlphaOptions and CPUManagerPolicyBetaOptions.
We have uncorecache that is now an alpha policy option.
Is our plan really to remove this and then add it back when people want a new CPUManagerPolicy?
Or do we want to have dedicated feature gates for each policy option now?
Fair point, I need to rephrase to convey the meaning here
We still have feature gates guarding groups of options (and we do have a bunch of alpha options, not just uncorecache). Arguably the removal strategy of these FGs is a bit underspecified. We didn't touch this subject much after the KEP was first discussed. I'm open to suggestions here.
Yeah, we didn't explicitly discuss the removal strategy for feature gates. Removing a feature gate after a single cycle without new policy options seems a bit aggressive, but keeping feature gates indefinitely when they no longer guard any options isn’t ideal either.
Once all policy options have graduated to GA, it might be reasonable to wait at least two cycles before removing the feature gates if no new options are planned? Let's have this discussion during the PRR review of this PR.
Additionally, as new policy options become less frequent, we should revisit whether to reintroduce CPUManagerPolicyAlphaOptions and CPUManagerPolicyBetaOptions when new options are proposed after the feature gates have been removed (this would help avoid feature gate proliferation, the main reason we moved away from a feature gate per policy option), or determine whether we should revert to the standard approach of gating each policy option individually.
I'm fine with this alternative, but the problem here is which KEP is supposed to track this work.
This KEP wanted to introduce a single new option, and we added the CPUManagerPolicyAlphaOptions and CPUManagerPolicyBetaOptions along the way, as an outcome of the conversation.
Good point. All policy option KEPs already enumerate the feature gates that guard them. If that is not the case, it should be rectified. So, when all options have graduated to GA, the responsibility of cleaning up these feature gates should fall on the last policy option reaching GA.
Based on our discussion in SIG Node yesterday, it would make sense to couple this with other cleanup work to be done after GA graduation. The recommended timeline here is three releases, so IMO we should follow the same approach.
I tend to agree. Either that or a new minimal KEP to remove the gates once we reach idle time.
Once PRR team has had a chance to weigh in on this, I would capture this discussion in the KEP or in the issue so that the plan is properly documented for future reference and doesn't get lost in github comments.
Yes, that makes sense. Once we are done with most options we can retire the gates. If later we add an option here or there, we can probably give them their own gates (or not, we can decide then).
Perfect, further rephrased to capture this conversation.
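For reference, the interaction between the group feature gates and the per-option settings can be sketched as a kubelet configuration fragment. This is an illustrative sketch, not taken from the KEP; field names follow the KubeletConfiguration API, and the option shown is just one example of a beta-quality option.

```yaml
# KubeletConfiguration fragment (illustrative sketch, not a complete config).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Group gate: a positive acknowledgement that non-stable (beta) options
  # are in use. Setting it to false turns all beta-quality options off at once.
  CPUManagerPolicyBetaOptions: true
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  # Individual option; guarded by the group gate above until it reaches GA.
  full-pcpus-only: "true"
```

Once every option in a group graduates to stable, the corresponding group gate no longer guards anything, which is exactly the removal question discussed above.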
| N/A. |
| ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? |
I don't think these are SLIs.
https://github.com/kubernetes/community/blob/master/sig-scalability/slos/slos.md
I see what you mean, but at the same time this is what the template is suggesting https://github.com/kubernetes/enhancements/blob/master/keps/NNNN-kep-template/README.md#what-are-the-slis-service-level-indicators-an-operator-can-use-to-determine-the-health-of-the-service and I followed that. What else should be set here?
Are the "errors" here system or user error? If they are system errors, then the error rate (errors/requests) is an SLI. If they are user errors, it's not really. Latency for configuring this could be another SLI, but it is encompassed by the general Pod provisioning SLI and I don't think we need to track this separately.
Since there are no SLOs the SLIs don't mean much. But if this is a system error, we probably should use it to specify an SLO as well.
I am OK leaving these in here for documentation, regardless (or they could move to the troubleshooting section)
cpu_manager_pinning_errors_total are system errors. Meaning: the system was asked to pin CPUs because the workload requested them, but failed to do so. cpu_manager_pinning_requests_total tracks the number of workloads (containers) managed by the kubelet which requested cpu pinning (the usual cpumanager requirement: guaranteed QoS pods, integral CPUs requested).
AFAICT I agree there are no SLOs, but these metrics may still be helpful to troubleshoot nodes.
TL;DR: AFAIU I can leave this part unchanged, so I'm doing that. Please let me know if I need to change something after the clarification above.
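For troubleshooting, the two counters mentioned above can be combined into a pinning failure ratio. A sketch in PromQL, assuming the kubelet's /metrics endpoint is scraped by Prometheus (the grouping label depends on your scrape config; `instance` is typical):

```promql
# Fraction of CPU pinning requests that failed, per kubelet, over the last 15m.
# A persistently non-zero ratio points at alignment/allocation problems.
sum by (instance) (rate(cpu_manager_pinning_errors_total[15m]))
/
sum by (instance) (rate(cpu_manager_pinning_requests_total[15m]))
```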
| ###### Are there any missing metrics that would be useful to have to improve observability of this feature? |
| We can detail the pinning errors total with a new metric like `cpu_manager_errors_count` or |
For GA, I would expect that this metric is there already.
Fair enough, we can add this as GA requirement.
Depends on: kubernetes/kubernetes#129529
So if it depends on that issue, then are we not promoting this to GA in this release? and adding a new metric as a new beta?
While we can wait for another cycle, adding metrics in the context of graduation is something we did repeatedly in the past and AFAIK is not controversial or uncommon.
I can also redo the work I did on 129529, it's just a bit wasteful and I'd prefer not to, but I can if this simplifies the flow.
> While we can wait for another cycle, adding metrics in the context of graduation is something we did repeatedly in the past and AFAIK is not controversial or uncommon. I can also redo the work I did on 129529, it's just a bit wasteful and I'd prefer not to, but I can if this simplifies the flow.
Actually no, we don't need to depend on this. On second look the redo is minimal. Will file a new PR. Thanks for the comment!
Adding them in a separate PR.
eb7974c to a439e40
| In order to make the resource reporting consistent, and to avoid cascading changes in the system, we enforce the request constraints at admission time. |
| This approach follows what the Topology Manager already does. |
| ### Alternatives |
Why remove the alternatives? I think we should keep them for future reference and context.
Moved to the bottom per (AFAIU) the new KEP template.
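To illustrate the admission-time enforcement quoted above: with the static policy and full-pcpus-only enabled on an SMT-2 node, a Guaranteed pod requesting an odd number of CPUs cannot be satisfied with whole physical cores, so the kubelet rejects it at admission (with an SMT alignment error) instead of reporting inconsistent resources later. A hypothetical manifest (pod and container names are illustrative):

```yaml
# Hypothetical Guaranteed QoS pod: integral CPU request triggers pinning.
apiVersion: v1
kind: Pod
metadata:
  name: pinned-app   # illustrative name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resources:
      # requests == limits with integral CPUs => Guaranteed QoS,
      # so the CPU Manager pins this container.
      # cpu: 3 is odd: on an SMT-2 node it cannot cover full physical
      # cores, so admission fails when full-pcpus-only is set.
      requests:
        cpu: "3"
        memory: "256Mi"
      limits:
        cpu: "3"
        memory: "256Mi"
```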
swatisehgal
left a comment
/lgtm
Looks good from node-perspective!
a439e40 to 507487e
@swatisehgal fully agree, the outcome must be recorded in the KEP itself. Perhaps other sig-arch members should be involved in the convo. I captured your comments and moved them to a separate section, PTAL again when you have time.
johnbelamaric
left a comment
A few points in the comments I sent individually, once those updates are done, PRR should be good to go.
Fill missing content Signed-off-by: Francesco Romani <fromani@redhat.com>
507487e to da6db82
thanks @johnbelamaric ! I think I addressed all the review comments.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ffromani, johnbelamaric, mrunalp. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details: Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing
Add metrics to report alignment allocation failures See: kubernetes/enhancements#5108 Signed-off-by: Francesco Romani <fromani@redhat.com>
`full-pcpus-only` 1.33 GA: the `full-pcpus-only` option will graduate to GA; the `CPUManagerPolicyOptions` master FG will be locked to true, and removed 3 versions later.