
added --reserved-cpus kubelet command option #83592

Merged
k8s-ci-robot merged 1 commit into kubernetes:master from jianzzha:opt-reserved-cpus
Nov 7, 2019

Conversation

@jianzzha
Contributor

@jianzzha jianzzha commented Oct 7, 2019

What type of PR is this?

/kind feature

What this PR does / why we need it:

Currently, when --system-reserved or --kube-reserved is used, the kubelet reserves system resources and keeps them from being used by user containers. Which CPUs are reserved via these options is not easy for end users to determine, even though the CPU selection algorithm makes the selection in a deterministic way.

For telco NFV use cases, when using the static CPU manager policy, knowing explicitly which CPUs are reserved for the system and which CPUs are used for user containers is very important. For example, if isolcpus is going to be defined on the kernel command line and used for NFV containers, those CPUs really should be used only by the NFV containers and not by system threads. Therefore, providing an option to explicitly define the system-reserved CPU list makes planning much easier and more straightforward for such telco use cases.

This new command-line option, --reserved-cpus, explicitly defines the CPU list that will be reserved for the system. For example, if --reserved-cpus=0,1,2,3 is specified, then CPUs 0-3 will be reserved for the system. On a system with 24 CPUs, the telco user may specify isolcpus=4-23 on the kernel command line and use CPUs 4-23 for the NFV containers.

When --reserved-cpus is specified together with --system-reserved or --kube-reserved, the explicit CPU list specified by --reserved-cpus will supersede the CPUs requested via --system-reserved or --kube-reserved.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

A new kubelet command-line option, --reserved-cpus, is introduced to explicitly define the CPU list that will be reserved for the system. For example, if --reserved-cpus=0,1,2,3 is specified, then CPUs 0-3 will be reserved for the system. On a system with 24 CPUs, the user may specify isolcpus=4-23 on the kernel command line and use CPUs 4-23 for the user containers.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 7, 2019
@k8s-ci-robot
Contributor

Hi @jianzzha. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added area/kubelet kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 7, 2019
@fejta-bot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@klueska
Contributor

klueska commented Oct 7, 2019

I'm not opposed to this change in principle. However, I'd like to understand a bit better exactly how this will interact with --system-reserved and --kube-reserved.

You mention:

When --reserved-cpus is specified together with --system-reserved or --kube-reserved, the explicit CPU list specified by --reserved-cpus will supersede the CPUs requested via --system-reserved or --kube-reserved.

Is it enough to just "supersede" these values? Can someone more knowledgeable about --system-reserved and --kube-reserved chime in here?

/cc @ConnorDoyle @vishh

@mattjmcnaughton
Contributor

Hi @jianzzha !

Thanks for your thoughts here :) I defer to @derekwaynecarr and @dchen1107 here, but my gut says this initiative would be best served by creating a Kubernetes Enhancement Proposal, especially given that it includes an API change.
The link I included should have tips for starting a KEP.

@jianzzha
Contributor Author

jianzzha commented Oct 8, 2019

I'm not opposed to this change in principle. However, I'd like to understand a bit better exactly how this will interact with --system-reserved and --kube-reserved.

You mention:

When --reserved-cpus is specified together with --system-reserved or --kube-reserved, the explicit CPU list specified by --reserved-cpus will supersede the CPUs requested via --system-reserved or --kube-reserved.

Is it enough to just "supersede" these values? Can someone more knowledgeable about --system-reserved and --kube-reserved chime in here?

/cc @ConnorDoyle @vishh

The reserved CPU count override happens early, at command-line option processing time (server.go:655). This way, the behavior of how the kubelet reports the CPU resource count to the scheduler doesn't change (the additional code doesn't need to do anything in that regard). The only thing that changes is how the reserved CPUs are assigned.

It works well in testing: the master sees the right CPU count deducted, and the kubelet keeps these CPUs from being assigned by the static policy.


Nit: "A comma-separated list of CPUs or CPU ranges in ASCII decimal..."

Contributor Author

sounds good

@derekwaynecarr
Member

/assign

@derekwaynecarr
Member

@jianzzha - can you update the existing topology manager KEP here https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0035-20190130-topology-manager.md with this update and use case? Can we discuss in next week's SIG Node call as well?

Member

@derekwaynecarr derekwaynecarr left a comment

I am fine with this approach. It enables users who desire explicit control to use it, and those that are comfortable with the default heuristic to work as normal. I personally see little value in splitting kube-reserved from system-reserved for cfs sharing in this deployment as those users are probably running with real-time and will instead just want all things tied to specific cores. If you add validation, I am fine with the change, and we can discuss in next week sig node to see if there is any disagreement.

Member

So kube-reserved and system-reserved imagined a topology where the kubelet itself was managed in a separate part of the cgroup topology from the rest of the system.

For example, system.slice for system-reserved and kube.slice for kube-reserved. I am not aware of any deployment that actually configured the kubelet that way versus running it traditionally under a common systemd unit via system.slice. The idea was that we could measure resource usage of kube components separately from normal system services and also configure them with different cfs shares.

In a deployment where you use isolcpu, it’s arguable you do not want to bother with setting shares on system.slice separate from kube.slice, and just pin those components to the reserved cpu list.

Contributor Author

Ack. Actual deployments seem to just have the systemd unit via system.slice. I see your point; we need to have a further discussion in a design meeting.

Member

Ok, so this explicitly prevents reserving cfs shares for system versus kubelet separately. I am personally ok with this because my experience is that no one understands the difference between kube and system reserved. In the deployment topology you target, would you desire separate cfs shares? In practice, I imagine if you did, you would just set it directly in the kubelet unit. Let's discuss in sig-node.

Member

So by overriding, this will ensure that you don’t reserve more than capacity with normal validation... the error message may be confusing as it will complain about system-reserved...

Contributor Author

Ack, will fix the error message.

Member

The field requires validation.

Contributor Author

Good reminder. Will add it in two places: validation.go and server.go. In the validation phase, check that it is a valid CPU list string, and also make sure the user does not intend to use --kube-reserved-cgroup or --system-reserved-cgroup together with the explicit cpulist; in server.go, make sure the cpulist is supported by the machine. If any of these conditions is not met, fall back to the old behavior.

Contributor Author

updated

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 12, 2019
@jianzzha
Contributor Author

/assign @ConnorDoyle @lmdaly

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 6, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, jianzzha, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 6, 2019
@jianzzha
Contributor Author

jianzzha commented Nov 6, 2019

/test pull-kubernetes-e2e-gce

@jianzzha
Contributor Author

jianzzha commented Nov 6, 2019

/test pull-kubernetes-e2e-gce-100-performance

@jianzzha
Contributor Author

jianzzha commented Nov 6, 2019

/test pull-kubernetes-kubemark-e2e-gce-big

@jianzzha
Contributor Author

jianzzha commented Nov 6, 2019

/test pull-kubernetes-e2e-gce

1 similar comment
@jianzzha
Contributor Author

jianzzha commented Nov 6, 2019

/test pull-kubernetes-e2e-gce

@liggitt
Member

liggitt commented Nov 6, 2019

/test pull-kubernetes-dependencies

managing the queue around the go1.13 bump, see https://groups.google.com/forum/#!topic/kubernetes-dev/Yyka3G2ebXE

@liggitt
Member

liggitt commented Nov 7, 2019

/retest

@k8s-ci-robot k8s-ci-robot merged commit 73b2c82 into kubernetes:master Nov 7, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Nov 7, 2019
vladikr added a commit to vladikr/origin that referenced this pull request Nov 26, 2019
vladikr added a commit to vladikr/origin that referenced this pull request Nov 26, 2019
vladikr added a commit to vladikr/origin that referenced this pull request Nov 28, 2019
vladikr added a commit to vladikr/origin that referenced this pull request Dec 4, 2019
vladikr added a commit to vladikr/origin that referenced this pull request Dec 4, 2019
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Dec 9, 2019
Upstream reference: kubernetes#83592

Origin-commit: a8376d794b207cfff9eb147d5b392b17378935be
openshift-publish-robot pushed a commit to openshift/kubernetes-kubelet that referenced this pull request Dec 9, 2019
Upstream reference: kubernetes/kubernetes#83592

Origin-commit: a8376d794b207cfff9eb147d5b392b17378935be

Kubernetes-commit: 0ae051d7c8a8020458cce111e29920d1bd393a03
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Dec 12, 2019
Upstream reference: kubernetes#83592

Origin-commit: 595e0c2a93153e770c2826e689e08ed812dd8699
openshift-publish-robot pushed a commit to openshift/kubernetes-kubelet that referenced this pull request Dec 12, 2019
Upstream reference: kubernetes/kubernetes#83592

Origin-commit: 595e0c2a93153e770c2826e689e08ed812dd8699

Kubernetes-commit: ad3b7577e2aed50f781e887f120ba31c80070a64
p0lyn0mial pushed a commit to p0lyn0mial/kubernetes that referenced this pull request Dec 20, 2019

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/code-generation area/kubeadm area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
