added --reserved-cpus kubelet command option #83592
k8s-ci-robot merged 1 commit into kubernetes:master from jianzzha:opt-reserved-cpus
Conversation
Hi @jianzzha. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
I'm not opposed to this change in principle. However, I'd like to understand a bit better exactly how this will interact with --system-reserved and --kube-reserved. You mention:
Is it enough to just "supersede" these values? Can someone more knowledgeable weigh in? /cc @ConnorDoyle @vishh
Hi @jianzzha! Thanks for your thoughts here :) I defer to @derekwaynecarr and @dchen1107 here, but my gut says this initiative would be best served by creating a Kubernetes Enhancement Proposal, especially given that it includes an API change.
The reserved CPU count overwrite happens at an early phase, during command-line option processing, in server.go:655. This way, the behavior of how the kubelet reports the CPU resource count to the scheduler doesn't change (the additional code doesn't need to do anything in that regard). The only thing that changes is how reserved CPUs are assigned. It works well in testing: the master sees the right CPU count deducted, and the kubelet keeps these CPUs from being assigned by the static policy.
cmd/kubelet/app/options/options.go
Nit: "A comma-separated list of CPUs or CPU ranges in ASCII decimal..."
/assign
@jianzzha - can you update the existing topology manager KEP here https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0035-20190130-topology-manager.md with this update and use case? Can we discuss in next week's SIG Node call as well?
derekwaynecarr left a comment
I am fine with this approach. It enables users who desire explicit control to use it, and lets those who are comfortable with the default heuristic work as normal. I personally see little value in splitting kube-reserved from system-reserved for cfs sharing in this deployment, as those users are probably running real-time and will instead just want all things tied to specific cores. If you add validation, I am fine with the change, and we can discuss in next week's SIG Node call to see if there is any disagreement.
cmd/kubelet/app/server.go
So kube-reserved and system-reserved imagined a topology where the kubelet itself was managed in a separate part of the cgroup topology from the rest of the system.
For example, system.slice for system-reserved and kube.slice for kube-reserved. I am not aware of any deployment that actually configured the kubelet as such, versus running it traditionally under a common systemd unit via system.slice. The idea was that we could measure resource usage of kube components separately from normal system services and also configure them with different cfs shares.
In a deployment where you use isolcpus, it's arguable you do not want to bother with setting shares on system.slice separately from kube.slice, and would just pin those components to the reserved CPU list.
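As a concrete sketch of that alternative (hypothetical file path and CPU list; systemd's `CPUAffinity=` directive is real), a drop-in could pin the kubelet unit itself to the reserved CPUs instead of relying on separate cfs shares for system.slice versus kube.slice:

```ini
# /etc/systemd/system/kubelet.service.d/10-cpuaffinity.conf (hypothetical)
# Pin the kubelet to the same CPUs passed to --reserved-cpus, leaving
# the isolcpus range free for workload containers.
[Service]
CPUAffinity=0 1 2 3
```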
Ack. Actual deployments seem to just have the systemd unit via system.slice. I see your point; we need to have a further discussion in a design meeting.
cmd/kubelet/app/server.go
OK, so this explicitly prevents reserving cfs shares for system versus kubelet separately. I am personally OK with this, because my experience is that no one understands the difference between kube-reserved and system-reserved. In the deployment topology you target, would you desire separate cfs shares? In practice, I imagine that if you did, you would just set it directly in the kubelet unit. Let's discuss in sig-node.
So by overriding, this will ensure that you don’t reserve more than capacity with normal validation... the error message may be confusing as it will complain about system-reserved...
Ack, will fix the error message.
The field requires validation.
Good reminder. Will add it in two places: validation.go and server.go. In the validation phase, check that it is a valid cpu list string, and also make sure the user does not intend to use --kube-reserved-cgroup or --system-reserved-cgroup together with the specific cpulist; in server.go, make sure the cpulist is supported by the machine. If any of these conditions is not met, always fall back to the old behavior.
/assign @ConnorDoyle @lmdaly
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: derekwaynecarr, jianzzha, liggitt. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/test pull-kubernetes-e2e-gce |
/test pull-kubernetes-e2e-gce-100-performance |
/test pull-kubernetes-kubemark-e2e-gce-big |
/test pull-kubernetes-e2e-gce |
/test pull-kubernetes-dependencies
Managing the queue around the go1.13 bump; see https://groups.google.com/forum/#!topic/kubernetes-dev/Yyka3G2ebXE
/retest |
Upstream reference: kubernetes/kubernetes#83592 Origin-commit: a8376d794b207cfff9eb147d5b392b17378935be Kubernetes-commit: 0ae051d7c8a8020458cce111e29920d1bd393a03
Upstream reference: kubernetes/kubernetes#83592 Origin-commit: 595e0c2a93153e770c2826e689e08ed812dd8699 Kubernetes-commit: ad3b7577e2aed50f781e887f120ba31c80070a64
What type of PR is this?
/kind feature
What this PR does / why we need it:
Currently, when --system-reserved or --kube-reserved is used, the kubelet reserves system resources and keeps them from being used by user containers. Which CPUs are reserved via these options is not easy for end users to determine, even though the CPU selection algorithm makes the selection in a deterministic way.
For telco NFV use cases, when using the static CPU manager policy, knowing explicitly which CPUs are reserved for the system and which CPUs are used for user containers is very important. For example, if isolcpus is going to be defined on the kernel command line and used for NFV containers, those CPUs really should be used only by the NFV containers and not by system threads. Therefore, providing an option to explicitly define the system-reserved CPU list makes the planning job much easier and more straightforward for such telco use cases.
This new command-line option, --reserved-cpus, explicitly defines the CPU list that will be reserved for the system. For example, if --reserved-cpus=0,1,2,3 is specified, then CPUs 0,1,2,3 will be reserved for the system. On a system with 24 CPUs, the telco user may specify isolcpus=4-23 on the kernel command line and use CPUs 4-23 for the NFV containers.
When --reserved-cpus is specified together with --system-reserved or --kube-reserved, the explicit CPU list specified by --reserved-cpus supersedes the CPUs requested via --system-reserved or --kube-reserved.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: