kubelet: cm: refactor QoS logic into seperate interface#41833
Conversation
|
Per our chat @vishh , I will take over review of this PR. |
derekwaynecarr
left a comment
There was a problem hiding this comment.
few minor nits, update then I will tag.
| // - Ensures that the Docker daemon is in a container. | ||
| // - Creates the system container where all non-containerized processes run. | ||
| Start(*v1.Node) error | ||
| Start(*v1.Node, func() []*v1.Pod) error |
There was a problem hiding this comment.
can you create a type in kubelet/cm that is
// ActivePodsFunc returns pods bound to the kubelet that are active (i.e. non-terminal state)
type ActivePodsFunc func() []*v1.Pod
so the signature is clear about the call expects?
| // GetQOSContainersInfo returns the names of top level QoS containers | ||
| GetQOSContainersInfo() QOSContainersInfo | ||
|
|
||
| // UpdateQOSCgroups returns the names of top level QoS containers |
There was a problem hiding this comment.
godoc is wrong, it performs housekeeping updates to ensure that the top level QoS containers have their desired state. also note that this operation is thread-safe.
| } | ||
|
|
||
| func (cm *containerManagerImpl) setupNode() error { | ||
| func (cm *containerManagerImpl) setupNode(getPods func() []*v1.Pod) error { |
There was a problem hiding this comment.
activePodsFn ActivePodsFunc as argument.
| } | ||
|
|
||
| func (cm *containerManagerImpl) Start(node *v1.Node) error { | ||
| func (cm *containerManagerImpl) Start(node *v1.Node, getPods func() []*v1.Pod) error { |
There was a problem hiding this comment.
same here, take an ActivePodsFunc, and make sure active name propagates.
| return m.qosContainersInfo | ||
| } | ||
|
|
||
| func (m *qosContainerManagerImpl) Start(nodeInfo *v1.Node, getPods func() []*v1.Pod) error { |
| qosContainersInfo QOSContainersInfo | ||
| subsystems *CgroupSubsystems | ||
| cgroupManager CgroupManager | ||
| getPods func() []*v1.Pod |
5a7ed57 to
eec03d0
Compare
|
@k8s-bot -- lets try now! |
|
@k8s-bot test this |
eec03d0 to
46c50b2
Compare
46c50b2 to
3ea9825
Compare
|
@k8s-bot test this |
|
@k8s-bot gce etcd3 e2e test this |
| m.Lock() | ||
| defer m.Unlock() | ||
|
|
||
| qosConfigs := map[v1.PodQOSClass]*CgroupConfig{ |
There was a problem hiding this comment.
do you mind if we add this in a follow-on and just have this return nil after the lock/defer lock call?
|
one minor request so the change is entirely structural, we can do the writes in our follow-on for memory/cpu but at least we can get the scaffolding in asap. |
3ea9825 to
9d32b50
Compare
| if err := pcm.EnsureExists(pod); err != nil { | ||
| return fmt.Errorf("failed to ensure that the pod: %v cgroups exist and are correctly applied: %v", pod.UID, err) | ||
| } | ||
| if err := kl.containerManager.UpdateQOSCgroups(); err != nil { |
There was a problem hiding this comment.
@sjenning -- this is the wrong order... this should be before EnsureExists, not after. sorry for just catching this.
There was a problem hiding this comment.
i also wonder if we should change the pcm.EnsureExists step for now to the following:
if !pcm.Exists {
updateQOSCgroups()
pcm.EnsureExists()
}
in this way we don't attempt any update when its not necessary.
| // been decreased earlier | ||
| reduceCpuLimits = false | ||
| } | ||
| if err := kl.containerManager.UpdateQOSCgroups(); err != nil { |
There was a problem hiding this comment.
this needs to be after KillPod(), not before.
357cf54 to
df1e2d6
Compare
df1e2d6 to
c23142a
Compare
|
rebased on top of #41234 that will hopefully merge soon |
There was a problem hiding this comment.
i know @vishh asked that this be called "Containers".
similar to the discussion on node allocatable, i am not going to hold up this pr on naming conventions.
we can fix that across the board in a later pr.
right now "Cgroups" is consistent with "PodContainerManager" terminology.
There was a problem hiding this comment.
i also want to go on the record that "container" is not defined in the kernel, but "cgroup" is. and container is an overloaded term in a project like this.
| // - Ensures that the Docker daemon is in a container. | ||
| // - Creates the system container where all non-containerized processes run. | ||
| Start(*v1.Node) error | ||
| Start(*v1.Node, ActivePodsFunc) error |
There was a problem hiding this comment.
i think we can move this to a struct in a follow-on. i do not want to hold this pr on that change.
|
any nits can be handled in a follow-on. i am tagging this /lgtm and marking do-not-merge to ensure that node allocatable pr lands first. |
ad951ff to
ee2dcce
Compare
|
Rebased on top of node alloc PR w/ systemd fixes and it works. Add to add new commit to work with new node allocatable code ee2dcce The changes mostly center around the |
ee2dcce to
0861dce
Compare
0861dce to
b9adb66
Compare
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED The following people have approved this PR: derekwaynecarr, k8s-merge-robot, sjenning Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
|
Automatic merge from submit-queue (batch tested with PRs 38676, 41765, 42103, 41833, 41702) |
Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212) Burstable QoS cgroup has cpu shares assigned **What this PR does / why we need it**: This PR sets the Burstable QoS cgroup cpu shares value to the sum of the pods cpu requests in that tier. We need it for proper evaluation of CPU shares in the new QoS hierarchy. **Special notes for your reviewer**: It builds against the framework proposed for #41833
This commit has no functional change. It refactors the QoS cgroup logic into a new
QOSContainerManagerinterface to allow for better isolation for QoS cgroup features coming down the pike.This is a breakout of the refactoring component of my QoS memory limits PR #41149 which will need to be rebased on top of this.
@vishh @derekwaynecarr