Disable cgroups-per-qos pending Burstable/cpu.shares being set #42052
|
what was the symptom? are pods being starved? |
|
it's not obvious what the conclusion from #42000 was. |
|
Ah. Are they being CPU starved? This reminds me of bugs at the node allocatable level too |
|
@vishh -- my theory was they are starved since it will be 1024 shares, but it was just a theory. |
|
@k8s-bot kubemark e2e test this kubernetes/test-infra#2012 |
|
/LGTM |
|
[APPROVALNOTIFIER] This PR is APPROVED. The following people have approved this PR: derekwaynecarr, vishh |
|
I am bumping priority on this. |
|
@k8s-bot kubemark e2e test this kubernetes/test-infra#2012 |
|
LGTM - thanks! |
|
kubemark is hot-looping, and that needs to be fixed. This can merge in the interim, but we need to root-cause why kubemark is hot-looping. |
|
Cross-posting here - it's the real kubelet that is hot looping too |
|
Can someone please help me fill the gap here? Why might cgroups-per-qos cause a Pod startup latency regression? Didn't we agree at the sig-node meeting that for the 1.6 release, by default, all per-QoS cgroups should be unlimited, and that each Kubernetes vendor would decide the limits later based on performance benchmarks and other monitoring stats? Or did we mistakenly set a limit on each top-level cgroup? |
|
See #42000 (comment): we are not yet setting cpu shares on the QoS tiers, which is required; otherwise there is a regression under contention. |
|
@derekwaynecarr This is exactly why I am confused. I thought I raised this concern at the sig-node meeting, and we finally agreed on the following for the NodeAllocatable and QoS tree rollout in the 1.6 release. Step 1: create all top-level QoS cgroups and per-pod cgroups, but leave them unlimited (that is, set the limit to something equivalent to node capacity / node allocatable). But based on #42000 (comment), it looks like we messed up step 1: instead of unlimiting those top-level cgroups, we left them unset. At least the Burstable cpu cgroup has 1024, which looks like an unset value to me. EDITED: Forget this comment. I realized there would be another set of issues. :-) |
|
The issue is that the default value for cpu shares is |
|
Automatic merge from submit-queue (batch tested with PRs 41714, 41510, 42052, 41918, 31515) |
|
@derekwaynecarr when re-enabling --cgroups-per-qos, also set --enforce-node-allocatable to |
Disable cgroups-per-qos to allow kubemark problems to still be resolved.
Re-enable it once the following merge:
#41753
#41644
#41621
Enabling it before cpu.shares is set on qos tiers can cause regressions since Burstable and BestEffort pods are given equal time.
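
To illustrate why unset cpu.shares matters, here is a minimal sketch of how sibling cgroups split CPU under contention (each gets its shares divided by the sum of its siblings' shares). The 1024 default comes from the discussion above. The millicore-to-shares conversion (1 CPU = 1024 shares), the minimum of 2 shares for BestEffort, and the 4000m total Burstable request are assumptions made for illustration, and the helper names are hypothetical, not kubelet APIs.

```python
from __future__ import annotations

DEFAULT_SHARES = 1024  # value a cpu cgroup has when cpu.shares is never written
MIN_SHARES = 2         # assumed floor for a tier with no CPU requests


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert a CPU request in millicores to cpu.shares (assumes 1 CPU = 1024)."""
    return max(MIN_SHARES, milli_cpu * 1024 // 1000)


def contended_fractions(shares: dict[str, int]) -> dict[str, float]:
    """Fraction of CPU each sibling cgroup gets when all of them are busy."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


# Both QoS tiers left at the default: they split the CPU evenly.
unset = contended_fractions(
    {"burstable": DEFAULT_SHARES, "besteffort": DEFAULT_SHARES}
)

# Hypothetical fix: Burstable gets shares for its pods' summed requests (4000m),
# while BestEffort is pinned to the minimum.
tuned = contended_fractions(
    {"burstable": milli_cpu_to_shares(4000), "besteffort": MIN_SHARES}
)

print(unset)
print(tuned)
```

With both tiers at 1024, Burstable pods with real CPU requests get no more time than BestEffort pods, which is the regression described above. Once the Burstable tier carries shares proportional to its requests, it takes nearly all of the contended CPU.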