
Disable cgroups-per-qos pending Burstable/cpu.shares being set #42052

Merged
k8s-github-robot merged 1 commit into kubernetes:master from derekwaynecarr:disable-groups-per-qos
Feb 25, 2017

Conversation

@derekwaynecarr

@derekwaynecarr derekwaynecarr commented Feb 24, 2017

Member

Disable cgroups-per-qos to allow kubemark problems to still be resolved.

Re-enable it once the following merge:
#41753
#41644
#41621

Enabling it before cpu.shares is set on qos tiers can cause regressions since Burstable and BestEffort pods are given equal time.
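To make the "equal time" concern concrete, here is a minimal sketch of how CPU time is divided between sibling cgroups under full contention, in proportion to their `cpu.shares`. The function name and the weighted values are illustrative assumptions, not taken from this PR; 1024 is the cgroup v1 default for `cpu.shares`.

```python
def cpu_fractions(shares):
    """Split CPU time among sibling cgroups in proportion to cpu.shares.

    Models only the fully contended case where every sibling is runnable.
    """
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


# Both tiers left at the kernel default of 1024: equal time under contention.
unset = cpu_fractions({"burstable": 1024, "besteffort": 1024})

# Hypothetical weights (not from the PR): Burstable sized from requests,
# BestEffort pinned to a minimal weight.
weighted = cpu_fractions({"burstable": 2048, "besteffort": 2})

print(unset)
print(weighted)
```

With both tiers at the default, each gets half of the contended CPU regardless of what the pods inside them requested.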

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 24, 2017
@k8s-reviewable


This change is Reviewable

@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-label-needed labels Feb 24, 2017
@derekwaynecarr

Member Author

@derekwaynecarr derekwaynecarr added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Feb 24, 2017
@derekwaynecarr derekwaynecarr assigned vishh and wojtek-t and unassigned jessfraz Feb 24, 2017
@derekwaynecarr derekwaynecarr added this to the v1.6 milestone Feb 24, 2017
@derekwaynecarr derekwaynecarr changed the title Disble cgroups-per-qos pending Burstable/cpu.shares being set Disable cgroups-per-qos pending Burstable/cpu.shares being set Feb 24, 2017
@vishh

vishh commented Feb 24, 2017

Contributor

what was the symptom? are pods being starved?

@derekwaynecarr

Member Author

@vishh -- that was one theory, as the pods in question were in the burstable tier. @sjenning is running kubemark in the interim so we can gather more data, but this should unblock folks.

@vishh

vishh commented Feb 24, 2017

Contributor

it's not obvious what the conclusion from #42000 was.

@ncdc

ncdc commented Feb 24, 2017

Member

@k8s-bot non-cri e2e test this #41893 #39821

@vishh

vishh commented Feb 24, 2017

Contributor

Ah. are they being CPU starved? this reminds me of bugs in node allocatable level too

@derekwaynecarr

Member Author

@vishh -- my theory was that they are starved because the tier will have 1024 shares, but it was just a theory.

@ncdc

ncdc commented Feb 24, 2017

Member

@k8s-bot kubemark e2e test this kubernetes/test-infra#2012

@vishh

vishh commented Feb 24, 2017

Contributor

/LGTM
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 24, 2017
@k8s-github-robot


[APPROVALNOTIFIER] This PR is APPROVED

The following people have approved this PR: derekwaynecarr, vishh

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 24, 2017
@derekwaynecarr

Member Author

I am bumping priority on this.

@fejta

fejta commented Feb 24, 2017

Contributor

@k8s-bot kubemark e2e test this kubernetes/test-infra#2012

@wojtek-t

Member

LGTM - thanks!

@derekwaynecarr

Member Author

kubemark is hot looping, and that needs to be fixed. This can merge in the interim, but we still need to root-cause why kubemark is hot looping.

#42000 (comment)

@ncdc

ncdc commented Feb 24, 2017

Member

Cross-posting here - it's the real kubelet that is hot looping too

@dchen1107

dchen1107 commented Feb 24, 2017

Member

Can someone please help me fill in the gap here? Why might cgroups-per-qos cause a Pod startup latency regression? Didn't we agree at the sig-node meeting that, for the 1.6 release, all per-QoS cgroups would be unlimited by default, and that each Kubernetes vendor would decide the limits later based on performance benchmarks and other monitoring stats?

Or did we mistakenly set a limit on each top-level cgroup?

@derekwaynecarr

Member Author

see: #42000 (comment)

we are not yet setting cpu shares on the qos tiers (which is required; otherwise there is a regression under contention).
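The thread does not spell out how the linked PRs compute the tier-level values. As a sketch under stated assumptions, one plausible scheme derives the Burstable tier's `cpu.shares` from the summed CPU requests of its pods (1024 shares per full CPU) and pins BestEffort to the minimum weight of 2. All names below are hypothetical and are not the kubelet's actual API.

```python
SHARES_PER_CPU = 1024     # cpu.shares weight corresponding to one full CPU
MILLI_CPU_PER_CPU = 1000  # millicores in one CPU
MIN_SHARES = 2            # smallest weight accepted for cpu.shares


def milli_cpu_to_shares(milli_cpu):
    """Convert a CPU request in millicores to a cpu.shares weight."""
    if milli_cpu <= 0:
        return MIN_SHARES
    return max(MIN_SHARES, milli_cpu * SHARES_PER_CPU // MILLI_CPU_PER_CPU)


def qos_tier_shares(burstable_requests_milli):
    """Pick tier-level weights from the CPU requests of Burstable pods."""
    return {
        "burstable": milli_cpu_to_shares(sum(burstable_requests_milli)),
        "besteffort": MIN_SHARES,
    }


# Three hypothetical Burstable pods requesting 500m, 250m, and 1000m.
tiers = qos_tier_shares([500, 250, 1000])
print(tiers)
```

Under this assumption the Burstable tier's weight tracks what its pods asked for, so BestEffort pods no longer compete with them on equal terms.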

@dchen1107

dchen1107 commented Feb 25, 2017

Member

@derekwaynecarr This is exactly why I am confused. I thought I raised this concern at the sig-node meeting, and we finally agreed on the following for the NodeAllocatable & QoS tree rollout in the 1.6 release:

Step 1: Create all top-level QoS cgroups and per-pod cgroups, but leave them unlimited (that is, set the limit to something equivalent to node capacity / node allocatable).
Step 2: Introduce another flag for enforcement based on the QoS design, but disable it by default.

But based on #42000 (comment), it looks like we messed up step 1. Instead of unlimiting those top-level cgroups, we left them unset. At least for the burstable cpu cgroup, it has 1024, which looks like an unset value to me.

EDITED: Forget this comment. I realized there would be another set of issues. :-)

@vishh

vishh commented Feb 25, 2017

Contributor

@dchen1107

The issue is that the default value for cpu shares is 1024. Even if we set the top-level cgroup to node capacity, all of its child QoS-level and pod-level cgroups will get 1024 as their default cpu shares.
This kernel behavior leads to regression in CPU isolation.
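The effect of nesting can be sketched by multiplying, at each level of the hierarchy, a cgroup's weight by the inverse of its siblings' total. The tree layout, pod names, and share values below are hypothetical and serve only to illustrate how a defaulted QoS-level weight masks the pod-level weights beneath it.

```python
def effective_fraction(tree, path):
    """Multiply a cgroup's share of its siblings at each level along `path`.

    `tree` maps a cgroup name to a (cpu.shares, children) pair, where
    `children` is a dict of the same shape. Assumes full contention.
    """
    fraction = 1.0
    level = tree
    for name in path:
        total = sum(shares for shares, _ in level.values())
        shares, children = level[name]
        fraction *= shares / total
        level = children
    return fraction


DEFAULT_SHARES = 1024  # cgroup v1 default when cpu.shares is never written

# QoS tiers left at the default; pods carry their own weights underneath.
nested_tree = {
    "burstable": (DEFAULT_SHARES, {"pod-a": (2048, {})}),
    "besteffort": (DEFAULT_SHARES, {"pod-b": (2, {})}),
}
nested = effective_fraction(nested_tree, ["burstable", "pod-a"])

# Hypothetical flat layout for comparison: the same pods as direct siblings.
flat_tree = {"pod-a": (2048, {}), "pod-b": (2, {})}
flat = effective_fraction(flat_tree, ["pod-a"])

print(nested, flat)
```

In the nested case `pod-a` is capped at half of the contended CPU because the two tiers are weighted equally, even though its own weight is roughly a thousand times that of `pod-b`.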

@k8s-github-robot


Automatic merge from submit-queue (batch tested with PRs 41714, 41510, 42052, 41918, 31515)

@k8s-github-robot k8s-github-robot merged commit a93904e into kubernetes:master Feb 25, 2017
@vishh

vishh commented Feb 26, 2017

Contributor

@derekwaynecarr when re-enabling --cgroups-per-qos, also set --enforce-node-allocatable to pods.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

10 participants