Skip to content

Enable pod level cgroups by default#41349

Merged
k8s-github-robot merged 2 commits into
kubernetes:masterfrom
derekwaynecarr:enable-pod-cgroups
Feb 22, 2017
Merged

Enable pod level cgroups by default#41349
k8s-github-robot merged 2 commits into
kubernetes:masterfrom
derekwaynecarr:enable-pod-cgroups

Conversation

@derekwaynecarr

Copy link
Copy Markdown
Member

What this PR does / why we need it:
It enables pod level cgroups by default.

Special notes for your reviewer:
This is intended to be enabled by default on 2/14/2017 per the plan outlined here:
kubernetes/community#314

Release note:

Each pod has its own associated cgroup by default.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 13, 2017
@k8s-reviewable

Copy link
Copy Markdown

This change is Reviewable

@derekwaynecarr

Copy link
Copy Markdown
Member Author

fyi @vishh @sjenning

@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Feb 13, 2017
@vishh

vishh commented Feb 13, 2017

Copy link
Copy Markdown
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 13, 2017
@vishh

vishh commented Feb 13, 2017

Copy link
Copy Markdown
Contributor

/approve

@vishh vishh added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 13, 2017
@derekwaynecarr

Copy link
Copy Markdown
Member Author

@k8s-bot kops aws e2e test this

Comment thread hack/local-up-cluster.sh Outdated
# this is not defaulted to preserve backward compatibility.
# if EXPERIMENTAL_CGROUPS_PER_QOS is enabled, recommend setting to /
CGROUP_ROOT=${CGROUP_ROOT:-""}
#CGROUP_ROOT=${CGROUP_ROOT:-""}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not kill this? Or at least we should add a blank space after # as

# CGROUP_ROOT=${CGROUP_ROOT:-""}

@eparis

eparis commented Feb 14, 2017

Copy link
Copy Markdown
Contributor

/approve

@derekwaynecarr

Copy link
Copy Markdown
Member Author

@k8s-bot kops aws e2e test this

@vishh vishh added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Feb 15, 2017
@vishh

vishh commented Feb 15, 2017

Copy link
Copy Markdown
Contributor

@derekwaynecarr I added a do not merge label to prevent this PR from going in to the next beta release.

@dchen1107 wants this feature and node allocatable to be separated from CRI to make it easy to debug issues in beta releases.

@derekwaynecarr

Copy link
Copy Markdown
Member Author

@k8s-bot kops aws e2e test this

@vishh vishh removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Feb 17, 2017
@vishh

vishh commented Feb 17, 2017

Copy link
Copy Markdown
Contributor

A release has been made with CRI. This PR is ready to go.

@dchen1107

Copy link
Copy Markdown
Member

Yes, this pr is ready to go. Thanks!

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 18, 2017
@derekwaynecarr

Copy link
Copy Markdown
Member Author

so i just pushed a rebase, but i have no clue how the kops environment is configured.

looking at the kops logs for kubelet on earlier runs:

I see messages like the following:

W0217 23:16:39.530238    1431 container_manager_linux.go:728] CPUAccounting not enabled for pid: 995
W0217 23:16:39.530526    1431 container_manager_linux.go:731] MemoryAccounting not enabled for pid: 995
I0217 23:16:39.530804    1431 container_manager_linux.go:388] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
W0217 23:16:39.531152    1431 container_manager_linux.go:728] CPUAccounting not enabled for pid: 1431
W0217 23:16:39.531431    1431 container_manager_linux.go:731] MemoryAccounting not enabled for pid: 1431

This tells me that the image they run does not have cpu and memory accounting on by default. Someone on the kops team needs to fix the image to enable accounting globally via /etc/systemd/system.conf (DefaultCPUAccounting=true DefaultMemoryAccounting=true).

Further down the list of problems:

Failed cleaning pods: failed to get list of pods that still exist on cgroup mounts: failed to read the cgroup directory /sys/fs/cgroup/systemd/docker/BestEffort : open /sys/fs/cgroup/systemd/docker/BestEffort: no such file or directory

This confuses me, its possible cgroup-root is being configured incorrectly.

Are there any @kubernetes/kops-maintainers that can help fix their environments or debug this?

We are trying to roll out a new cgroup topology, and node allocatable support, and that requires some requisite changes to the configuration of the OS that may or may not be present.

@derekwaynecarr

Copy link
Copy Markdown
Member Author

Or can the @kubernetes/kops-maintainers set --cgroups-per-qos=false by default in their configuration while we stabilize the rollout of pod level cgroups / node allocatable?

/cc @vishh @dchen1107 @smarterclayton @chrislovecnm

@derekwaynecarr derekwaynecarr added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 18, 2017
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 18, 2017
@derekwaynecarr

Copy link
Copy Markdown
Member Author

Ok, it looks like kops has cgroup-root set to docker. That won't work, I can use this as a case study to error better, but we need kops to change I think...

@derekwaynecarr

Copy link
Copy Markdown
Member Author

@justinsb -- who is creating the /docker cgroup with this configuration?

@derekwaynecarr

Copy link
Copy Markdown
Member Author

If not running the systemd cgroup driver, I think it would be safe to ignore that entire mountpoint entirely. If latest test runs validate that theory, I will add that patch to this PR.

@justinsb

Copy link
Copy Markdown
Member

@justinsb -- who is creating the /docker cgroup with this configuration?

Docker was. Honestly I don't understand this, and nor did anyone else when I asked at the time. But I probably should have asked you :-)

So: What cgroups configuration should we be running in kubernetes (and therefore kops)?

@derekwaynecarr

Copy link
Copy Markdown
Member Author

@justinsb -- you should run with --cgroup-root=/ (which will be the default when this PR is enabled) and you should use the --cgroup-driver= to the value returned when you call docker info for its cgroup driver. it looks like on your install, it's using the cgroupfs driver, and so you taking the kube default for --cgroup-driver=cgroupfs makes sense.

i think for the bug exposed in this pr, it should be safe to ignore the systemd mountpoint entirely if you are not running the systemd cgroup driver. i will update, and see if the tests are happier.

@justinsb

Copy link
Copy Markdown
Member

OK that is super helpful @derekwaynecarr - I'll make the change in kops. Also, if we should be using a particular cgroup driver, we can probably force it at least on the images we build - let me know!

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 20, 2017
@derekwaynecarr

Copy link
Copy Markdown
Member Author

@justinsb -- ok, my last commit tested explicitly overriding --cgroup-root to / for kops, and as you can see kops e2e is happy on this pr now. if you can merge kubernetes/kops#1944 then I can remove the band-aid on this PR.

@derekwaynecarr

Copy link
Copy Markdown
Member Author

@k8s-bot gce etcd3 e2e test this

@derekwaynecarr

Copy link
Copy Markdown
Member Author

the kops pr to change cgroup-root to / has merged here:
kubernetes/kops#1955

I dropped my temporary patch and have kicked off the PR to hopefully see us passing.

@k8s-github-robot

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

The following people have approved this PR: derekwaynecarr, eparis, vishh

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@derekwaynecarr

Copy link
Copy Markdown
Member Author

yay, kops passed!

@justinsb

Copy link
Copy Markdown
Member

Hooray - thanks for helping us clean up cgroups in kops @derekwaynecarr :-)

@derekwaynecarr

Copy link
Copy Markdown
Member Author

@k8s-bot gci gke e2e test this
@k8s-bot cvm gke e2e test this
@k8s-bot unit test this

@derekwaynecarr

Copy link
Copy Markdown
Member Author

@k8s-bot cvm gce e2e test this
@k8s-bot gci gke e2e test this
@k8s-bot cvm gke e2e test this

@derekwaynecarr

Copy link
Copy Markdown
Member Author

@k8s-bot cvm gce e2e test this

@derekwaynecarr

Copy link
Copy Markdown
Member Author

all green, retagging to merge today.

@derekwaynecarr derekwaynecarr added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 22, 2017
@k8s-github-robot

Copy link
Copy Markdown

Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants