
set KillMode for kubelet to process, fix for #13511 #23491

Merged

j3ffml merged 1 commit into kubernetes:master from onorua:master on Apr 8, 2016

Conversation

@onorua (Contributor) commented Mar 25, 2016

Restart the kubelet process only, not its whole resource group; for details see the RHEL admin manual and #13511.
The new Ubuntu LTS 16.04 will ship with systemd by default, which may increase the number of complaints and bugs like the one we had. I propose upstreaming this configuration.
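For reference, the change amounts to adding a KillMode directive to the kubelet's systemd unit. A minimal sketch of the relevant [Service] section (the ExecStart path and the restart values here are illustrative, not the exact file touched by this PR):

```ini
# kubelet.service (excerpt) -- sketch only
[Service]
ExecStart=/usr/bin/kubelet ...
Restart=always
RestartSec=10
# Kill only the main kubelet process on stop/restart; child processes
# (e.g. FUSE mount daemons exec'ed by the kubelet) are left running.
# The default, KillMode=control-group, kills everything in the unit's cgroup.
KillMode=process
```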

@k8s-bot commented Mar 25, 2016

Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist")

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

2 similar comments

@k8s-github-robot k8s-github-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 25, 2016
@k8s-github-robot: Labelling this PR as size/XS

@davidopp davidopp assigned mikedanese and unassigned davidopp Mar 26, 2016
@mikedanese (Member):

cc @dchen1107 @vishh

--reconcile-cidr=false
Restart=always
RestartSec=10
KillMode=process
do we need this for master too?

(Contributor Author) I've added it for consistency.

The side effect is that we probably leak master-spawned processes (if any) when systemd restarts the master.

(Contributor Author) hm... what is the difference between the kubelet on a master node and on a compute node? I mean, if there is a risk of leaking something, it is dangerous in the same way on all nodes. If there is no risk, then it is okay on all nodes as well. At least this is my, maybe naive, understanding of kubelet functionality and Kubernetes pod management.

Works for me. @vishh ?

(Contributor) On the master node, except for the kubelet, all other components are expected to run in containers. So this change should be fine.

@vishh (Contributor) commented Apr 4, 2016

The kubelet execs other processes. Does it make sense to run the kubelet in a separate cgroup?

@rootfs commented Apr 4, 2016

Running the kubelet in a different cgroup might not help, since systemd tracks the group the kubelet is in and kills all the processes in it.

@vishh (Contributor) commented Apr 4, 2016

It's still not clear why cleaning up the kubelet service's cgroup is an issue. Am I missing some detail from the systemd perspective?

@rootfs commented Apr 4, 2016

@vishh systemd cleaning up the kubelet and its control group has the side effect of killing the glusterfs (or other FUSE) mount, as identified in @onorua's environment, because the mount daemon is exec'ed by the kubelet.

@vishh (Contributor) commented Apr 5, 2016

How is the lifecycle of the mount daemon managed by the kubelet?

@rootfs commented Apr 5, 2016

When a pod requests, e.g., a glusterfs volume, the kubelet mounter will in the end invoke the glusterfs mount daemon. The daemon stays until the volume is unmounted. If systemd stops the kubelet with KillMode=control-group (the default), both the kubelet and the glusterfs daemon are killed, while the container stays alive with a broken bind mount. When systemd starts the kubelet again, even though the kubelet is able to re-mount the volume, the broken bind mount in the container cannot be repaired. This is what happened in #13511.

The proposed fix is to tell systemd to kill just the kubelet and leave the other processes alive by setting KillMode=process; the glusterfs daemon then stays with the container when the kubelet is stopped.
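On systems where the packaged unit file cannot be edited directly, the same behavior can be sketched as a systemd drop-in (the drop-in path follows the standard systemd override convention; this file is illustrative and not part of the PR):

```ini
# /etc/systemd/system/kubelet.service.d/10-killmode.conf
[Service]
# The default KillMode=control-group kills every process in the unit's
# cgroup on stop/restart: the kubelet plus any mount daemons it exec'ed.
# KillMode=process kills only the main kubelet process.
KillMode=process
```

followed by `systemctl daemon-reload` and `systemctl restart kubelet` to apply it.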

@vishh (Contributor) commented Apr 5, 2016

cc @kubernetes/sig-node

@vishh (Contributor) commented Apr 5, 2016

Ideally, the FUSE daemon should run in the pod's scope, not the kubelet's scope. However, this might need some re-design of these volumes. cc @thockin

@ncdc (Member) commented Apr 5, 2016

cc @kubernetes/rh-cluster-infra @smarterclayton

@rootfs commented Apr 5, 2016

Longer term, I would vote for moving the FUSE daemon to the pod's scope, as it also helps resource accounting.

Between now and then, the proposed fix gives us the correct mount behavior we need during kubelet restarts.

@derekwaynecarr (Member):

@eparis - are there additional unit files for the kubelet that would need to be covered by this change?

@vishh @dchen1107 - this means we need to account for the resource consumption of storage driver daemons as part of kube-reserved for now... I will make a point of noting that in the systemd node spec for 1.3.

@derekwaynecarr (Member) commented Apr 5, 2016 via email

A good reason for us to finally implement pod-level cgroups... do we end up running a FUSE daemon per pod? How does it work in the current model if I have multiple pods on a node that want to use FUSE? Do they share a daemon or does each get its own instance?

@vishh (Contributor) commented Apr 5, 2016 via email

@eparis (Contributor) commented Apr 6, 2016

@derekwaynecarr over in https://github.com/kubernetes/contrib/blob/master/init/systemd/kubelet.service

/me is sad that we moved the system units out of the tree and then duplicated them....

@rootfs commented Apr 6, 2016

Back to this PR: moving the daemons (FUSE, or possibly network plugins) into their own cgroup doesn't conflict with setting systemd KillMode=process. For now, killing the kubelet and leaving the daemons alive keeps the pod's mount intact. Once the daemons live in their own cgroup, the kubelet is the only process in the group, so KillMode=process essentially has the same effect as KillMode=control-group.

@mikedanese mikedanese removed their assignment Apr 6, 2016
@vishh (Contributor) commented Apr 6, 2016

I guess this PR will work for now, without (hopefully) any process leaks. LGTM.

@vishh vishh added lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-merge labels Apr 6, 2016
@k8s-github-robot:

@k8s-bot ok to test
@k8s-bot test this

PR builder appears to be missing; activating due to the 'lgtm' label.

@k8s-github-robot:

Removing LGTM because the release note process has not been followed.
One of the following labels is required: "release-note", "release-note-none", or "release-note-action-required".
Please see: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/release-notes.md

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 6, 2016
@vishh vishh added release-note-none Denotes a PR that doesn't merit a release note. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed release-note-label-needed labels Apr 6, 2016
@k8s-bot commented Apr 6, 2016

GCE e2e build/test passed for commit 0bfc496.

@k8s-github-robot:

@k8s-bot test this

Tests are more than 48 hours old. Re-running tests.

@k8s-bot commented Apr 8, 2016

GCE e2e build/test passed for commit 0bfc496.

@j3ffml j3ffml merged commit e17213a into kubernetes:master Apr 8, 2016
@wwwtyro (Contributor) commented Oct 5, 2016

This appears to be leaking journalctl processes originating here: https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/google/cadvisor/utils/oomparser/oomparser.go#L169

@vishh (Contributor) commented Oct 17, 2016

@wwwtyro Thanks for reporting the regression. I filed #34965 to fix it.

@wwwtyro (Contributor) commented Oct 19, 2016

@vishh awesome, thank you

dchen1107 added a commit to dchen1107/kubernetes-1 that referenced this pull request Oct 25, 2016
Fixed the regression caused by kubernetes#23491 which fix gcluster umount issue.
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Jul 31, 2019
Bug 1732193: UPSTREAM: 80518: Fix detachment of deleted volumes

Origin-commit: b793b93e81b28e3a30f4b3ad722267830767827d

Labels: lgtm, release-note-none, size/XS
