Skip to content

kubelet: fix duplicated status updates at pod cleanup#21438

Merged
k8s-github-robot merged 1 commit into
kubernetes:masterfrom
yujuhong:terminated_pods
Feb 22, 2016
Merged

kubelet: fix duplicated status updates at pod cleanup#21438
k8s-github-robot merged 1 commit into
kubernetes:masterfrom
yujuhong:terminated_pods

Conversation

@yujuhong

Copy link
Copy Markdown
Contributor

cleanupTerminatedPods is responsible for checking whether a pod has been
terminated and force a status update to trigger the pod deletion. However, this
function is called in the periodic clenup routine, which runs every 2 seconds.
In other words, it forces a status update for each non-running (and not yet
deleted in the apiserver) pod. When batch deleting tens of pods, the rate of
new updates surpasses what the status manager can handle, causing numerous
redundant requests (and the status channel to be full).

This change forces a status update only when detecting the DeletionTimestamp is
set for a terminated pod. Note that for other non-terminated pods, the pod
workers should be responsible for setting the correct status after killling all
the containers.

@yujuhong yujuhong added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Feb 18, 2016
@yujuhong

Copy link
Copy Markdown
Contributor Author

/cc @smarterclayton

@k8s-github-robot

Copy link
Copy Markdown

Labelling this PR as size/M

@k8s-github-robot k8s-github-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 18, 2016
@yujuhong

Copy link
Copy Markdown
Contributor Author

FYI, when deleting 100 pods on a node, the status update channel was completely flooded.

Comment thread pkg/kubelet/status/manager.go Outdated

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the cached status is not found, oldStatus will be nil here, which I think will lead to a nil pointer dereference in copyStatus?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I meant to set that to the pod.Status, but lost it during rebasing. Will fix it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@k8s-bot

k8s-bot commented Feb 18, 2016

Copy link
Copy Markdown

GCE e2e test build/test passed for commit d531fce733d2416870192e03ac685b872097a7d5.

@timstclair

Copy link
Copy Markdown

LGTM once the nil pointer is fixed.

@yujuhong

Copy link
Copy Markdown
Contributor Author

LGTM once the nil pointer is fixed.

Thanks for the review! It's fixed.

@timstclair timstclair added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 18, 2016
@k8s-bot

k8s-bot commented Feb 18, 2016

Copy link
Copy Markdown

GCE e2e test build/test passed for commit 58087f835b69b6427415ea5da6a185c1e36d1923.

@k8s-github-robot

Copy link
Copy Markdown

PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 18, 2016
cleanupTerminatedPods is responsible for checking whether a pod has been
terminated and force a status update to trigger the pod deletion. However, this
function is called in the periodic clenup routine, which runs every 2 seconds.
In other words, it forces a status update for each non-running (and not yet
deleted in the apiserver) pod. When batch deleting tens of pods, the rate of
new updates surpasses what the status manager can handle, causing numerous
redundant requests (and the status channel to be full).

This change forces a status update only when detecting the DeletionTimestamp is
set for a terminated pod. Note that for other non-terminated pods, the pod
workers should be responsible for setting the correct status after killling all
the containers.
@k8s-github-robot

Copy link
Copy Markdown

PR changed after LGTM, removing LGTM.

@k8s-github-robot k8s-github-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 18, 2016
@yujuhong

Copy link
Copy Markdown
Contributor Author

Trivial rebase (only one line of comments was touched). Re-applying lgtm.

@yujuhong yujuhong added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 18, 2016
@k8s-bot

k8s-bot commented Feb 18, 2016

Copy link
Copy Markdown

GCE e2e build/test failed for commit 386453a.

@yujuhong

Copy link
Copy Markdown
Contributor Author

@k8s-bot test this: issue #21467

@k8s-bot

k8s-bot commented Feb 18, 2016

Copy link
Copy Markdown

GCE e2e test build/test passed for commit 386453a.

@k8s-github-robot

Copy link
Copy Markdown

@k8s-bot test this

Tests are more than 48 hours old. Re-running tests.

@k8s-bot

k8s-bot commented Feb 20, 2016

Copy link
Copy Markdown

GCE e2e test build/test passed for commit 386453a.

@k8s-github-robot

Copy link
Copy Markdown

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Feb 22, 2016

Copy link
Copy Markdown

GCE e2e test build/test passed for commit 386453a.

@k8s-github-robot

Copy link
Copy Markdown

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Feb 22, 2016

Copy link
Copy Markdown

GCE e2e test build/test passed for commit 386453a.

@k8s-github-robot

Copy link
Copy Markdown

Automatic merge from submit-queue

k8s-github-robot pushed a commit that referenced this pull request Feb 22, 2016
@k8s-github-robot k8s-github-robot merged commit 6716344 into kubernetes:master Feb 22, 2016
@wojtek-t

Copy link
Copy Markdown
Member

It seems that this PR somehow broke kubemark - the last two runs of kubemark are failing with "time out on deleting replication controllers", which seems clearly related to this PR ...
I'm going to revert this PR.

@wojtek-t

Copy link
Copy Markdown
Member

@yujuhong - just to give you an evidence that, these are the logs from the failure:

04:34:59 Feb 22 04:34:59.981: INFO: Deleting RC density3000-d7aa0532-d95f-11e5-9238-42010af01555 took: 2m32.169705578s
04:37:02 Feb 22 04:37:02.293: INFO: Terminating RC density3000-d7aa0532-d95f-11e5-9238-42010af01555 pods took: 2m2.312479847s
04:37:02 Feb 22 04:37:02.293: INFO: Unexpected error occurred: timed out waiting for the condition

Note that the phase "Terminating RC ..." before you change took ~2s, now it's timing out on 2 minutes.
I didn't have time to debug it deeply, but it clearly seems to be related to your change.

BTW - this is happening only on kubemark, something strange is now happening in SimpleKubelet? @gmarek can potentially give you more details on what kubemark is using

@wojtek-t

Copy link
Copy Markdown
Member

And to confirm - reverting this PR fixed the problem.

@yujuhong

Copy link
Copy Markdown
Contributor Author

BTW - this is happening only on kubemark, something strange is now happening in SimpleKubelet? @gmarek can potentially give you more details on what kubemark is using

I suspect the (non-)graceful termination in kubemark doesn't work expected because this PR only removes redundant pod deletion requests (which is a huge burden when batch deleting 100 pods) @Random-Liu has agreed to take a look and see what difference in kubemark and kubelet that'd cause pods not being terminated.

@wojtek-t

Copy link
Copy Markdown
Member

@yujuhong - we are running exactly the same test (the same number of nodes, the same number of pods, etc.) in both: kubemark and real cluster.
Only the former was failing, so it doesn't seem to be related to any batching or sth like. I really think it is something like difference between kubelet and simple kubelet or sth like that.

@yujuhong

Copy link
Copy Markdown
Contributor Author

@yujuhong - we are running exactly the same test (the same number of nodes, the same number of pods, etc.) in both: kubemark and real cluster.

It has nothing to do with batching as you said :) I think the SimpleKubelet may be too simple and not doing things right -- in which case the duplicated, redundant deletion requests in the periodic cleanup routine may have just saved the day.

@wojtek-t

Copy link
Copy Markdown
Member

Yeah - that's definitely possible.

@yujuhong

Copy link
Copy Markdown
Contributor Author

Alright, I believe I found the cause. Graceful termination does not work correctly in kubemark because of the fake docker client. After "stopping" a container, kubelet was unable to detect that the container has been stopped and thus did not consider the pod was fully terminated. We should fix the fake docker client.

E.g., repeatedly killing the same container:

I0222 23:16:36.861465       7 manager.go:1307] Killing container "/k8s_density60-246e5adf-d9ba-11e5-b19c-c4346b658736.a45f144c_density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg_24fef989-d9ba-11e5-bfa6-42010af00007_52741857 density60-246e5adf-d9ba-11e5-b19c-c4346b658736 e2e-tests-density-c5hwg/density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc" with 30 second grace period
I0222 23:16:37.102736       7 manager.go:1339] Container "/k8s_density60-246e5adf-d9ba-11e5-b19c-c4346b658736.a45f144c_density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg_24fef989-d9ba-11e5-bfa6-42010af00007_52741857 density60-246e5adf-d9ba-11e5-b19c-c4346b658736 e2e-tests-density-c5hwg/density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc" exited after 241.188219ms
W0222 23:16:37.102901       7 manager.go:1345] No ref for pod '"/k8s_density60-246e5adf-d9ba-11e5-b19c-c4346b658736.a45f144c_density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg_24fef989-d9ba-11e5-bfa6-42010af00007_52741857 density60-246e5adf-d9ba-11e5-b19c-c4346b658736 e2e-tests-density-c5hwg/density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc"'
I0222 23:16:52.861760       7 manager.go:270] Ignoring same status for pod "density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg(24fef989-d9ba-11e5-bfa6-42010af00007)", status: {Phase:Running Conditions:[{Type:Ready Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2016-02-22 23:15:37 +0000 UTC Reason: Message:}] Message: Reason: HostIP:10.245.1.5 PodIP:2.3.4.5 StartTime:2016-02-22 23:15:21 +0000 UTC ContainerStatuses:[{Name:density60-246e5adf-d9ba-11e5-b19c-c4346b658736 State:{Waiting:<nil> Running:0xc2089477c0 Terminated:<nil>} LastTerminationState:{Waiting:<nil> Running:<nil> Terminated:<nil>} Ready:true RestartCount:0 Image:gcr.io/google_containers/pause:2.0 ImageID:docker:// ContainerID:docker:///k8s_density60-246e5adf-d9ba-11e5-b19c-c4346b658736.a45f144c_density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg_24fef989-d9ba-11e5-bfa6-42010af00007_52741857}]}
I0222 23:16:52.861940       7 manager.go:1307] Killing container "/k8s_density60-246e5adf-d9ba-11e5-b19c-c4346b658736.a45f144c_density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg_24fef989-d9ba-11e5-bfa6-42010af00007_52741857 density60-246e5adf-d9ba-11e5-b19c-c4346b658736 e2e-tests-density-c5hwg/density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc" with 30 second grace period
I0222 23:16:53.192525       7 manager.go:1339] Container "/k8s_density60-246e5adf-d9ba-11e5-b19c-c4346b658736.a45f144c_density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg_24fef989-d9ba-11e5-bfa6-42010af00007_52741857 density60-246e5adf-d9ba-11e5-b19c-c4346b658736 e2e-tests-density-c5hwg/density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc" exited after 330.523712ms
W0222 23:16:53.192595       7 manager.go:1345] No ref for pod '"/k8s_density60-246e5adf-d9ba-11e5-b19c-c4346b658736.a45f144c_density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg_24fef989-d9ba-11e5-bfa6-42010af00007_52741857 density60-246e5adf-d9ba-11e5-b19c-c4346b658736 e2e-tests-density-c5hwg/density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc"'
I0222 23:17:06.861037       7 manager.go:270] Ignoring same status for pod "density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg(24fef989-d9ba-11e5-bfa6-42010af00007)", status: {Phase:Running Conditions:[{Type:Ready Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2016-02-22 23:15:37 +0000 UTC Reason: Message:}] Message: Reason: HostIP:10.245.1.5 PodIP:2.3.4.5 StartTime:2016-02-22 23:15:21 +0000 UTC ContainerStatuses:[{Name:density60-246e5adf-d9ba-11e5-b19c-c4346b658736 State:{Waiting:<nil> Running:0xc2082c7780 Terminated:<nil>} LastTerminationState:{Waiting:<nil> Running:<nil> Terminated:<nil>} Ready:true RestartCount:0 Image:gcr.io/google_containers/pause:2.0 ImageID:docker:// ContainerID:docker:///k8s_density60-246e5adf-d9ba-11e5-b19c-c4346b658736.a45f144c_density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg_24fef989-d9ba-11e5-bfa6-42010af00007_52741857}]}
I0222 23:17:06.861188       7 manager.go:1307] Killing container "/k8s_density60-246e5adf-d9ba-11e5-b19c-c4346b658736.a45f144c_density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg_24fef989-d9ba-11e5-bfa6-42010af00007_52741857 density60-246e5adf-d9ba-11e5-b19c-c4346b658736 e2e-tests-density-c5hwg/density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc" with 30 second grace period
I0222 23:17:07.089368       7 manager.go:1339] Container "/k8s_density60-246e5adf-d9ba-11e5-b19c-c4346b658736.a45f144c_density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg_24fef989-d9ba-11e5-bfa6-42010af00007_52741857 density60-246e5adf-d9ba-11e5-b19c-c4346b658736 e2e-tests-density-c5hwg/density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc" exited after 228.148632ms
W0222 23:17:07.089424       7 manager.go:1345] No ref for pod '"/k8s_density60-246e5adf-d9ba-11e5-b19c-c4346b658736.a45f144c_density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg_24fef989-d9ba-11e5-bfa6-42010af00007_52741857 density60-246e5adf-d9ba-11e5-b19c-c4346b658736 e2e-tests-density-c5hwg/density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc"'
I0222 23:17:20.861106       7 manager.go:270] Ignoring same status for pod "density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg(24fef989-d9ba-11e5-bfa6-42010af00007)", status: {Phase:Running Conditions:[{Type:Ready Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2016-02-22 23:15:37 +0000 UTC Reason: Message:}] Message: Reason: HostIP:10.245.1.5 PodIP:2.3.4.5 StartTime:2016-02-22 23:15:21 +0000 UTC ContainerStatuses:[{Name:density60-246e5adf-d9ba-11e5-b19c-c4346b658736 State:{Waiting:<nil> Running:0xc208bdc780 Terminated:<nil>} LastTerminationState:{Waiting:<nil> Running:<nil> Terminated:<nil>} Ready:true RestartCount:0 Image:gcr.io/google_containers/pause:2.0 ImageID:docker:// ContainerID:docker:///k8s_density60-246e5adf-d9ba-11e5-b19c-c4346b658736.a45f144c_density60-246e5adf-d9ba-11e5-b19c-c4346b658736-6hzbc_e2e-tests-density-c5hwg_24fef989-d9ba-11e5-bfa6-42010af00007_52741857}]}

@Random-Liu

Copy link
Copy Markdown
Member

@yujuhong Thanks!
I found the root cause. In the function updateContainerStatus(), only statuses in ContainerList were updated. But if we stop a container, it will be moved to ExitedContainerList first, so its status won't be updated to "Exited".

@gmarek

gmarek commented Feb 23, 2016

Copy link
Copy Markdown
Contributor

cc @fejta

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants