kubelet: fix duplicated status updates at pod cleanup by Random-Liu · Pull Request #22155 · kubernetes/kubernetes

Random-Liu · 2016-02-28T21:57:51Z

4th try of #21438. Based on #22117.

I finally found the root reason of the last bug. I reproduced the issue #21959 (comment) for several times. I find that each time a pod failed to be deleted whose containers were exited and removed before next pleg relist. After that, pleg didn't report any event, thus pod cache was never updated.

After some debugging, I find that:

After each relist, the current pod record will be updated no matter the container state changed or not (here);
However, only pod record of which container state is changed will be updated (here).
For those pods whose containers are not changed, their old and current pod record will be left in the same state.
If now all containers of a pod are removed before relist, because there is no corresponding pod returned by GetPods, the current pod record will not be updated, it is still the same with old pod record.
Then generateEvent will not generate any event because the old and current states are equal (here).

In the 2nd commit of this PR. I changed the setCurrent function to let it update all current pod records at once. Before updating, it will clear all current pod records because we'll never use them any more.

This time I have run 40 nodes kubemark test overnight (about 10 hours) without any failure. (Without this fix, it usually failed after 20 or 30 minutes)

In summary:

The 1st commit is fetched from pleg: ensure the cache is updated whenever container are removed #22117, it ensures that all pods with container state changed whose pod records will be updated.
The 2nd commit ensures that pleg can detect container being removed.
The 3rd commit fix the bug in fake docker client which leads to error ListContainer result.
The 4th commit is the original PR kubelet: fix duplicated status updates at pod cleanup #21438.

This time, I think this PR will not fail kubemark anymore......:)
@yujuhong

Even though we don't rely on the cache for garbage collection yet, we should keep it up-to-date.

cleanupTerminatedPods is responsible for checking whether a pod has been terminated and force a status update to trigger the pod deletion. However, this function is called in the periodic clenup routine, which runs every 2 seconds. In other words, it forces a status update for each non-running (and not yet deleted in the apiserver) pod. When batch deleting tens of pods, the rate of new updates surpasses what the status manager can handle, causing numerous redundant requests (and the status channel to be full). This change forces a status update only when detecting the DeletionTimestamp is set for a terminated pod. Note that for other non-terminated pods, the pod workers should be responsible for setting the correct status after killling all the containers.

k8s-github-robot · 2016-02-28T22:02:42Z

Labelling this PR as size/L

k8s-bot · 2016-02-28T22:25:33Z

GCE e2e build/test passed for commit 866c52c.

yujuhong · 2016-02-29T16:40:20Z

Thanks a lot for debugging this!

k8s-github-robot · 2016-02-29T19:45:32Z

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

k8s-bot · 2016-02-29T20:15:34Z

GCE e2e build/test passed for commit 866c52c.

k8s-github-robot · 2016-02-29T21:14:28Z

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

k8s-bot · 2016-02-29T21:43:07Z

GCE e2e build/test passed for commit 866c52c.

k8s-github-robot · 2016-02-29T21:49:38Z

Automatic merge from submit-queue

Auto commit by PR queue bot

The kubelet must not allow a container that was reported failed in a restartPolicy=Never pod to be reported to the apiserver as success. If a client deletes a restartPolicy=Never pod, the dispatchWork and status manager race to update the container status. When dispatchWork (specifically podIsTerminated) returns true, it means all containers are stopped, which means status in the container is accurate. However, the TerminatePod method then clears this status. This results in a pod that has been reported with status.phase=Failed getting reset to status.phase.Succeeded, which is a violation of the guarantees around terminal phase. TerminatePod was introduced in kubernetes#22155 and handled an error where the previous naive status update implementation sent too many requests. The core requirement is to trigger a status update. We update the TerminatePod method to merge status rather than stomping it - all containers without a terminal state are given terminal state of exit code 137 (equiv to sigkill) with a unique reason and message. This ensures clients know that the status is synthetic, but naive clients don't interpret "failed to get any container status" as "container succeeded" which is the root of this bug.

yujuhong and others added 4 commits February 28, 2016 13:16

pleg: ensure the cache is updated whenever container are removed

3886892

Even though we don't rely on the cache for garbage collection yet, we should keep it up-to-date.

kubelet: clear current pod records before relist

96eeb81

dockertools: fix error exited container list in fake docker client

98283d8

Random-Liu added kind/bug Categorizes issue or PR as related to a bug. area/test priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Feb 28, 2016

Random-Liu assigned yujuhong Feb 28, 2016

googlebot added the cla: yes label Feb 28, 2016

k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 28, 2016

Random-Liu added this to the v1.2 milestone Feb 28, 2016

yujuhong added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 29, 2016

k8s-github-robot pushed a commit that referenced this pull request Feb 29, 2016

Merge pull request #22155 from Random-Liu/terminated-pods

ae2a8da

Auto commit by PR queue bot

k8s-github-robot merged commit ae2a8da into kubernetes:master Feb 29, 2016

Random-Liu deleted the terminated-pods branch February 29, 2016 21:53

yujuhong mentioned this pull request Mar 1, 2016

pleg: ensure the cache is updated whenever container are removed #22117

Closed

dashpole mentioned this pull request Feb 15, 2017

Fix bug in status manager TerminatePod #41436

Merged

yujuhong mentioned this pull request May 8, 2019

Kubelet should handle pod deletion better #19037

Closed

sjenning mentioned this pull request May 14, 2019

kubelet: invalid pod phase transition from Failed to Succeeded #58711

Closed

smarterclayton mentioned this pull request May 14, 2019

Prevent Kubelet from clearing terminated pod status #77870

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kubelet: fix duplicated status updates at pod cleanup#22155

kubelet: fix duplicated status updates at pod cleanup#22155
k8s-github-robot merged 4 commits into
kubernetes:masterfrom
Random-Liu:terminated-pods

Random-Liu commented Feb 28, 2016

Uh oh!

k8s-github-robot commented Feb 28, 2016

Uh oh!

k8s-bot commented Feb 28, 2016

Uh oh!

yujuhong commented Feb 29, 2016

Uh oh!

k8s-github-robot commented Feb 29, 2016

Uh oh!

k8s-bot commented Feb 29, 2016

Uh oh!

k8s-github-robot commented Feb 29, 2016

Uh oh!

k8s-bot commented Feb 29, 2016

Uh oh!

k8s-github-robot commented Feb 29, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

Random-Liu commented Feb 28, 2016

Uh oh!

k8s-github-robot commented Feb 28, 2016

Uh oh!

k8s-bot commented Feb 28, 2016

Uh oh!

yujuhong commented Feb 29, 2016

Uh oh!

k8s-github-robot commented Feb 29, 2016

Uh oh!

k8s-bot commented Feb 29, 2016

Uh oh!

k8s-github-robot commented Feb 29, 2016

Uh oh!

k8s-bot commented Feb 29, 2016

Uh oh!

k8s-github-robot commented Feb 29, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants