Skip to content

kubelet: fix duplicated status updates at pod cleanup#22155

Merged
k8s-github-robot merged 4 commits into
kubernetes:masterfrom
Random-Liu:terminated-pods
Feb 29, 2016
Merged

kubelet: fix duplicated status updates at pod cleanup#22155
k8s-github-robot merged 4 commits into
kubernetes:masterfrom
Random-Liu:terminated-pods

Conversation

@Random-Liu

Copy link
Copy Markdown
Member

4th try of #21438. Based on #22117.

I finally found the root reason of the last bug. I reproduced the issue #21959 (comment) for several times. I find that each time a pod failed to be deleted whose containers were exited and removed before next pleg relist. After that, pleg didn't report any event, thus pod cache was never updated.

After some debugging, I find that:

  • After each relist, the current pod record will be updated no matter the container state changed or not (here);
  • However, only pod record of which container state is changed will be updated (here).
  • For those pods whose containers are not changed, their old and current pod record will be left in the same state.
  • If now all containers of a pod are removed before relist, because there is no corresponding pod returned by GetPods, the current pod record will not be updated, it is still the same with old pod record.
  • Then generateEvent will not generate any event because the old and current states are equal (here).

In the 2nd commit of this PR. I changed the setCurrent function to let it update all current pod records at once. Before updating, it will clear all current pod records because we'll never use them any more.

This time I have run 40 nodes kubemark test overnight (about 10 hours) without any failure. (Without this fix, it usually failed after 20 or 30 minutes)

In summary:

This time, I think this PR will not fail kubemark anymore......:)
@yujuhong

yujuhong and others added 4 commits February 28, 2016 13:16
Even though we don't rely on the cache for garbage collection yet, we should
keep it up-to-date.
cleanupTerminatedPods is responsible for checking whether a pod has been
terminated and force a status update to trigger the pod deletion. However, this
function is called in the periodic clenup routine, which runs every 2 seconds.
In other words, it forces a status update for each non-running (and not yet
deleted in the apiserver) pod. When batch deleting tens of pods, the rate of
new updates surpasses what the status manager can handle, causing numerous
redundant requests (and the status channel to be full).

This change forces a status update only when detecting the DeletionTimestamp is
set for a terminated pod. Note that for other non-terminated pods, the pod
workers should be responsible for setting the correct status after killling all
the containers.
@Random-Liu Random-Liu added kind/bug Categorizes issue or PR as related to a bug. area/test priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Feb 28, 2016
@k8s-github-robot

Copy link
Copy Markdown

Labelling this PR as size/L

@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 28, 2016
@Random-Liu Random-Liu added this to the v1.2 milestone Feb 28, 2016
@k8s-bot

k8s-bot commented Feb 28, 2016

Copy link
Copy Markdown

GCE e2e build/test passed for commit 866c52c.

@yujuhong yujuhong added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 29, 2016
@yujuhong

Copy link
Copy Markdown
Contributor

Thanks a lot for debugging this!

@k8s-github-robot

Copy link
Copy Markdown

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Feb 29, 2016

Copy link
Copy Markdown

GCE e2e build/test passed for commit 866c52c.

@k8s-github-robot

Copy link
Copy Markdown

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Feb 29, 2016

Copy link
Copy Markdown

GCE e2e build/test passed for commit 866c52c.

@k8s-github-robot

Copy link
Copy Markdown

Automatic merge from submit-queue

smarterclayton added a commit to smarterclayton/kubernetes that referenced this pull request May 14, 2019
The kubelet must not allow a container that was reported failed in a
restartPolicy=Never pod to be reported to the apiserver as success.
If a client deletes a restartPolicy=Never pod, the dispatchWork and
status manager race to update the container status. When dispatchWork
(specifically podIsTerminated) returns true, it means all containers
are stopped, which means status in the container is accurate. However,
the TerminatePod method then clears this status. This results in a
pod that has been reported with status.phase=Failed getting reset to
status.phase.Succeeded, which is a violation of the guarantees around
terminal phase.

TerminatePod was introduced in kubernetes#22155 and handled an error where
the previous naive status update implementation sent too many requests.
The core requirement is to trigger a status update. We update the
TerminatePod method to merge status rather than stomping it - all
containers without a terminal state are given terminal state of
exit code 137 (equiv to sigkill) with a unique reason and message.
This ensures clients know that the status is synthetic, but naive
clients don't interpret "failed to get any container status" as
"container succeeded" which is the root of this bug.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/kubelet area/test kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants