kubelet: fix duplicated status updates at pod cleanup#22155
Merged
Conversation
Even though we don't rely on the cache for garbage collection yet, we should keep it up-to-date.
cleanupTerminatedPods is responsible for checking whether a pod has been terminated and force a status update to trigger the pod deletion. However, this function is called in the periodic clenup routine, which runs every 2 seconds. In other words, it forces a status update for each non-running (and not yet deleted in the apiserver) pod. When batch deleting tens of pods, the rate of new updates surpasses what the status manager can handle, causing numerous redundant requests (and the status channel to be full). This change forces a status update only when detecting the DeletionTimestamp is set for a terminated pod. Note that for other non-terminated pods, the pod workers should be responsible for setting the correct status after killling all the containers.
|
Labelling this PR as size/L |
|
GCE e2e build/test passed for commit 866c52c. |
Contributor
|
Thanks a lot for debugging this! |
|
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
|
GCE e2e build/test passed for commit 866c52c. |
|
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
|
GCE e2e build/test passed for commit 866c52c. |
|
Automatic merge from submit-queue |
k8s-github-robot
pushed a commit
that referenced
this pull request
Feb 29, 2016
Auto commit by PR queue bot
1 task
smarterclayton
added a commit
to smarterclayton/kubernetes
that referenced
this pull request
May 14, 2019
The kubelet must not allow a container that was reported failed in a restartPolicy=Never pod to be reported to the apiserver as success. If a client deletes a restartPolicy=Never pod, the dispatchWork and status manager race to update the container status. When dispatchWork (specifically podIsTerminated) returns true, it means all containers are stopped, which means status in the container is accurate. However, the TerminatePod method then clears this status. This results in a pod that has been reported with status.phase=Failed getting reset to status.phase.Succeeded, which is a violation of the guarantees around terminal phase. TerminatePod was introduced in kubernetes#22155 and handled an error where the previous naive status update implementation sent too many requests. The core requirement is to trigger a status update. We update the TerminatePod method to merge status rather than stomping it - all containers without a terminal state are given terminal state of exit code 137 (equiv to sigkill) with a unique reason and message. This ensures clients know that the status is synthetic, but naive clients don't interpret "failed to get any container status" as "container succeeded" which is the root of this bug.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
4th try of #21438. Based on #22117.
I finally found the root reason of the last bug. I reproduced the issue #21959 (comment) for several times. I find that each time a pod failed to be deleted whose containers were
exitedandremovedbefore next plegrelist. After that, pleg didn't report any event, thus pod cache was never updated.After some debugging, I find that:
relist, the current pod record will be updated no matter the container state changed or not (here);oldandcurrentpod record will be left in the same state.relist, because there is no corresponding pod returned byGetPods, the current pod record will not be updated, it is still the same with old pod record.generateEventwill not generate any event because the old and current states are equal (here).In the 2nd commit of this PR. I changed the
setCurrentfunction to let it update all current pod records at once. Before updating, it will clear all current pod records because we'll never use them any more.This time I have run 40 nodes kubemark test overnight (about 10 hours) without any failure. (Without this fix, it usually failed after 20 or 30 minutes)
In summary:
plegcan detect container being removed.ListContainerresult.This time, I think this PR will not fail kubemark anymore......:)
@yujuhong