Skip to content

Automated cherry pick of #110256: kubelet: Mark ready condition as false explicitly for terminal pods#110481

Merged
k8s-ci-robot merged 2 commits into
kubernetes:release-1.22from
bobbypage:automated-cherry-pick-of-#110256-upstream-release-1.22
Jun 9, 2022
Merged

Automated cherry pick of #110256: kubelet: Mark ready condition as false explicitly for terminal pods#110481
k8s-ci-robot merged 2 commits into
kubernetes:release-1.22from
bobbypage:automated-cherry-pick-of-#110256-upstream-release-1.22

Conversation

@bobbypage

@bobbypage bobbypage commented Jun 9, 2022

Copy link
Copy Markdown
Member

Cherry pick of #110256 on release-1.22.

#110256: kubelet: Mark ready condition as false explicitly for terminal pods

For details on the cherry pick process, see the cherry pick requests page.

Fixed a 1.22 regression kubelet issue that could result in invalid pod status updates to be sent to the api-server where pods would be reported in a terminal phase but also report a ready condition of true in some cases.

Terminal pods may continue to report a ready condition of true because
there is a delay in reconciling the ready condition of the containers
from the runtime with the pod status. It should be invalid for kubelet
to report a terminal phase with a true ready condition. To fix the
issue, explicitly override the ready condition to false for terminal
pods during status updates.

Signed-off-by: David Porter <david@porter.me>
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jun 9, 2022
@k8s-ci-robot k8s-ci-robot added do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet labels Jun 9, 2022
@k8s-ci-robot k8s-ci-robot requested review from Random-Liu and piosz June 9, 2022 06:30
@k8s-ci-robot k8s-ci-robot added area/test release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 9, 2022
@bobbypage

Copy link
Copy Markdown
Member Author

Cherrypick was not fully clean - manual conflict on the e2e test which was resolved manually.

$ git diff
diff --cc test/e2e_node/node_shutdown_linux_test.go
index 78d8c59362f,32ff9ccb549..00000000000
--- a/test/e2e_node/node_shutdown_linux_test.go
+++ b/test/e2e_node/node_shutdown_linux_test.go
@@@ -23,20 -24,28 +23,33 @@@ import
        "fmt"
        "os"
        "path/filepath"
 +      "strconv"
        "time"

 -      apierrors "k8s.io/apimachinery/pkg/api/errors"
        "k8s.io/apimachinery/pkg/fields"
+       "k8s.io/apimachinery/pkg/watch"
+       "k8s.io/client-go/tools/cache"
+       watchtools "k8s.io/client-go/tools/watch"
        "k8s.io/kubectl/pkg/util/podutils"
++<<<<<<< HEAD
++=======
+
+       admissionapi "k8s.io/pod-security-admission/api"
++>>>>>>> test: update graceful node shutdown e2e with watch

        "github.com/onsi/ginkgo"
        "github.com/onsi/gomega"
        "k8s.io/kubernetes/pkg/apis/scheduling"
 +      "k8s.io/kubernetes/pkg/features"
        "k8s.io/kubernetes/test/e2e/framework"

 -      "github.com/godbus/dbus/v5"
        v1 "k8s.io/api/core/v1"
 -      schedulingv1 "k8s.io/api/scheduling/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
++<<<<<<< HEAD
++=======
+       "k8s.io/apimachinery/pkg/util/wait"
+       "k8s.io/kubernetes/pkg/features"
++>>>>>>> test: update graceful node shutdown e2e with watch
        kubeletconfig "k8s.io/kubernetes/pkg/kubelet/apis/config"
        kubelettypes "k8s.io/kubernetes/pkg/kubelet/types"
        testutils "k8s.io/kubernetes/test/utils"
@@@ -119,14 -158,9 +160,9 @@@ var _ = SIGDescribe("GracefulNodeShutdo
                                framework.ExpectEqual(len(list.Items), len(pods), "the number of pods is not as expected")

                                for _, pod := range list.Items {
-                                       if isPodStatusAffectedByIssue108594(&pod) {
-                                               framework.Logf("Detected invalid pod state for pod %q: pod status: %+v", pod.Name, pod.Status)
-                                               framework.Failf("failing test due to detecting invalid pod status")
-                                       }
-
                                        if kubelettypes.IsCriticalPod(&pod) {
 -                                              if isPodShutdown(&pod) {
 -                                                      framework.Logf("Expecting critical pod to be running, but it's not currently. Pod: %q, Pod Status %+v", pod.Name, pod.Status)
 +                                              if pod.Status.Phase != v1.PodRunning {
 +                                                      framework.Logf("Expecting critcal pod to be running, but it's not currently. Pod: %q, Pod Status Phase: %q, Pod Status Reason: %q", pod.Name, pod.Status.Phase, pod.Status.Reason)
                                                        return fmt.Errorf("critical pod should not be shutdown, phase: %s", pod.Status.Phase)
                                                }
                                        } else {
@@@ -151,15 -185,10 +187,20 @@@
                                framework.ExpectEqual(len(list.Items), len(pods), "the number of pods is not as expected")

                                for _, pod := range list.Items {
++<<<<<<< HEAD
 +                                      if isPodStatusAffectedByIssue108594(&pod) {
 +                                              framework.Logf("Detected invalid pod state for pod %q: pod status: %+v", pod.Name, pod.Status)
 +                                              framework.Failf("failing test due to detecting invalid pod status")
 +                                      }
 +                                      if pod.Status.Phase != v1.PodFailed || pod.Status.Reason != "Shutdown" {
 +                                              framework.Logf("Expecting pod to be shutdown, but it's not currently: Pod: %q, Pod Status Phase: %q, Pod Status Reason: %q", pod.Name, pod.Status.Phase, pod.Status.Reason)
++=======
+                                       if !isPodShutdown(&pod) {
+                                               framework.Logf("Expecting pod to be shutdown, but it's not currently: Pod: %q, Pod Status %+v", pod.Name, pod.Status)
++>>>>>>> test: update graceful node shutdown e2e with watch
                                                return fmt.Errorf("pod should be shutdown, phase: %s", pod.Status.Phase)
                                        }
 +
                                }
                                return nil
                        },

@bobbypage

Copy link
Copy Markdown
Member Author

/triage accepted
/kind bug
/assign @mrunalp @rphillips @dchen1107 @thockin

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jun 9, 2022
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Jun 9, 2022
@bobbypage bobbypage changed the title Automated cherry pick of #110256: kubelet: Mark ready condition as false explicitly for Automated cherry pick of #110256: kubelet: Mark ready condition as false explicitly for terminal pods Jun 9, 2022
Use a watch to detect invalid pod status updates in graceful node
shutdown node e2e test. By using a watch, all pod updates will be
captured while the previous logic required polling the api-server which
could miss some intermediate updates.

Signed-off-by: David Porter <david@porter.me>
@rphillips

Copy link
Copy Markdown
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 9, 2022
@thockin

thockin commented Jun 9, 2022

Copy link
Copy Markdown
Member

/approve

@k8s-ci-robot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobbypage, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 9, 2022
@bobbypage

Copy link
Copy Markdown
Member Author

ping @kubernetes/release-managers

for cherrypick approvals.

Thank you!

@puerco

puerco commented Jun 9, 2022

Copy link
Copy Markdown
Member

Thanks @bobbypage !
/priority important-soon
/lgtm

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 9, 2022
@puerco puerco added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Jun 9, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. label Jun 9, 2022
@k8s-ci-robot k8s-ci-robot merged commit 6de0e5a into kubernetes:release-1.22 Jun 9, 2022
@liggitt liggitt added the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Jan 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants