Skip to content

[release-4.13] OCPBUGS-19400: install: Recreate and delayed default ServiceAccount deletion#3923

Merged
openshift-merge-robot merged 1 commit into
openshift:release-4.13from
openshift-cherrypick-robot:cherry-pick-3920-to-release-4.13
Sep 22, 2023
Merged

[release-4.13] OCPBUGS-19400: install: Recreate and delayed default ServiceAccount deletion#3923
openshift-merge-robot merged 1 commit into
openshift:release-4.13from
openshift-cherrypick-robot:cherry-pick-3920-to-release-4.13

Conversation

@openshift-cherrypick-robot

Copy link
Copy Markdown

This is an automated cherry-pick of #3920

/assign yuqi-zhang

Single-node updates from 4.13 to 4.14 are having slow leader handoffs
during operator Deployment updates [1].  For example [2] in Loki:

  {invoker="openshift-internal-ci/periodic-ci-openshift-release-master-nightly-4.14-upgrade-from-stable-4.13-e2e-aws-upgrade-ovn-single-node/1692132139799154688"} | unpack | pod="machine-config-operator-74d7475455-cgcmg" |= "leaderelection"

gives:

  machine-config-operator-7987b6bcdd-6zvrt
    I0817 11:26:14.747630       1 leaderelection.go:248] attempting to acquire leader lease openshift-machine-config-operator/machine-config...
    I0817 11:26:14.756376       1 leaderelection.go:258] successfully acquired lease openshift-machine-config-operator/machine-config
    E0817 11:31:15.153289       1 leaderelection.go:330] error retrieving resource lock openshift-machine-config-operator/machine-config: the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps machine-config)
    E0817 11:37:42.204539       1 leaderelection.go:330] error retrieving resource lock openshift-machine-config-operator/machine-config: Get "https://172.30.0.1:443/api/v1/namespaces/openshift-machine-config-operator/configmaps/machine-config": net/http: TLS handshake timeout - error from a previous attempt: unexpected EOF
    E0817 11:53:05.221213       1 leaderelection.go:330] error retrieving resource lock openshift-machine-config-operator/machine-config: Get "https://172.30.0.1:443/api/v1/namespaces/openshift-machine-config-operator/configmaps/machine-config": net/http: TLS handshake timeout - error from a previous attempt: unexpected EOF
  machine-config-operator-74d7475455-cgcmg
    I0817 12:21:41.019562       1 leaderelection.go:122] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
    I0817 12:21:41.036459       1 leaderelection.go:245] attempting to acquire leader lease openshift-machine-config-operator/machine-config...
  machine-config-operator-7987b6bcdd-6zvrt
    E0817 12:21:41.780276       1 leaderelection.go:306] Failed to release lock: configmaps "machine-config" is forbidden: User "system:serviceaccount:openshift-machine-config-operator:default" cannot update resource "configmaps" in API group "" in the namespace "openshift-machine-config-operator"
  machine-config-operator-74d7475455-cgcmg
    I0817 12:27:27.767746       1 leaderelection.go:255] successfully acquired lease openshift-machine-config-operator/machine-config
    I0817 12:35:29.649939       1 leaderelection.go:122] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
    I0817 12:35:29.685549       1 leaderelection.go:245] attempting to acquire leader lease openshift-machine-config-operator/machine-config...
    I0817 12:35:29.696394       1 leaderelection.go:255] successfully acquired lease openshift-machine-config-operator/machine-config

That shows two lease handoffs:

1. During the update from 4.13 to 4.14, the cluster-version operator
   bumps the MCO Deployment to roll out the 4.14 image and other manifest
   changes.
   a. The outgoing 4.13 is using the default ServiceAccount, because
      it predates [3,4].
   b. The CVO deletes the default ServiceAccount, because its manifest
      filename sorts it before the MCO Deployment.
   c. The CVO bumps the MCO Deployment.
   d. The MCO Deployment defaults to RollingUpdate [5] with a default
      maxSurge of 25% rounded up [6].  That maxSurge default is very
      old, dating back to at least the version of Kubernetes vendored
      by OpenShift 4.1 [7].
   e. The Deployment controller surges the incoming pod, and at
      12:21:41.036459 it checks to see if the lease is available, but
      it isn't, because the outgoing pod is still running.
   f. The Deployment controller TERMs the outgoing containers, and at
      12:21:41.780276 the outgoing pod complains that it is no longer
      authorized to delete the lease ConfigMap, because the default
      ServiceAccount has been deleted in 1.b.
   g. The incoming pod acquires the lease 5m46s later at 12:27:27,
      when the outgoing container's final lease goes stale.
2. When the single node goes down for a reboot to roll out the 4.14
   RHCOS, and the new container coming up at 12:35:29 acquires the
   lease smoothly.

This commit improves the deployment rollout with two changes:

* Shifting the default ServiceAccount deletion to a 90 manifest moves
  it behind 0000_80_machine-config-operator_04_deployment.yaml and
  0000_80_machine-config-operator_06_clusteroperator.yaml.  Moving
  behind the ClusterOperator ensures that deletion is not initiated
  until the incoming operator pod has updated the ClusterOperator to
  confirm the update has completed.  Positioning between the
  Deployment and the ClusterOperator would have put the deletion after
  the CVO bumped the Deployment, but might still have raced against
  the outgoing operator pod actually shutting down.  This should allow
  the outgoing pod to gracefully remove the ConfigMap and Lease
  resources during graceful shutdowns.
* Shifting the Deployment to Recreate [8].  But also as discussed in
  [8]:

    Note: This will only guarantee Pod termination previous to
    creation for upgrades. If you upgrade a Deployment, all Pods of
    the old revision will be terminated immediately. Successful
    removal is awaited before any Pod of the new revision is
    created. If you manually delete a Pod, the lifecycle is controlled
    by the ReplicaSet and the replacement will be created immediately
    (even if the old Pod is still in a Terminating state). If you need
    an "at most" guarantee for your Pods, you should consider using a
    StatefulSet.

  Still, Recreate will avoid the current milliseconds of overlap
  between the incoming pod's first attempt to acquire the lease and
  the outgoing pod's attempt to release during Deployment rollouts.
  Other MCO container restarts and
  drains-for-multi-control-plane-rolls and such may still have racey
  overlap, but should be limited to the:

    Worst graceful lease acquisition is {26s}.

  latency (at least with the current 4.14 single-node lease
  configuration).

[1]: https://issues.redhat.com/browse/OCPBUGS-16905
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-upgrade-from-stable-4.13-e2e-aws-upgrade-ovn-single-node/1692132139799154688
[3]: https://issues.redhat.com/browse/OCPBUGS-10924
[4]: openshift@ace637f
[5]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
[6]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#max-surge
[7]: openshift/api $ git --no-pager grep -B12 maxSurge origin/release-4.1 -- vendor/k8s.io/api/apps/v1/types.go
     origin/release-4.1:vendor/k8s.io/api/apps/v1/types.go-  // The maximum number of pods that can be scheduled above the desired number of
     origin/release-4.1:vendor/k8s.io/api/apps/v1/types.go-  // pods.
     origin/release-4.1:vendor/k8s.io/api/apps/v1/types.go-  // Value can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%).
     origin/release-4.1:vendor/k8s.io/api/apps/v1/types.go-  // This can not be 0 if MaxUnavailable is 0.
     origin/release-4.1:vendor/k8s.io/api/apps/v1/types.go-  // Absolute number is calculated from percentage by rounding up.
     origin/release-4.1:vendor/k8s.io/api/apps/v1/types.go-  // Defaults to 25%.
     origin/release-4.1:vendor/k8s.io/api/apps/v1/types.go-  // Example: when this is set to 30%, the new ReplicaSet can be scaled up immediately when
     origin/release-4.1:vendor/k8s.io/api/apps/v1/types.go-  // the rolling update starts, such that the total number of old and new pods do not exceed
     origin/release-4.1:vendor/k8s.io/api/apps/v1/types.go-  // 130% of desired pods. Once old pods have been killed,
     origin/release-4.1:vendor/k8s.io/api/apps/v1/types.go-  // new ReplicaSet can be scaled up further, ensuring that total number of pods running
     origin/release-4.1:vendor/k8s.io/api/apps/v1/types.go-  // at any time during the update is at most 130% of desired pods.
     origin/release-4.1:vendor/k8s.io/api/apps/v1/types.go-  // +optional
     origin/release-4.1:vendor/k8s.io/api/apps/v1/types.go:  MaxSurge *intstr.IntOrString `json:"maxSurge,omitempty" protobuf:"bytes,2,opt,name=maxSurge"
[8]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#recreate-deployment
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@openshift-cherrypick-robot: Jira Issue OCPBUGS-19357 has been cloned as Jira Issue OCPBUGS-19400. Will retitle bug to link to clone.
/retitle [release-4.13] OCPBUGS-19400: install: Recreate and delayed default ServiceAccount deletion

Details

In response to this:

This is an automated cherry-pick of #3920

/assign yuqi-zhang

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci Bot changed the title [release-4.13] OCPBUGS-19357: install: Recreate and delayed default ServiceAccount deletion [release-4.13] OCPBUGS-19400: install: Recreate and delayed default ServiceAccount deletion Sep 19, 2023
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 19, 2023
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-19400, which is invalid:

  • expected dependent Jira Issue OCPBUGS-19357 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), but it is MODIFIED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

This is an automated cherry-pick of #3920

/assign yuqi-zhang

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci Bot requested review from cgwalters and jkyros September 19, 2023 14:48

@yuqi-zhang yuqi-zhang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/label backport-risk-assessed

@openshift-ci openshift-ci Bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Sep 19, 2023
@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Sep 19, 2023
@openshift-ci

openshift-ci Bot commented Sep 19, 2023

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: openshift-cherrypick-robot, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 19, 2023
@sinnykumari

Copy link
Copy Markdown
Contributor

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Sep 20, 2023
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@sinnykumari: This pull request references Jira Issue OCPBUGS-19400, which is valid. The bug has been moved to the POST state.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.z) matches configured target version for branch (4.13.z)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • dependent bug Jira Issue OCPBUGS-19357 is in the state Verified, which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE))
  • dependent Jira Issue OCPBUGS-19357 targets the "4.14.0" version, which is one of the valid target versions: 4.14.0
  • bug has dependents

Requesting review from QA contact:
/cc @sergiordlr

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Sep 20, 2023
@openshift-ci openshift-ci Bot requested a review from sergiordlr September 20, 2023 19:39
@sergiordlr

Copy link
Copy Markdown
Contributor

/label cherry-pick-approved

@openshift-ci openshift-ci Bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Sep 21, 2023
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

/retest-required

Remaining retests: 0 against base HEAD b2b7a34 and 2 for PR HEAD 1cdb75f in total

@openshift-ci

openshift-ci Bot commented Sep 21, 2023

Copy link
Copy Markdown
Contributor

@openshift-cherrypick-robot: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 1cdb75f link false /test okd-scos-e2e-aws-ovn
ci/prow/e2e-aws-ovn-fips 1cdb75f link false /test e2e-aws-ovn-fips

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci-robot

Copy link
Copy Markdown
Contributor

/retest-required

Remaining retests: 0 against base HEAD 21bec78 and 1 for PR HEAD 1cdb75f in total

@openshift-merge-robot openshift-merge-robot merged commit 9f4854d into openshift:release-4.13 Sep 22, 2023
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@openshift-cherrypick-robot: Jira Issue OCPBUGS-19400: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-19400 has been moved to the MODIFIED state.

Details

In response to this:

This is an automated cherry-pick of #3920

/assign yuqi-zhang

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot

Copy link
Copy Markdown
Contributor

Fix included in accepted release 4.13.0-0.nightly-2023-09-23-231516

@freedge

freedge commented Oct 15, 2023

Copy link
Copy Markdown

this only cherry picks 2 out of the 3 changes of https://github.com/openshift/machine-config-operator/pull/3920/files
default-account-openshift-machine-config-operator is still created through 0000_80_machine-config-operator_03_rbac.yaml and deleted through 0000_90_machine-config-operator_90_deletion.yaml

it that what you expected @yuqi-zhang ? I think this causes

Upgradeability condition failed (type='UpgradeableDeletesInProgress' reason='ResourceDeletesInProgress' message='Cluster minor level upgrades are not allowed while resource deletions are in progress; resources=clusterrolebinding "default-account-openshift-machine-config-operator"')

wking added a commit to wking/machine-config-operator that referenced this pull request Oct 16, 2023
ace637f (OCPBUGS-10924: Switch default SA to
machine-config-operator, 2023-06-23, openshift#3740) moved the 4.14
machine-config operator to a non-default ServiceAccount and
ClusterRoleBinding.  But 4.13 and earlier remain on the default
ServiceAccount.

1cdb75f (install: Recreate and delayed default ServiceAccount
deletion, 2023-09-19, openshift#3923, OCPBUGS-19400) brought Recreate logic
back to 4.13.14 [1] and later (good), but also brought back a 'delete'
manifest for the default ClusterRoleBinding, which leads to the 4.13
cluster-version operator fighting with itself over whether that
ClusterRoleBinding should exist (it should exist on 4.13).  For
example, [2] updates from 4.12.36 to 4.13.14, and has:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1707415968109563904/artifacts/e2e-aws-upgrade/clusterversion.json | jq -r '.items[].status.conditions[] | select(.type == "Upgradeable") | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
  2023-09-28T17:09:41Z Upgradeable=False ResourceDeletesInProgress: Cluster minor level upgrades are not allowed while resource deletions are in progress; resources=clusterrolebinding "default-account-openshift-machine-config-operator"

By dropping the deletion manifest from 4.13, we avoid contention
between two manifests, and leave the default ClusterRoleBinding alone
until a later update to 4.14 will remove it.

[1]:  https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.13.14
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1707415968109563904
wking added a commit to wking/machine-config-operator that referenced this pull request Oct 16, 2023
ace637f (OCPBUGS-10924: Switch default SA to
machine-config-operator, 2023-06-23, openshift#3740) moved the 4.14
machine-config operator to a non-default ServiceAccount and
ClusterRoleBinding.  But 4.13 and earlier remain on the default
ServiceAccount.

1cdb75f (install: Recreate and delayed default ServiceAccount
deletion, 2023-09-19, openshift#3923, OCPBUGS-19400) brought Recreate logic
back to 4.13.14 [1] and later (good), but also brought back a 'delete'
manifest for the default ClusterRoleBinding, which leads to the 4.13
cluster-version operator fighting with itself over whether that
ClusterRoleBinding should exist (it should exist on 4.13) [2].  For
example, [3] updates from 4.12.36 to 4.13.14, and has:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1707415968109563904/artifacts/e2e-aws-upgrade/clusterversion.json | jq -r '.items[].status.conditions[] | select(.type == "Upgradeable") | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
  2023-09-28T17:09:41Z Upgradeable=False ResourceDeletesInProgress: Cluster minor level upgrades are not allowed while resource deletions are in progress; resources=clusterrolebinding "default-account-openshift-machine-config-operator"

By dropping the deletion manifest from 4.13, we avoid contention
between two manifests, and leave the default ClusterRoleBinding alone
until a later update to 4.14 will remove it.

[1]: https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.13.14
[2]: https://issues.redhat.com/browse/OCPBUGS-10924
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1707415968109563904
wking added a commit to wking/machine-config-operator that referenced this pull request Oct 16, 2023
ace637f (OCPBUGS-10924: Switch default SA to
machine-config-operator, 2023-06-23, openshift#3740) moved the 4.14
machine-config operator to a non-default ServiceAccount and
ClusterRoleBinding.  But 4.13 and earlier remain on the default
ServiceAccount.

1cdb75f (install: Recreate and delayed default ServiceAccount
deletion, 2023-09-19, openshift#3923, OCPBUGS-19400) brought Recreate logic
back to 4.13.14 [1] and later (good), but also brought back a 'delete'
manifest for the default ClusterRoleBinding, which leads to the 4.13
cluster-version operator fighting with itself over whether that
ClusterRoleBinding should exist (it should exist on 4.13) [2].  For
example, [3] updates from 4.12.36 to 4.13.14, and has:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1707415968109563904/artifacts/e2e-aws-upgrade/clusterversion.json | jq -r '.items[].status.conditions[] | select(.type == "Upgradeable") | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
  2023-09-28T17:09:41Z Upgradeable=False ResourceDeletesInProgress: Cluster minor level upgrades are not allowed while resource deletions are in progress; resources=clusterrolebinding "default-account-openshift-machine-config-operator"

By dropping the deletion manifest from 4.13, we avoid contention
between two manifests, and leave the default ClusterRoleBinding alone
until a later update to 4.14 will remove it.

[1]: https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.13.14
[2]: https://issues.redhat.com/browse/OCPBUGS-21721
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1707415968109563904
@wking

wking commented Oct 16, 2023

Copy link
Copy Markdown
Member

this only cherry picks 2 out of the 3 changes...

Good catch! Fixing in #3983.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.