OCPBUGS-10924: Switch default SA to machine-config-operator #3740
|
@cdoern: This pull request references Jira Issue OCPBUGS-10924, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/jira refresh |
|
@cdoern: This pull request references Jira Issue OCPBUGS-10924, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: |
|
/cc @sergiordlr |
|
/hold |
3a4f1f1 to 37a0aba
|
/retest-required |
The following test cases were executed and passed:
"[sig-mco] MCO Author:sregidor-NonPreRelease-High-45239-KubeletConfig has a limit of 10 per cluster [Disruptive] [Serial]"
Before upgrade:
After upgrade:
The clusterrolebindings are duplicated:
I don't know if it is related to the problem, but we can see this in the CVO pod logs.
I will add a must-gather in a comment in the Jira issue.
The default-account-openshift-machine-config-operator clusterrolebinding was not removed, hence we can't add the qe-approved label. |
Create the new account, tombstone the old one, and update all references.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Only the custom-account-openshift-machine-config-operator clusterrolebinding is created (we don't create default-account-openshift-machine-config-operator).
The following test cases were executed and passed:
"[sig-mco] MCO Author:sregidor-NonPreRelease-High-45239-KubeletConfig has a limit of 10 per cluster [Disruptive] [Serial]"
Before upgrade, the default SA is used by the machine-config-operator pod:
The upgrade is executed without problems.
After upgrade:
The clusterrolebinding for the default account is removed, and a new one is created for the custom account.
The operator pod is not using the default SA:
After the upgrade we could execute the "[sig-mco] MCO Author:sregidor-Longduration-NonPreRelease-High-47045-Config Drift. Compressed files. [Serial]" test case, and it passed.
We can add the qe-approved label.
/label qe-approved |
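The manual checks in this verification (which ServiceAccount the operator pod uses, and which clusterrolebindings exist) can be written down as one small predicate. The following is only a sketch: the `MigrationState` type and `CheckMigrated` function are hypothetical names introduced for illustration, and the inputs are assumed to have been collected from the cluster by some other means. The two clusterrolebinding names are the ones quoted in this conversation.

```go
package main

import "fmt"

// Binding names as quoted in the QE comments.
const (
	oldBinding = "default-account-openshift-machine-config-operator"
	newBinding = "custom-account-openshift-machine-config-operator"
)

// MigrationState is a hypothetical snapshot of what was inspected by hand:
// the ServiceAccount of the operator pod and the names of the
// clusterrolebindings present in the cluster.
type MigrationState struct {
	PodServiceAccount   string
	ClusterRoleBindings []string
}

// CheckMigrated returns one message per problem found.
// An empty result means the cluster looks fully migrated.
func CheckMigrated(s MigrationState) []string {
	var problems []string
	if s.PodServiceAccount == "default" {
		problems = append(problems, "operator pod still uses the default ServiceAccount")
	}
	hasOld, hasNew := false, false
	for _, name := range s.ClusterRoleBindings {
		switch name {
		case oldBinding:
			hasOld = true
		case newBinding:
			hasNew = true
		}
	}
	if hasOld {
		problems = append(problems, oldBinding+" clusterrolebinding was not removed")
	}
	if !hasNew {
		problems = append(problems, newBinding+" clusterrolebinding is missing")
	}
	return problems
}

func main() {
	// State matching the first (failed) verification: both bindings present.
	duplicated := MigrationState{
		PodServiceAccount:   "machine-config-operator",
		ClusterRoleBindings: []string{oldBinding, newBinding},
	}
	for _, p := range CheckMigrated(duplicated) {
		fmt.Println("problem:", p)
	}
}
```

An empty return value corresponds to the state that allowed the qe-approved label to be added.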
|
/hold cancel |
|
/retest-required |
|
@cdoern: The following test failed, say
Full PR test history. Your PR dashboard. |
|
/retest-required |
|
/lgtm Thanks for the fix - let's pray to the hypershift gods 🙏 |
|
Strange, it applied approved but not lgtm 🤔 |
|
/lgtm |
|
perhaps some issue with bot |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: cdoern, djoshy, sinnykumari
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@cdoern: Jira Issue OCPBUGS-10924: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-10924 has been moved to the MODIFIED state. |
ace637f (OCPBUGS-10924: Switch default SA to machine-config-operator, 2023-06-23, openshift#3740) moved the 4.14 machine-config operator to a non-default ServiceAccount and ClusterRoleBinding. But 4.13 and earlier remain on the default ServiceAccount.

1cdb75f (install: Recreate and delayed default ServiceAccount deletion, 2023-09-19, openshift#3923, OCPBUGS-19400) brought Recreate logic back to 4.13.14 [1] and later (good), but also brought back a 'delete' manifest for the default ClusterRoleBinding, which leads to the 4.13 cluster-version operator fighting with itself over whether that ClusterRoleBinding should exist (it should exist on 4.13) [2]. For example, [3] updates from 4.12.36 to 4.13.14, and has:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1707415968109563904/artifacts/e2e-aws-upgrade/clusterversion.json | jq -r '.items[].status.conditions[] | select(.type == "Upgradeable") | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
  2023-09-28T17:09:41Z Upgradeable=False ResourceDeletesInProgress: Cluster minor level upgrades are not allowed while resource deletions are in progress; resources=clusterrolebinding "default-account-openshift-machine-config-operator"

By dropping the deletion manifest from 4.13, we avoid contention between two manifests, and leave the default ClusterRoleBinding alone until a later update to 4.14 will remove it.

[1]: https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.13.14
[2]: https://issues.redhat.com/browse/OCPBUGS-21721
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1707415968109563904
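For readers who prefer Go to jq, the filter in the commit message above can be approximated as follows. This is a sketch rather than code from either pull request: the struct and function names are hypothetical, and only the JSON fields that the jq expression itself touches (`items[].status.conditions[]` with `type`, `status`, `reason`, `message`, `lastTransitionTime`) are modeled. The sample input is a hand-trimmed stand-in for `clusterversion.json`. Its `Upgradeable` condition is copied from the output quoted above, while the second condition is a placeholder added only to show the filtering.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition models only the fields the jq expression reads.
type condition struct {
	Type               string `json:"type"`
	Status             string `json:"status"`
	Reason             string `json:"reason"`
	Message            string `json:"message"`
	LastTransitionTime string `json:"lastTransitionTime"`
}

// clusterVersionList models the items[].status.conditions[] path.
type clusterVersionList struct {
	Items []struct {
		Status struct {
			Conditions []condition `json:"conditions"`
		} `json:"status"`
	} `json:"items"`
}

// UpgradeableLines returns one formatted line per Upgradeable condition,
// using the same field order as the jq filter.
func UpgradeableLines(data []byte) ([]string, error) {
	var list clusterVersionList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	var lines []string
	for _, item := range list.Items {
		for _, c := range item.Status.Conditions {
			if c.Type != "Upgradeable" {
				continue
			}
			lines = append(lines, fmt.Sprintf("%s %s=%s %s: %s",
				c.LastTransitionTime, c.Type, c.Status, c.Reason, c.Message))
		}
	}
	return lines, nil
}

func main() {
	// Hand-trimmed sample; "Placeholder" is not a real condition type.
	sample := []byte(`{"items":[{"status":{"conditions":[
	  {"type":"Placeholder","status":"True"},
	  {"type":"Upgradeable","status":"False",
	   "reason":"ResourceDeletesInProgress",
	   "message":"Cluster minor level upgrades are not allowed while resource deletions are in progress; resources=clusterrolebinding \"default-account-openshift-machine-config-operator\"",
	   "lastTransitionTime":"2023-09-28T17:09:41Z"}]}}]}`)

	lines, err := UpgradeableLines(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```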
OCPBUGS-10924: Switch default SA to machine-config-operator
Create the new account, tombstone the old one, and update all references.
All existing tests should pass unchanged, which serves as proof that this change does not impact functionality.
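Tombstoning the old account means shipping a manifest that asks for its deletion, so a release should not also ship a manifest that creates the same object. Below is a minimal sketch of a check for that situation. It assumes that deletion manifests are marked with the cluster-version operator's `release.openshift.io/delete: "true"` annotation. That detail is not stated in this conversation, so treat it as an assumption, and note that the `Manifest` type and `Conflicts` function are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// deleteAnnotation is assumed to be how a manifest is marked as a tombstone.
const deleteAnnotation = "release.openshift.io/delete"

// Manifest is a simplified, hypothetical view of a release manifest.
type Manifest struct {
	Kind        string
	Name        string
	Annotations map[string]string
}

// Conflicts returns the sorted "Kind/Name" keys of resources that appear
// both as a regular manifest and as a deletion manifest.
func Conflicts(manifests []Manifest) []string {
	type seen struct{ apply, del bool }
	byKey := map[string]*seen{}
	for _, m := range manifests {
		key := m.Kind + "/" + m.Name
		s, ok := byKey[key]
		if !ok {
			s = &seen{}
			byKey[key] = s
		}
		// Reading from a nil Annotations map is safe and yields "".
		if m.Annotations[deleteAnnotation] == "true" {
			s.del = true
		} else {
			s.apply = true
		}
	}
	var out []string
	for key, s := range byKey {
		if s.apply && s.del {
			out = append(out, key)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	name := "default-account-openshift-machine-config-operator"
	manifests := []Manifest{
		{Kind: "ClusterRoleBinding", Name: name},
		{Kind: "ClusterRoleBinding", Name: name,
			Annotations: map[string]string{deleteAnnotation: "true"}},
		{Kind: "ClusterRoleBinding", Name: "custom-account-openshift-machine-config-operator"},
	}
	for _, c := range Conflicts(manifests) {
		fmt.Println("applied and tombstoned in the same release:", c)
	}
}
```

A check like this would flag a release in which one manifest creates a ClusterRoleBinding while another one deletes it.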