Pause Single Cluster Upgrade work until stable.#4257
markmandel merged 3 commits into agones-dev:main
igooch
left a comment
It would make more sense to add a timeout and default to passing, with something like the step below, so that we retain the logs and development can continue.
# Run the upgrade tests in parallel; pass this step even if any of the tests fail
- name: gcr.io/google.com/cloudsdktool/cloud-sdk
  id: submit-upgrade-test-cloud-build
  entrypoint: bash
  args:
    - -c
    - "./build/e2e_upgrade_test.sh ${_BASE_VERSION} ${PROJECT_ID} || true"
  waitFor:
    - wait-to-become-leader
    - push-upgrade-test
  timeout: 3600s # 1h
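As a rough illustration of why the `|| true` suffix lets the step pass while still keeping the logs, here is a minimal bash sketch. The function `failing_upgrade_test` is a hypothetical stand-in for `./build/e2e_upgrade_test.sh`, not the real script, and `run_unguarded` / `run_guarded` are names made up for this example.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for ./build/e2e_upgrade_test.sh: prints logs, then fails.
failing_upgrade_test() {
  echo "upgrade test logs"
  return 1
}

# Unguarded: the caller sees the test's own non-zero exit status.
run_unguarded() {
  failing_upgrade_test
}

# Guarded: '|| true' replaces any non-zero status with the status of 'true' (0),
# while the output of the test is still written as usual.
run_guarded() {
  failing_upgrade_test || true
}

if run_unguarded; then echo "unguarded: step passes"; else echo "unguarded: step fails"; fi
if run_guarded; then echo "guarded: step passes"; else echo "guarded: step fails"; fi
```

The trade-off is that a guarded step can no longer signal a real regression through its exit status, so the logs have to be read to learn whether the upgrade tests actually passed.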
Build Succeeded 🥳 Build Id: 5c82eddd-6534-4d3a-8a9a-206feedebecf The following development artifacts have been built, and will exist for the next 30 days.
I don't hate this idea. Looking at the logs, it usually takes 20m to pass, so are we happy with a 30m timeout?
Yep, 30 min should work.
Note to self, this would actually need to be: Otherwise the timeout command will fail the build.
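The snippet this note refers to is not preserved here. As a general illustration only, and assuming the "timeout command" means the GNU coreutils `timeout` utility, the sketch below shows that `timeout` itself exits non-zero (124 in GNU coreutils) when the limit is hit, so it fails the calling step unless that status is masked. `sleep 3` is a hypothetical stand-in for a test run that overruns its limit, and the function names are made up for this example.

```shell
#!/usr/bin/env bash

# 'sleep 3' stands in for a test run that overruns a 1-second limit.
# With GNU coreutils, 'timeout' exits with status 124 when the limit is reached.
run_with_timeout() {
  timeout 1 sleep 3
}

# Same command, but the non-zero status from 'timeout' is masked,
# so the caller sees 0 and the step would not fail.
run_with_timeout_guarded() {
  timeout 1 sleep 3 || true
}

status=0
run_with_timeout || status=$?
echo "timeout alone exits with: ${status}"

status=0
run_with_timeout_guarded || status=$?
echo "timeout with '|| true' exits with: ${status}"
```

Whether this matches the change the note had in mind cannot be confirmed from the conversation as captured.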
For ~6 months the upgrade CI has been flaky/broken, making it unreliable and slowing community contribution and overall project momentum.
This change:
- removes the build/push + submission steps for upgrade tests from cloudbuild.yaml to reduce noise and unblock CI reliability;
- updates the upgrading guide to clearly state that in-place upgrades are on hiatus due to lack of reliable testing and were removed from CI, and recommends thorough testing (multi-cluster remains the recommended production strategy).
We should re-enable upgrade tests once someone can be dedicated to the workstream again and the tests can run reliably and provide signal.
With CI this unstable, we can't guarantee that this functionality actually works at this stage anyway, so I don't think there's any reason to keep it running in CI.
3b746ed to e9a3a6a
Build Failed 😭 Build Id: 95f2f818-7cd3-4f82-8fc6-53db02752357 Status: FAILURE. To get permission to view the Cloud Build view, join the agones-discuss Google Group.
e9a3a6a to fffda34
Flakiness in counter scripts on autopilot cluster - pod went unhealthy?:
Build Succeeded 🥳 Build Id: 038006cd-9162-41cb-b057-01dbc9ed6022 The following development artifacts have been built, and will exist for the next 30 days.
Should be good to go 🤞🏻
Build Succeeded 🥳 Build Id: d37de1a5-4d90-42a7-b02c-3ff8f632ade5 The following development artifacts have been built, and will exist for the next 30 days.
What type of PR is this?
/kind cleanup
What this PR does / Why we need it:
For ~6 months the upgrade CI has been flaky/broken, making it unreliable and slowing community contribution and overall project momentum.
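This change:
- removes the build/push + submission steps for upgrade tests from cloudbuild.yaml to reduce noise and unblock CI reliability;
- updates the upgrading guide to clearly state that in-place upgrades are on hiatus due to lack of reliable testing and were removed from CI, and recommends thorough testing (multi-cluster remains the recommended production strategy).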
Which issue(s) this PR fixes:
N/A
Special notes for your reviewer:
We should re-enable upgrade tests once someone can be dedicated to the workstream again and the tests can run reliably and provide signal.
With CI this unstable, we can't guarantee that this functionality actually works at this stage anyway, so I don't think there's any reason to keep it running in CI.