Migrate all but one of the other jobs to use cncf-hosted gha runners #17943
rohit-nayak-ps merged 1 commit into vitessio:main from
Review Checklist
Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.
General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
d2a6b68 to ca26096
9f48de2 to c8be9f3
7e67ab9 to d3fba07
@jeefy, the errors on the cluster workflows you see here are also happening on our other PRs, for example https://github.com/vitessio/vitess/actions/runs/13862724676/job/38795933540?pr=17967. I tried switching one to the … We are currently blocked on this and on upcoming urgent PRs due today. I was thinking of temporarily switching back to the …
If possible, I'd like another 12 hours to debug things this way. If I haven't resolved the issue by tonight, I'll submit a PR to revert to the GitHub-hosted runners and resume work next week. Thanks for your patience during this.
I missed this. Go ahead and revert; I should be able to test this with a different PR.
7faaac5 to 987057f
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files (main → #17943):
Coverage: 67.56% → 67.55% (-0.02%)
Files: 1597 → 1597
Lines: 259763 → 259845 (+82)
Hits: 175506 → 175526 (+20)
Misses: 84257 → 84319 (+62)
The jobs from the prior PR should now be stable and no longer blocking anything. This PR seems mostly there, although the race-centric tests fail at very inconsistent points. If these tests have stricter performance requirements, I'll bump up their resources.
@rohit-nayak-ps Is there any issue with these race-specific tests running in a container? I cannot fathom why they're failing, and I'm trying not to dive too deep into this.
0b2f0a8 to 345ecc3
Signed-off-by: Jeffrey Sica <me@jeefy.dev>
Removed the changes for the unit test e2e race tests. Everything else looks solid, so this is good to go. I'll work on migrating that final test at a later date. Thanks!
Ping! :)
frouioui left a comment
The code looks good to me. I am restarting some workflows to ensure there is no flakiness, or at least no more than before: https://github.com/vitessio/vitess/actions/runs/13884728877?pr=17943.
  timeout-minutes: 60
  name: Run Semi Sync Upgrade Downgrade Test
- runs-on: gh-hosted-runners-16cores-1-24.04
+ runs-on: oracle-16cpu-64gb-x86-64
Is there an alias that also guarantees a specific distro / version? We've been bitten many times before by floating names when they were upgraded, and we had to immediately rush and drop everything else to fix CI.
We had explicit distro versions before so we could upgrade at a moment of our choosing, not when these runner pools are updated. It also allows us to upgrade in a separate PR without being blocked anywhere else.
We didn't set up a specific alias for the distro, but our intent is for this to ONLY mirror ubuntu-latest unless/until we hit a specific instance of needing an older Ubuntu distro.
The runner image used is defined here: https://github.com/cncf/automation/tree/main/ci/gha-runner-image
> We didn't set up a specific alias for distro, but our intent is for this to ONLY mirror ubuntu-latest unless/until we hit a specific instance of needing an older ubuntu distro.
Right, but that's exactly what caused us pain every time it was updated in the past. The biggest problem is not even that we don't want to upgrade; it's that this blocks us from proactively upgrading to avoid the problems in the first place. So we have no way to deal with it and are forced to let things break and then rush to fix them 😢.
So it's not that we need an older distro, it's that we need the explicit version so that we can control when to upgrade (which is often before -latest would change).
@jeefy I don't want to hard block on this btw, but ideally we'll have some solution for it at some point. If that only lands when Ubuntu 26.04 is out next year, that's also fine, but ideally there's something before then 😄.
I'll update our container builds/tags to ensure the runners can be pinned to specific OS versions. That'll come in a follow-up PR, probably post-KubeCon.
Spitballing, it'll probably be oracle-16cpu-64gb-x86-64-24.04 vs. oracle-16cpu-64gb-x86-64.
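To make the proposal concrete, a workflow fragment pinned to such a label might look like the following sketch. The versioned label oracle-16cpu-64gb-x86-64-24.04 is only the spit-balled name from the comment above and does not exist yet, and the job id semi_sync_upgrade_downgrade is hypothetical; the job name, timeout, and floating label are taken from this PR's diff.

```yaml
# Illustrative fragment only: the job id and the versioned runner label are hypothetical.
jobs:
  semi_sync_upgrade_downgrade:
    name: Run Semi Sync Upgrade Downgrade Test
    timeout-minutes: 60
    # Floating label used in this PR: follows whatever image the CNCF runner pool
    # currently ships (intended to mirror ubuntu-latest).
    # runs-on: oracle-16cpu-64gb-x86-64
    # Proposed pinned label (not available yet): would stay on Ubuntu 24.04 until
    # the workflow is deliberately changed.
    runs-on: oracle-16cpu-64gb-x86-64-24.04
```

With a pinned label, moving to a newer Ubuntu release would be an explicit one-line runs-on change that can be made in its own PR, which is the control the review thread asks for.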
Sorry, I haven't had time to look at that. I don't think there should be an issue, but some of our e2e and unit tests have been flaky due to the performance of the underlying hardware. Once this PR is merged, maybe you can create a PR porting the remaining tests, and we can look into the root cause of the race-test failures.
Description
Finishing the work from #17879
CNCF has hosted ephemeral GitHub runners in Oracle that we want projects to use instead of the GitHub-hosted ones, which now incur a cost to use.
This PR is currently a WIP to work through any tests that break or dependencies that may be missing. <3
There is one final test that was not migrated: the Unit Test Race. It will be migrated at a later date so that it doesn't block the rest of these.
Please direct any questions to myself, @krook and @RobertKielty
Related Issue(s)
Checklist
Deployment Notes