ci: retry Docker Buildx setup on transient failure #21735
Merged
lystopad
approved these changes
Jun 10, 2026
lystopad
left a comment
Member
Interesting, let's see how it will help.
Contributor
Pull request overview
This PR hardens the Kurtosis Assertoor CI workflow against transient Docker Hub connectivity issues by adding a one-time retry around the docker/setup-buildx-action step (which pulls moby/buildkit during bootstrapping).
Changes:
- Make the first “Set up Docker Buildx” attempt `continue-on-error` and record its outcome via a step `id`.
- Add a conditional second Buildx setup attempt that runs only if the first attempt failed.
- Preserve failing behavior when both attempts fail (the retry step is not `continue-on-error`).
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request on Jun 11, 2026
…rigontech#21741)

## Why

`docker/setup-buildx-action` boots BuildKit by pulling `moby/buildkit` from Docker Hub on every run, making it the last uncached Docker Hub dependency in the kurtosis jobs. In merge-queue run [27280175556](https://github.com/erigontech/erigon/actions/runs/27280175556/job/80572198019) that pull timed out (`Get "https://registry-1.docker.io/v2/": context deadline exceeded`), failing the CI Gate for erigontech#21723, ironically the PR closing the equivalent gap for the kurtosis engine-bootstrap images. The same Docker Hub connectivity window took out three other CI Gate runs that morning at the `kurtosis engine start` step.

erigontech#21735 added a retry around buildx setup, but a retry doesn't survive an outage longer than its window. This removes the hard dependency the same way as the rest of the image-caching series (erigontech#21695, erigontech#21703, erigontech#21723).

## What

- Pin the BuildKit image as `BUILDKIT_IMAGE: moby/buildkit:v0.30.0` (what the moving `buildx-stable-1` tag currently resolves to) and pass it to `setup-buildx-action` via `driver-opts: image=...`, in both `test-kurtosis-assertoor.yml` (`build-erigon-image`) and `test-kurtosis-gloas.yml` (`gloas_test`). Bump alongside the other pinned images.
- Cache it under a single shared key (`docker-buildkit-<image>`) with the established docker save/load + actions/cache pattern, and `docker load` it before buildx setup. buildx's docker-container driver falls back to a locally present image when its pull fails ("pulling failed, using local image"), so with a warm cache the builder boots even while Docker Hub is fully unreachable. Pinning via driver-opts is what makes the fallback engage: the local image name must match what buildx wants to boot.
- The cache-fill pull in the test jobs is best-effort (`continue-on-error`, save gated on pull success): buildx pulls the image itself either way, so a failed seed must not fail an otherwise-good run, and a failed pull never poisons the cache key with an empty archive.
- Warm jobs (`warm-third-party-images` in both files) pull strictly and save the same key, since producing the cache is their purpose.

Both cache-warming workflows already path-filter on the edited files, so the cache is created on main right after this merges and refreshed daily against LRU eviction.

In gloas, buildx setup previously ran before any caching; the buildkit cache steps are inserted ahead of it.

Not covered (non-gating, can follow up if wanted): `ci-cd-main-branch-docker-images.yml` and `release.yml` also use `setup-buildx-action` but don't block PRs or the merge queue.

actionlint is clean (the two SC2086 infos it reports pre-exist on main in the kurtosis CLI install step).
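The pin-and-cache approach described in this commit message could look roughly like the fragment below. It is a sketch under assumptions, not the workflow from the commit: the step names, step ids, the `/tmp/buildkit-image.tar` path, the action versions (`actions/cache@v4`, `docker/setup-buildx-action@v3`), and the use of a single combined cache step are all illustrative. Only `BUILDKIT_IMAGE`, the `docker-buildkit-<image>` key shape, and `driver-opts: image=...` come from the text above.

```yaml
# Fragment of a job definition; ids, paths, and action versions are assumptions.
env:
  BUILDKIT_IMAGE: moby/buildkit:v0.30.0

steps:
  # Restores the archive on a key hit; on a miss, the post step saves
  # the archive if it exists by the end of the job.
  - name: Restore BuildKit image cache
    id: buildkit-cache
    uses: actions/cache@v4
    with:
      path: /tmp/buildkit-image.tar
      key: docker-buildkit-${{ env.BUILDKIT_IMAGE }}

  # Warm cache: make the pinned image locally present before buildx boots.
  - name: Load cached BuildKit image
    if: steps.buildkit-cache.outputs.cache-hit == 'true'
    run: docker load -i /tmp/buildkit-image.tar

  # Cold cache: best-effort seed. A failed pull must not fail the job.
  - name: Pull BuildKit image (best-effort)
    id: buildkit-pull
    if: steps.buildkit-cache.outputs.cache-hit != 'true'
    continue-on-error: true
    run: docker pull "$BUILDKIT_IMAGE"

  # Write the archive only after a successful pull, so a failed pull
  # never leaves an empty archive to be cached.
  - name: Save BuildKit image for caching
    if: steps.buildkit-pull.outcome == 'success'
    run: docker save -o /tmp/buildkit-image.tar "$BUILDKIT_IMAGE"

  # The pinned name must match the locally loaded image for the
  # local-image fallback described above to apply.
  - name: Set up Docker Buildx
    uses: docker/setup-buildx-action@v3
    with:
      driver-opts: image=${{ env.BUILDKIT_IMAGE }}
```

In this sketch a warm job would differ only in dropping `continue-on-error` from the pull step, so that a failed pull fails the job instead of being tolerated.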
In merge-queue run 27280175556, `kurtosis / build-erigon-image` failed in the "Set up Docker Buildx" step: booting BuildKit pulls `moby/buildkit:buildx-stable-1` from Docker Hub, and the pull timed out (`Get "https://registry-1.docker.io/v2/": context deadline exceeded`), failing the CI Gate for #21723.

The job already retries the erigon image build on transient failures, but the buildx setup step runs before that retry and wasn't covered. This applies the same pattern: the first setup attempt is `continue-on-error`, and a second attempt runs only if the first failed. Both attempts failing still fails the job.
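As a minimal sketch of the retry pattern described above (not the exact diff from this PR), the two steps might look like this. The step id `buildx`, the retry step's name, and the `@v3` action version are assumptions made for illustration.

```yaml
# Fragment of a job definition; id, names, and action version are assumptions.
steps:
  # First attempt: a failure here is tolerated so the job keeps going.
  - name: Set up Docker Buildx
    id: buildx
    continue-on-error: true
    uses: docker/setup-buildx-action@v3

  # Second attempt: runs only if the first one actually failed.
  # No continue-on-error here, so a second failure fails the job.
  - name: Set up Docker Buildx (retry)
    if: steps.buildx.outcome == 'failure'
    uses: docker/setup-buildx-action@v3
```

The condition checks `outcome` and not `conclusion` because, for a step with `continue-on-error: true`, `conclusion` is reported as `success` even when the step failed, while `outcome` keeps the real result.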