
ci: cache buildkit image so buildx setup survives Docker Hub outages #21741

Merged
taratorio merged 1 commit into main from taratorio/cache-buildkit-image on Jun 11, 2026

@taratorio

Member

Why

docker/setup-buildx-action boots BuildKit by pulling moby/buildkit from Docker Hub on every run — the last uncached Docker Hub dependency in the kurtosis jobs. In merge-queue run 27280175556 that pull timed out (Get "https://registry-1.docker.io/v2/": context deadline exceeded), failing the CI Gate for #21723 — ironically the PR closing the equivalent gap for the kurtosis engine-bootstrap images. The same Docker Hub connectivity window took out three other CI Gate runs that morning at the kurtosis engine start step.

#21735 added a retry around buildx setup, but a retry cannot survive an outage that lasts longer than its retry window. This PR removes the hard dependency on Docker Hub the same way as the rest of the image-caching series (#21695, #21703, #21723).

What

  • Pin the BuildKit image as BUILDKIT_IMAGE: moby/buildkit:v0.30.0 (what the moving buildx-stable-1 tag currently resolves to) and pass it to setup-buildx-action via driver-opts: image=..., in both test-kurtosis-assertoor.yml (build-erigon-image job) and test-kurtosis-gloas.yml (gloas_test job). Bump it alongside the other pinned images.
  • Cache it under a single shared key (docker-buildkit-<image>) with the established docker save/load + actions/cache pattern, and docker load it before buildx setup. buildx's docker-container driver falls back to a locally present image when its pull fails ("pulling failed, using local image"), so with a warm cache the builder boots even while Docker Hub is fully unreachable. Pinning via driver-opts is what makes the fallback engage — the local image name must match what buildx wants to boot.
  • The cache-fill pull in the test jobs is best-effort (continue-on-error, save gated on pull success). buildx pulls the image itself either way, so a failed seed must not fail an otherwise-good run; and because the save only runs after a successful pull, a failed pull never poisons the cache key with an empty archive.
  • Warm jobs (warm-third-party-images in both files) pull strictly and save the same key — producing the cache is their purpose. Both cache-warming workflows already path-filter on the edited files, so the cache is created on main right after this merges and refreshed daily against LRU eviction.
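A minimal sketch of how the test-job steps in the list above could fit together in GitHub Actions YAML. BUILDKIT_IMAGE, the docker-buildkit-<image> key format, the driver-opts pinning, continue-on-error on the seed, and the pull-success gate on the save come from this PR. The job name, step ids, the /tmp/buildkit-image.tar archive path, the split into actions/cache/restore and actions/cache/save, skipping the pull on a cache hit, and the action major versions are assumptions, not the actual contents of the workflows.

```yaml
# Sketch only. Hypothetical: job name, step ids, archive path, action versions.
# From the PR: BUILDKIT_IMAGE, the cache key format, driver-opts pinning.
env:
  BUILDKIT_IMAGE: moby/buildkit:v0.30.0

jobs:
  example-test-job:            # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      # Restore the saved image archive if a warm job has produced one.
      - name: Restore BuildKit image cache
        id: buildkit-cache
        uses: actions/cache/restore@v4
        with:
          path: /tmp/buildkit-image.tar
          key: docker-buildkit-${{ env.BUILDKIT_IMAGE }}

      # Warm cache: load the image into the local daemon so buildx can
      # fall back to it if its own pull fails.
      - name: Load BuildKit image
        if: steps.buildkit-cache.outputs.cache-hit == 'true'
        run: docker load -i /tmp/buildkit-image.tar

      # Cold cache: best-effort seed. A failure here must not fail the job.
      - name: Pull BuildKit image (best-effort)
        id: buildkit-pull
        if: steps.buildkit-cache.outputs.cache-hit != 'true'
        continue-on-error: true
        run: |
          docker pull "$BUILDKIT_IMAGE"
          docker save "$BUILDKIT_IMAGE" -o /tmp/buildkit-image.tar

      # Save only after a successful pull, so a failed or partial archive
      # never lands under the shared key.
      - name: Save BuildKit image cache
        if: steps.buildkit-pull.outcome == 'success'
        uses: actions/cache/save@v4
        with:
          path: /tmp/buildkit-image.tar
          key: docker-buildkit-${{ env.BUILDKIT_IMAGE }}

      # Pinning the image makes the local image name match what buildx boots.
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        with:
          driver-opts: image=${{ env.BUILDKIT_IMAGE }}
```

Gating the save on `steps.<id>.outcome` rather than `conclusion` is deliberate in this sketch: with continue-on-error, a failed step's conclusion is still success, while its outcome records the failure.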

In gloas, buildx setup previously ran before any caching; the buildkit cache steps are inserted ahead of it.
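The warm jobs use the opposite policy from the best-effort seed in the test jobs: their pull is strict, so a registry failure fails the job instead of silently leaving the key unfilled. Below is a hedged sketch of only the BuildKit part of such a job. The warm-third-party-images job name, BUILDKIT_IMAGE, and the key format come from this PR. The archive path, step ids, action versions, and skipping the pull when the key already exists are assumptions.

```yaml
# Sketch only: BuildKit steps of a warm job. Hypothetical: archive path,
# step ids, action versions, and the skip-on-hit behavior.
env:
  BUILDKIT_IMAGE: moby/buildkit:v0.30.0

jobs:
  warm-third-party-images:
    runs-on: ubuntu-latest
    steps:
      - name: Restore BuildKit image cache
        id: buildkit-cache
        uses: actions/cache/restore@v4
        with:
          path: /tmp/buildkit-image.tar
          key: docker-buildkit-${{ env.BUILDKIT_IMAGE }}

      # Strict: no continue-on-error, so a failed pull fails the warm job.
      - name: Pull and archive BuildKit image
        if: steps.buildkit-cache.outputs.cache-hit != 'true'
        run: |
          docker pull "$BUILDKIT_IMAGE"
          docker save "$BUILDKIT_IMAGE" -o /tmp/buildkit-image.tar

      # The implicit success() check skips this step if the pull failed.
      - name: Save BuildKit image cache
        if: steps.buildkit-cache.outputs.cache-hit != 'true'
        uses: actions/cache/save@v4
        with:
          path: /tmp/buildkit-image.tar
          key: docker-buildkit-${{ env.BUILDKIT_IMAGE }}
```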

Not covered (non-gating, can follow up if wanted): ci-cd-main-branch-docker-images.yml and release.yml also use setup-buildx-action but don't block PRs or the merge queue.

actionlint is clean (the two SC2086 info-level findings it reports already exist on main, in the kurtosis CLI install step).

Copilot AI left a comment

Contributor


Pull request overview

This PR reduces CI flakiness caused by Docker Hub outages during docker/setup-buildx-action by pinning the BuildKit image and caching it so Buildx can bootstrap from a locally loaded copy when registry pulls fail.

Changes:

  • Pin BuildKit to moby/buildkit:v0.30.0 and pass it to docker/setup-buildx-action via driver-opts: image=... (assertoor + gloas).
  • Add actions/cache-backed docker save/load caching for the BuildKit image under a shared docker-buildkit-<image> key, with best-effort cache seeding in test jobs.
  • Extend the existing default-branch cache-warming workflows/documentation to include the new BuildKit cache.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

  • .github/workflows/test-kurtosis-gloas.yml: Adds pinned BuildKit image, restores/loads it before Buildx setup, and warms/saves a shared BuildKit cache key.
  • .github/workflows/test-kurtosis-assertoor.yml: Adds pinned BuildKit image, restores/loads it before Buildx setup (including retry path), and warms/saves a shared BuildKit cache key.
  • .github/workflows/cache-warming-kurtosis-gloas-images.yml: Updates workflow documentation to reflect warming both third-party and BuildKit caches.
  • .github/workflows/cache-warming-kurtosis-cl-images.yml: Updates workflow documentation to reflect warming both third-party and BuildKit caches.

@taratorio taratorio added this pull request to the merge queue Jun 11, 2026
Merged via the queue into main with commit e56ef04 Jun 11, 2026
93 checks passed
@taratorio taratorio deleted the taratorio/cache-buildkit-image branch June 11, 2026 11:19