ci: split docker-publish per-arch runners + cache-friendly dockerfile layers #22080
Merged
Build amd64 and arm64 natively on their own GitHub runners in parallel, then stitch the per-arch digests into a tagged multi-arch manifest. Replaces the previous single-runner pattern, which rebuilt arm64 from scratch on every run because QEMU emulation + unscoped GHA cache meant no layer reuse across invocations.

Jobs:
- build-amd64: ubuntu-latest, native, runs smoke tests, pushes by digest
- build-arm64: ubuntu-24.04-arm, native (no QEMU), pushes by digest
- merge: stitches both digests into :sha-<sha> (main) or :<release>
- move-latest: unchanged ancestor-check logic, now needs: merge

Preserved:
- per-commit sha-<sha> tags on main (immutable, race-free)
- org.opencontainers.image.revision label on each per-arch image
- dashboard subcommand smoke test (#9153 guard)
- race-safe :latest advancement via move-latest
- top-level cancel-in-progress: false

Changed behavior:
- move-latest flipped to cancel-in-progress: false for defense-in-depth. Top-level concurrency already serializes runs for the ref, so the old cancel=true on move-latest was dead code. Flipping to false prevents any starvation mode if top-level is ever loosened.

Cache scopes separated per-arch (scope=docker-amd64 / scope=docker-arm64) so the two runners don't clobber each other in the gha cache backend.
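As a rough illustration of the job layout this commit describes, the workflow might look something like the sketch below. It is not the PR's actual file: the image name, secret names, artifact names, action version pins, and the use of the full commit SHA in the tag are all assumptions, and the smoke tests, release tagging, and `move-latest` job are left out for brevity.

```yaml
# Sketch only. IMAGE, secret names, artifact names, and action versions are assumptions.
name: docker-publish (sketch)

on:
  push:
    branches: [main]

env:
  IMAGE: example/hermes-agent   # hypothetical repository name

jobs:
  build-amd64:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}   # assumed secret name
          password: ${{ secrets.DOCKERHUB_TOKEN }}      # assumed secret name
      - id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64
          # Push by digest only; tags are applied later by the merge job.
          outputs: type=image,name=${{ env.IMAGE }},push-by-digest=true,name-canonical=true,push=true
          # Per-arch cache scope so the two runners do not overwrite each other.
          cache-from: type=gha,scope=docker-amd64
          cache-to: type=gha,mode=max,scope=docker-amd64
      - name: Record digest
        run: |
          mkdir -p /tmp/digests
          echo "${{ steps.build.outputs.digest }}" > /tmp/digests/amd64
      - uses: actions/upload-artifact@v4
        with:
          name: digest-amd64
          path: /tmp/digests/amd64

  build-arm64:
    runs-on: ubuntu-24.04-arm   # native arm64 runner, no QEMU
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/arm64
          outputs: type=image,name=${{ env.IMAGE }},push-by-digest=true,name-canonical=true,push=true
          cache-from: type=gha,scope=docker-arm64
          cache-to: type=gha,mode=max,scope=docker-arm64
      - name: Record digest
        run: |
          mkdir -p /tmp/digests
          echo "${{ steps.build.outputs.digest }}" > /tmp/digests/arm64
      - uses: actions/upload-artifact@v4
        with:
          name: digest-arm64
          path: /tmp/digests/arm64

  merge:
    needs: [build-amd64, build-arm64]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          pattern: digest-*
          path: /tmp/digests
          merge-multiple: true
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Stitch per-arch digests into one tagged manifest
        run: |
          docker buildx imagetools create \
            -t "$IMAGE:sha-${GITHUB_SHA}" \
            "$IMAGE@$(cat /tmp/digests/amd64)" \
            "$IMAGE@$(cat /tmp/digests/arm64)"
```

Pushing by digest keeps the per-arch images untagged until both builds have succeeded, so a failed arm64 build cannot leave a half-populated tag behind.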
Before this change, `uv pip install -e ".[all]"` ran AFTER `COPY . .`, so every commit that changed any .py file busted the layer cache and re-did the entire Python dep resolve + wheel download + native extension compile (~4-5 min on cold Docker Hub cache).

Split it into two steps:

1. Before `COPY . .`: copy only pyproject.toml + uv.lock + README.md, then `uv sync --frozen --no-install-project --all-extras`. This layer is cached unless any of those three files change, so .py-only commits skip the heavy work entirely.
2. After `COPY . .` (and its downstream chmod/chown step): run `uv pip install --no-cache-dir --no-deps -e .` to create the editable link. With --no-deps this is a ~1s op: no resolution, no downloads, no compilation.

Combined with the per-arch runner split in the previous commit, this should drop cache-hit build times to the sub-5-min range.
Runs `uv lock --check` on every PR and on push to main that touches pyproject.toml, uv.lock, or this workflow itself. Exits non-zero if the lockfile is out of sync with pyproject.toml, blocking the PR before it can break the Docker build on main.

Rationale: the new Dockerfile layout uses `uv sync --frozen --extra all`, which rejects stale lockfiles. Without this guard, a PR that changes pyproject.toml dependencies but forgets to regenerate uv.lock would merge fine and then break docker-publish on main (visible only after ~15 min of build time, producing no image).

On failure, the step adds a GitHub annotation and a workflow summary block with the exact commands to run locally (`uv lock`, `git add uv.lock`, `git commit`).

Verified locally that:
- Clean tree: `uv lock --check` succeeds (resolves in ~2ms, no work).
- Stale lockfile (added cowsay to pyproject.toml, not in lock): exits 1 with message 'The lockfile at `uv.lock` needs to be updated'.
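A lockfile guard along the lines described in this commit could be sketched as follows. This is an illustration under assumptions, not the PR's file: the `astral-sh/setup-uv` version pin, the job name, the wording of the annotation and summary, and the example commit message are all made up for the sketch.

```yaml
# Sketch only. Job name, action version pin, and message wording are assumptions.
name: uv-lockfile-check (sketch)

on:
  pull_request:
    paths:
      - pyproject.toml
      - uv.lock
      - .github/workflows/uv-lockfile-check.yml
  push:
    branches: [main]
    paths:
      - pyproject.toml
      - uv.lock
      - .github/workflows/uv-lockfile-check.yml

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - name: Verify uv.lock matches pyproject.toml
        run: |
          # `uv lock --check` exits non-zero when the lockfile is stale.
          if ! uv lock --check; then
            # Annotation shown on the PR.
            echo "::error file=uv.lock::uv.lock is out of sync with pyproject.toml"
            # Step summary with the commands to run locally.
            {
              echo "### uv.lock is out of date"
              echo ""
              echo "Run these locally, then push:"
              echo ""
              echo "    uv lock"
              echo "    git add uv.lock"
              echo "    git commit -m 'chore: refresh uv.lock'"
            } >> "$GITHUB_STEP_SUMMARY"
            exit 1
          fi
```

Listing the workflow file itself under `paths` means that edits to the check are exercised by the check.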
Adds `pull_request` trigger to docker-publish.yml so PRs that touch Dockerfile / docker/ / pyproject.toml / uv.lock / the workflow itself verify the image builds cleanly before merge. Previously, Dockerfile regressions (e.g. a stale uv.lock, a typo'd dep) would only surface after merge when the docker-publish workflow ran on main.

Build-verify-only on PRs: the per-arch jobs run their `load: true` build + smoke test, but the push-by-digest + artifact upload steps remain gated on push-to-main or release. The `merge` and `move-latest` jobs stay excluded from PRs by their existing `if:` gates, so :latest and SHA tags are never touched from PR runs.

Concurrency: PR runs use a PR-scoped group (`docker-<pr_number>`) with `cancel-in-progress: true` so rapid pushes to the same PR collapse to the latest commit. Push/release runs keep `cancel-in-progress: false`; every merge still gets its own SHA-tagged image.

Also adds arm64 smoke tests (previously amd64-only): the image is now built with `load: true` on arm64 too, then `docker run --help` + `dashboard --help` smoke tests run identically on both arches. Both smoke test blocks were extracted into a new composite action at `.github/actions/hermes-smoke-test` to keep the two jobs DRY.

New files:
- .github/actions/hermes-smoke-test/action.yml

Modified:
- .github/workflows/docker-publish.yml
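The trigger, concurrency, and gating rules in this commit could be expressed roughly as in the sketch below. Several details are assumptions rather than the PR's actual configuration: the non-PR concurrency group name, the release event type, the local image tag, and the inlined smoke-test commands (the real workflow delegates these to the composite action, whose inputs are not shown in the source, and the commands assume the image entrypoint is the CLI).

```yaml
# Sketch only. Group naming for non-PR runs, release types, the local tag,
# and the inlined smoke-test commands are assumptions.
name: docker-publish (PR verification sketch)

on:
  pull_request:
    paths:
      - Dockerfile
      - "docker/**"
      - pyproject.toml
      - uv.lock
      - .github/workflows/docker-publish.yml
  push:
    branches: [main]
  release:
    types: [published]

concurrency:
  # PR runs share a PR-scoped group; other runs fall back to the ref.
  group: docker-${{ github.event.pull_request.number || github.ref }}
  # Only PR runs cancel older in-flight runs; push/release runs all complete.
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}

jobs:
  build-arm64:
    runs-on: ubuntu-24.04-arm
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - name: Build and load into the local Docker daemon
        uses: docker/build-push-action@v6
        with:
          context: .
          load: true
          tags: hermes-smoke:ci   # hypothetical local tag
      - name: Smoke test
        run: |
          docker run --rm hermes-smoke:ci --help
          docker run --rm hermes-smoke:ci dashboard --help
      - name: Push by digest
        # Skipped on PRs, so PR runs never publish anything.
        if: github.event_name == 'release' || (github.event_name == 'push' && github.ref == 'refs/heads/main')
        run: echo "push-by-digest and artifact upload steps would run here"
```

Putting the event check on `cancel-in-progress` rather than splitting the workflow keeps a single definition of the build while still letting PR runs supersede each other.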
Cuts Docker Hub publish time from ~40 min to ~3 min on warm cache (and ~13 min on cold cache) by splitting the per-arch builds onto native runners and restructuring the Python dep install into a cache-friendly layer.
Before: one `ubuntu-latest` job built both arches via QEMU emulation. Every main push took 38-45 min, with arm64 eating ~80% of the wall clock because it ran under emulation and shared a gha cache scope with amd64, so the two arches clobbered each other's layer cache between runs.

After: `build-amd64` on `ubuntu-latest` and `build-arm64` on `ubuntu-24.04-arm` (GitHub's free native arm64 runner, no QEMU) run in parallel, then `merge` stitches the per-arch digests into a single multi-arch manifest using `docker buildx imagetools create`. Cache scopes are separated per-arch (`scope=docker-amd64` / `scope=docker-arm64`), and the Dockerfile's Python dep install was hoisted above `COPY . .` so source-only commits skip the ~4-5 min dep resolve entirely.

All existing safety behavior is preserved: per-commit `sha-<sha>` tags, the `org.opencontainers.image.revision` OCI label, the dashboard subcommand smoke test (#9153 regression guard), and the race-safe `:latest` advancement via the `move-latest` job.

Related Issue
Fixes #
Type of Change
Changes Made
- `.github/workflows/docker-publish.yml`: replaced the single `build-and-push` job with four: `build-amd64` (native, runs smoke tests + dashboard `--help` regression guard, pushes by digest), `build-arm64` (native on `ubuntu-24.04-arm`, pushes by digest), `merge` (stitches digests into `:sha-<sha>` on main or `:<release_tag>` on release), and `move-latest` (unchanged ancestor-check logic, now gated on `needs: merge`). Cache scoped per-arch. Top-level `cancel-in-progress: false` preserved.
- `.github/workflows/docker-publish.yml`: flipped `move-latest`'s own concurrency to `cancel-in-progress: false` for defense-in-depth. The top-level concurrency group already serializes runs for the ref, so the old `cancel=true` on move-latest was dead code; if top-level is ever loosened, queued move-latests will now run serially in arrival order instead of cancelling each other. Updated the comment block to describe the real serialization source honestly.
- `Dockerfile`: split the Python dep install into a cached layer above `COPY . .`. Before: `uv pip install -e ".[all]"` ran after `COPY . .`, so every .py change re-resolved ~258 packages. After: `uv sync --frozen --no-install-project --extra all` runs on just `pyproject.toml` + `uv.lock`, then `uv pip install --no-cache-dir --no-deps -e "."` creates the editable link in ~1s after the source copy. Uses `--extra all` (the composite extra intended for production) rather than `--all-extras`, which would pull in `[rl]`, `[yc-bench]`, and `[termux-all]` (git-cloned RL libs, benchmarks, and Android redundancy that don't belong in the published image).
- `.github/workflows/uv-lockfile-check.yml`: new blocking CI check that runs `uv lock --check` on PRs touching `pyproject.toml` / `uv.lock`. Since the Docker build now uses `uv sync --frozen`, a stale lockfile would fail the docker-publish workflow on main ~15 min into the build with no published image. This check catches that in ~10s at PR time, with a step summary telling the dev exactly which commands to run locally to fix it.
- `uv.lock`: refreshed to match `pyproject.toml` (separate commit, pre-existing drift picked up by the new check).

How to Test
Verified via five manual `workflow_dispatch` runs on this branch (a temporary dispatch trigger + `dryrun-<sha>` tag scheme was used during development; both were dropped from the final history). All five runs succeeded end-to-end, produced a valid multi-arch manifest, and correctly skipped `move-latest`. `workflow_dispatch` can't touch `:latest` because the job is triple-gated: `event_name == 'push'`, `ref == 'refs/heads/main'`, and the `pushed_sha_tag` output, which only gets set on push-to-main.

[Run table not recovered; surviving row labels: `--all-extras`; `--extra all` fix, cold]

Run 3 surfaced the `--all-extras` bloat bug, which was caught in dry-run before merge. Run 5 is the target steady state: on a source-only commit (no pyproject.toml change, cache populated), the whole pipeline finishes in ~3 minutes.

Post-merge verification steps:
- … `push` to main that triggers this workflow. Confirm total wall clock is in the 12-18 min range on cold cache (new cache scopes will be empty at first).
- … `:latest` points at the merge commit: …
- … `uv-lockfile-check` job by opening a throwaway PR that adds a dep to `pyproject.toml` without regenerating `uv.lock`. The check should fail with a clear step summary.

Checklist
Code
- … `fix(scope):`, `feat(scope):`, etc.)
- … `pytest tests/ -q` and all tests pass: N/A (CI-only change, no Python runtime code touched; the test suite doesn't exercise GitHub Actions workflows)
- … `ubuntu-latest` + `ubuntu-24.04-arm` runners (verified via `workflow_dispatch` on this branch)

Documentation & Housekeeping
- … `docs/`, docstrings): N/A, workflow and Dockerfile comments are thorough
- … `cli-config.yaml.example` if I added/changed config keys: N/A
- … `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows: N/A (no architectural change; CI workflow modification only)

Screenshots / Logs
Dry-run workflow runs on this branch (`workflow_dispatch` trigger + `dryrun-<sha>` tag scheme dropped from final history):

[Run table not recovered; surviving row labels: `--all-extras` (caught in dry-run); `--extra all` fix, cold; `--extra all` fix, warm]

Multi-arch manifest from run 4 (pre-squash dryrun tag, same schema production will produce):

[Manifest output not recovered]

Both `linux/amd64` and `linux/arm64` sub-manifests are present, plus SLSA build attestations for each.

Note: a handful of `dryrun-<sha>` tags exist on Docker Hub from the dry runs. They're immutable digest-addressed images, harmless to leave but safe to delete after merge if desired.