ci: split docker-publish per-arch runners + cache-friendly dockerfile layers by ethernet8023 · Pull Request #22080 · NousResearch/hermes-agent

ethernet8023 · 2026-05-08T21:16:12Z

Cuts Docker Hub publish time from ~40 min to ~3 min on warm cache (and roughly 13-17 min on cold cache) by splitting the per-arch builds onto native runners and restructuring the Python dep install into a cache-friendly layer.

Before: one ubuntu-latest job built both arches via QEMU emulation. Every main push took 38-45 min, with arm64 eating ~80% of the wall clock because it ran under emulation and shared a gha cache scope with amd64, so the two arches clobbered each other's layer cache between runs.

After: two build jobs run in parallel, build-amd64 on ubuntu-latest and build-arm64 on ubuntu-24.04-arm (GitHub's free native arm64 runner, no QEMU), and a merge job then stitches the per-arch digests into a single multi-arch manifest using docker buildx imagetools create. Cache scopes are separated per-arch (scope=docker-amd64 / scope=docker-arm64), and the Dockerfile's Python dep install was hoisted above COPY . . so source-only commits skip the ~4-5 min dep resolve entirely.
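A minimal sketch of what the merge step could do, assuming each build job records its pushed digest as an empty file named after the hex part of the digest (the digest-artifact pattern from Docker's guide on distributing builds across runners). The `build_merge_cmd` helper and the directory layout are hypothetical, not copied from this PR's workflow; the helper prints the `docker buildx imagetools create` command instead of running it, so it can be checked without registry access.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the imagetools command that stitches per-arch
# digests into one multi-arch tag. Assumes digest_dir holds one empty file
# per pushed image, named by the hex part of its sha256 digest.
build_merge_cmd() {
  local image=$1 tag=$2 digest_dir=$3 refs="" f
  for f in "$digest_dir"/*; do
    [ -e "$f" ] || continue   # glob matched nothing: directory is empty
    refs="$refs ${image}@sha256:$(basename "$f")"
  done
  if [ -z "$refs" ]; then
    echo "no digests found in $digest_dir" >&2
    return 1
  fi
  # Print instead of executing so the sketch runs without a registry.
  echo "docker buildx imagetools create -t ${image}:${tag}${refs}"
}
```

Passing several `image@sha256:<digest>` sources to a single `imagetools create -t` call is what yields one multi-arch index; the command only writes a new index that references the per-arch manifests, so nothing is rebuilt at this stage.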

All existing safety behavior is preserved: per-commit sha-<sha> tags, the org.opencontainers.image.revision OCI label, the dashboard subcommand smoke test (#9153 regression guard), and the race-safe :latest advancement via the move-latest job.

Related Issue

Fixes #

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

.github/workflows/docker-publish.yml — replaced the single build-and-push job with four: build-amd64 (native, runs smoke tests + dashboard --help regression guard, pushes by digest), build-arm64 (native on ubuntu-24.04-arm, pushes by digest), merge (stitches digests into :sha-<sha> on main or :<release_tag> on release), and move-latest (unchanged ancestor-check logic, now gated on needs: merge). Cache scoped per-arch. Top-level cancel-in-progress: false preserved.
.github/workflows/docker-publish.yml — flipped move-latest's own concurrency to cancel-in-progress: false for defense-in-depth. The top-level concurrency group already serializes runs for the ref, so the old cancel=true on move-latest was dead code; if top-level is ever loosened, queued move-latests will now run serially in arrival order instead of cancelling each other. Updated the comment block to describe the real serialization source honestly.
Dockerfile — split the Python dep install into a cached layer above COPY . .. Before: uv pip install -e ".[all]" ran after COPY . ., so every .py change re-resolved ~258 packages. After: uv sync --frozen --no-install-project --extra all runs on just pyproject.toml + uv.lock, then uv pip install --no-cache-dir --no-deps -e "." creates the editable link in ~1s after the source copy. Uses --extra all (the composite extra intended for production) rather than --all-extras (would pull in [rl], [yc-bench], [termux-all] — git-cloned RL libs, benchmarks, Android redundancy that don't belong in the published image).
.github/workflows/uv-lockfile-check.yml — new blocking CI check that runs uv lock --check on PRs touching pyproject.toml / uv.lock. Since the Docker build now uses uv sync --frozen, a stale lockfile would fail the docker-publish workflow on main ~15 min into the build with no published image. This check catches that in ~10s at PR time, with a step summary telling the dev exactly which commands to run locally to fix it.
uv.lock — refreshed to match pyproject.toml (separate commit, pre-existing drift picked up by the new check).
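The tag the merge job publishes depends on the triggering event. Below is a hypothetical `publish_tag` helper sketching the rule stated in the list above (`:sha-<sha>` on a push to main, `:<release_tag>` on a release, nothing otherwise); the real workflow expresses this with `if:` conditions and step outputs rather than a shell function, and the argument order here is an assumption.

```shell
# Hypothetical: decide which tag the merge job pushes.
# Args: event name, git ref, commit SHA, optional release tag.
publish_tag() {
  local event=$1 ref=$2 sha=$3 release_tag=${4:-}
  if [ "$event" = push ] && [ "$ref" = refs/heads/main ]; then
    echo "sha-$sha"              # per-commit, immutable tag on main
  elif [ "$event" = release ] && [ -n "$release_tag" ]; then
    echo "$release_tag"          # release builds are tagged by name
  else
    return 1                     # anything else publishes no tag
  fi
}
```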

How to Test

Verified via five manual workflow_dispatch runs on this branch (a temporary dispatch trigger + dryrun-<sha> tag scheme was used during development; both were dropped from the final history). All five runs succeeded end-to-end, produced a valid multi-arch manifest, and correctly skipped move-latest (workflow_dispatch can't touch :latest — triple-gated via event_name == 'push' + ref == 'refs/heads/main' + the pushed_sha_tag output which only gets set on push-to-main).
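The triple gate reads as a single boolean. A hypothetical sketch that takes the event name, the ref, and the value of the pushed_sha_tag output as plain arguments; in the actual workflow these are GitHub Actions expressions on the move-latest job, not shell.

```shell
# Hypothetical: :latest may move only when all three conditions hold.
# An empty pushed_sha_tag means the merge job did not push a main SHA tag.
should_move_latest() {
  local event=$1 ref=$2 pushed_sha_tag=${3:-}
  [ "$event" = push ] && [ "$ref" = refs/heads/main ] && [ -n "$pushed_sha_tag" ]
}
```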

| run | scenario | build-amd64 | build-arm64 | total wall clock |
|---|---|---|---|---|
| baseline (main, today) | single runner + QEMU | | | 38-45 min |
| 1 | per-arch split, cold cache | 12m 36s | 11m 18s | ~13 min |
| 2 | per-arch split, warm cache | 5m 30s | 7m 54s | ~8m 20s |
| 3 | + dockerfile layer, buggy `--all-extras` | 18m 21s | 13m 41s | ~19 min ❌ |
| 4 | + dockerfile layer, `--extra all` fix, cold | 7m 1s | 16m 9s | ~16m 30s |
| 5 | + dockerfile layer, warm cache | 2m 53s | 26s 🚀 | ~3m 17s |

Run 3 surfaced the --all-extras bloat bug — caught in dry-run before merge. Run 5 is the target steady state: on a source-only commit (no pyproject.toml change, cache populated), the whole pipeline finishes in ~3 minutes.

Post-merge verification steps:

Wait for the first real push to main that triggers this workflow. Confirm total wall clock is in the 12-18 min range on cold cache (new cache scopes will be empty at first).
After that lands, the next push with source-only changes should complete in <5 min.

Verify :latest points at the merge commit:

```shell
docker buildx imagetools inspect nousresearch/hermes-agent:latest \
  --format '{{ json (index .Image "linux/amd64") }}' \
  | jq -r '.config.Labels."org.opencontainers.image.revision"'
```

Pull and smoke-test both platforms on hosts that have them:

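```shell
# Pass --platform to `docker run` too, so each run uses the variant just pulled
# rather than whatever the host's default platform resolves to.
docker pull --platform linux/amd64 nousresearch/hermes-agent:latest
docker run --rm --platform linux/amd64 nousresearch/hermes-agent:latest hermes --help
docker pull --platform linux/arm64 nousresearch/hermes-agent:latest
docker run --rm --platform linux/arm64 nousresearch/hermes-agent:latest hermes --help
```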
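The same checks can be looped over both platforms. This is a minimal sketch, assuming the container accepts `hermes --help` and `hermes dashboard --help` as commands (mirroring the smoke tests described in this PR) and that the host can run the non-native platform through emulation. The `smoke_test` function and the `DOCKER` override are hypothetical; the override exists only so the loop can be dry-run with `DOCKER=echo`.

```shell
# Hypothetical smoke-test loop over both published platforms.
smoke_test() {
  local image=$1 platform
  for platform in linux/amd64 linux/arm64; do
    # Pull the specific variant, then run it explicitly on that platform.
    ${DOCKER:-docker} pull --platform "$platform" "$image" || return 1
    ${DOCKER:-docker} run --rm --platform "$platform" "$image" hermes --help || return 1
    # Assumed invocation of the dashboard subcommand guard (#9153).
    ${DOCKER:-docker} run --rm --platform "$platform" "$image" hermes dashboard --help || return 1
  done
}
```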

Test the new uv-lockfile-check job by opening a throwaway PR that adds a dep to pyproject.toml without regenerating uv.lock. The check should fail with a clear step summary.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass — N/A (CI-only change, no Python runtime code touched; the test suite doesn't exercise GitHub Actions workflows)
I've added tests for my changes (required for bug fixes, strongly encouraged for features) — N/A, but verified via 5 live dry-run workflow executions (see timing table above)
I've tested on my platform: GitHub-hosted ubuntu-latest + ubuntu-24.04-arm runners (verified via workflow_dispatch on this branch)

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — N/A, workflow and Dockerfile comments are thorough
I've updated cli-config.yaml.example if I added/changed config keys — N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A (no architectural change; CI workflow modification only)
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — N/A (CI-only change, targets GitHub's Linux runners; the published image already supported amd64 + arm64)
I've updated tool descriptions/schemas if I changed tool behavior — N/A

Screenshots / Logs

Dry-run workflow runs on this branch (workflow_dispatch trigger + dryrun-<sha> tag scheme dropped from final history):

| run | description | link |
|---|---|---|
| 1 | cold, per-arch split only | https://github.com/NousResearch/hermes-agent/actions/runs/25575794699 |
| 2 | warm, per-arch split only | https://github.com/NousResearch/hermes-agent/actions/runs/25576643168 |
| 3 | + dockerfile layer, buggy `--all-extras` (caught in dry-run) | https://github.com/NousResearch/hermes-agent/actions/runs/25577579491 |
| 4 | + dockerfile layer, `--extra all` fix, cold | https://github.com/NousResearch/hermes-agent/actions/runs/25578526011 |
| 5 | + dockerfile layer, `--extra all` fix, warm | https://github.com/NousResearch/hermes-agent/actions/runs/25579260593 |

Multi-arch manifest from run 4 (pre-squash dryrun tag, same schema production will produce):

```console
$ skopeo inspect --raw docker://docker.io/nousresearch/hermes-agent:dryrun-1174fd4ff4a5bd022846f1c7ee6277221a7c2059
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    { "digest": "sha256:883a7ea8...", "platform": { "architecture": "arm64", "os": "linux" } },
    { "digest": "sha256:5dbc97bc...", "platform": { "architecture": "unknown", "os": "unknown" },
      "annotations": { "vnd.docker.reference.type": "attestation-manifest" } },
    { "digest": "sha256:a95c86b7...", "platform": { "architecture": "amd64", "os": "linux" } },
    { "digest": "sha256:e80bfea6...", "platform": { "architecture": "unknown", "os": "unknown" },
      "annotations": { "vnd.docker.reference.type": "attestation-manifest" } }
  ]
}
```

Both linux/amd64 and linux/arm64 sub-manifests are present, plus SLSA build attestations for each.

Note: a handful of dryrun-<sha> tags exist on Docker Hub from the dry runs. Each points at the digest its dry run pushed; they're harmless to leave but safe to delete after merge if desired.

github-actions · 2026-05-08T21:17:17Z

🔎 Lint report: `fix/faster-docker` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 7822 on HEAD, 7822 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4121 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Build amd64 and arm64 natively on their own GitHub runners in parallel, then stitch the per-arch digests into a tagged multi-arch manifest. Replaces the previous single-runner pattern, which rebuilt arm64 from scratch on every run because QEMU emulation + unscoped GHA cache meant no layer reuse across invocations.

Jobs:
- build-amd64: ubuntu-latest, native, runs smoke tests, pushes by digest
- build-arm64: ubuntu-24.04-arm, native (no QEMU), pushes by digest
- merge: stitches both digests into :sha-<sha> (main) or :<release>
- move-latest: unchanged ancestor-check logic, now needs: merge

Preserved:
- per-commit sha-<sha> tags on main (immutable, race-free)
- org.opencontainers.image.revision label on each per-arch image
- dashboard subcommand smoke test (#9153 guard)
- race-safe :latest advancement via move-latest
- top-level cancel-in-progress: false

Changed behavior:
- move-latest flipped to cancel-in-progress: false for defense-in-depth. Top-level concurrency already serializes runs for the ref, so the old cancel=true on move-latest was dead code. Flipping to false prevents any starvation mode if top-level is ever loosened.

Cache scopes separated per-arch (scope=docker-amd64 / scope=docker-arm64) so the two runners don't clobber each other in the gha cache backend.
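Per-arch cache scoping comes down to giving each runner its own `scope` on the `gha` cache backend. Below is a hypothetical helper that prints the matching `docker buildx build` flags; with docker/build-push-action the same strings would go into its `cache-from` / `cache-to` inputs. Whether the PR also sets `mode=max` is not stated, so the sketch leaves the cache mode at its default.

```shell
# Hypothetical: emit buildx cache flags with a per-arch gha scope so the
# amd64 and arm64 runners read and write separate cache entries.
cache_flags() {
  local arch=$1
  case "$arch" in
    amd64|arm64) ;;
    *) echo "unsupported arch: $arch" >&2; return 1 ;;
  esac
  printf '%s\n' "--cache-from type=gha,scope=docker-$arch --cache-to type=gha,scope=docker-$arch"
}
```

For example, `docker buildx build $(cache_flags arm64) --platform linux/arm64 .` (left unquoted on purpose so the flags split into words). The `gha` backend only works inside a GitHub Actions job, where the runner provides the cache service credentials.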

Before this change, `uv pip install -e ".[all]"` ran AFTER `COPY . .`, so every commit that changed any .py file busted the layer cache and re-did the entire Python dep resolve + wheel download + native extension compile (~4-5 min on cold Docker Hub cache). Split it into two steps:

1. Before `COPY . .`: copy only pyproject.toml + uv.lock + README.md, then `uv sync --frozen --no-install-project --all-extras`. This layer is cached unless any of those three files change, so .py-only commits skip the heavy work entirely.
2. After `COPY . .` (and its downstream chmod/chown step): run `uv pip install --no-cache-dir --no-deps -e .` to create the editable link. With --no-deps this is a ~1s op: no resolution, no downloads, no compilation.

Combined with the per-arch runner split in the previous commit, this should drop cache-hit build times to the sub-5-min range.
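A minimal sketch of the two install steps as shell functions, using the final `uv sync --frozen --no-install-project --extra all` form from the Changes Made list rather than the `--all-extras` form this commit first used. The `run` wrapper and `DRY_RUN` switch are hypothetical additions so the commands can be printed without uv; in the Dockerfile each function body would be its own `RUN` instruction on either side of `COPY . .`.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Layer 1 (before COPY . .): only pyproject.toml, uv.lock and README.md exist.
# --frozen installs the versions pinned in uv.lock without re-resolving them;
# --no-install-project installs the dependencies but not the project itself.
install_deps_layer() {
  run uv sync --frozen --no-install-project --extra all
}

# Layer 2 (after COPY . .): dependencies are already present, so --no-deps
# reduces this to creating the editable install of the project.
install_project_layer() {
  run uv pip install --no-cache-dir --no-deps -e .
}
```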

Runs `uv lock --check` on every PR and on push to main that touches pyproject.toml, uv.lock, or this workflow itself. Exits non-zero if the lockfile is out of sync with pyproject.toml, blocking the PR before it can break the Docker build on main.

Rationale: the new Dockerfile layout uses `uv sync --frozen --extra all`, which installs exactly what uv.lock pins without re-checking it against pyproject.toml. Without this guard, a PR that changes pyproject.toml dependencies but forgets to regenerate uv.lock would merge fine, and the drift would only surface on main after ~15 min of docker-publish build time.

On failure, the step adds a GitHub annotation and a workflow summary block with the exact commands to run locally (`uv lock`, `git add uv.lock`, `git commit`).

Verified locally that:
- Clean tree: `uv lock --check` succeeds (resolves in ~2ms, no work).
- Stale lockfile (added cowsay to pyproject.toml, not in lock): exits 1 with message 'The lockfile at `uv.lock` needs to be updated'.
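A minimal sketch of the check step, assuming it shells out to `uv lock --check` and writes its remediation hints to the file named by `GITHUB_STEP_SUMMARY` (the runner-provided path for step summaries). The `check_lockfile` function and the `LOCK_CHECK_CMD` override are hypothetical; the override only lets the sketch run without uv installed. The `::error` line uses GitHub Actions' workflow-command syntax for an annotation.

```shell
# Hypothetical check step: succeed quietly if uv.lock matches pyproject.toml,
# otherwise emit an annotation plus a step summary with the local fix.
check_lockfile() {
  # Unquoted on purpose so the default splits into: uv lock --check
  if ${LOCK_CHECK_CMD:-uv lock --check}; then
    echo "uv.lock is in sync with pyproject.toml"
    return 0
  fi
  echo "::error file=uv.lock::uv.lock is out of sync with pyproject.toml"
  # Outside GitHub Actions GITHUB_STEP_SUMMARY is unset; discard the summary.
  printf '%s\n' \
    '### uv.lock is out of sync with pyproject.toml' \
    '' \
    'Run these commands locally, then push:' \
    '' \
    '- `uv lock`' \
    '- `git add uv.lock`' \
    '- `git commit -m "fix(ci): update uv.lock"`' \
    >> "${GITHUB_STEP_SUMMARY:-/dev/null}"
  return 1
}
```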

Adds `pull_request` trigger to docker-publish.yml so PRs that touch Dockerfile / docker/ / pyproject.toml / uv.lock / the workflow itself verify the image builds cleanly before merge. Previously, Dockerfile regressions (e.g. a stale uv.lock, a typo'd dep) would only surface after merge when the docker-publish workflow ran on main.

Build-verify-only on PRs: the per-arch jobs run their `load: true` build + smoke test, but the push-by-digest + artifact upload steps remain gated on push-to-main or release. The `merge` and `move-latest` jobs stay excluded from PRs by their existing `if:` gates, so :latest and SHA tags are never touched from PR runs.

Concurrency: PR runs use a PR-scoped group (`docker-<pr_number>`) with `cancel-in-progress: true` so rapid pushes to the same PR collapse to the latest commit. Push/release runs keep `cancel-in-progress: false`, so every merge still gets its own SHA-tagged image.

Also adds arm64 smoke tests (previously amd64-only): the image is now built with `load: true` on arm64 too, then `docker run --help` + `dashboard --help` smoke tests run identically on both arches. Both smoke test blocks were extracted into a new composite action at `.github/actions/hermes-smoke-test` to keep the two jobs DRY.

New files:
- .github/actions/hermes-smoke-test/action.yml

Modified:
- .github/workflows/docker-publish.yml
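The event-dependent rules above can be sketched as two small functions. This is a hypothetical illustration: the `docker-<pr_number>` group name comes from the commit message, while the group used for push/release runs is an assumption (the text only says those runs are serialized per ref). In the workflow itself these decisions live in `concurrency:` and `if:` expressions.

```shell
# Hypothetical: concurrency settings per event. PR runs cancel older runs
# of the same PR; push/release runs queue instead of cancelling.
concurrency_settings() {
  local event=$1 pr_number=${2:-} ref=${3:-}
  if [ "$event" = pull_request ]; then
    printf 'group=docker-%s cancel-in-progress=true\n' "$pr_number"
  else
    # Assumed group name for push/release runs, keyed on the ref.
    printf 'group=docker-%s cancel-in-progress=false\n' "$ref"
  fi
}

# Hypothetical: only push-to-main and release runs push by digest;
# PR runs stop after the local build and smoke test.
should_push_by_digest() {
  local event=$1 ref=$2
  if [ "$event" = release ]; then return 0; fi
  [ "$event" = push ] && [ "$ref" = refs/heads/main ]
}
```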

…cker ci: split docker-publish per-arch runners + cache-friendly dockerfile layers

ethernet8023 added 5 commits May 8, 2026 18:46

fix(ci): update uv.lock

0a51863

ethernet8023 force-pushed the fix/faster-docker branch from 68b9c02 to 93679ef May 8, 2026 22:47

alt-glitch added the labels type/perf (Performance improvement or optimization), area/docker (Docker image, Compose, packaging), and P3 (Low, cosmetic, nice to have) May 8, 2026

ethernet8023 merged commit d10d19e into main May 8, 2026
18 of 19 checks passed

ethernet8023 deleted the fix/faster-docker branch May 8, 2026 23:12

bot-ted mentioned this pull request May 9, 2026

chore: sync with upstream main (2026-05-09) bot-ted/hermes-agent#25

Merged

JinyuID pushed a commit to JinyuID/hermes-agent that referenced this pull request May 11, 2026

Merge pull request NousResearch#22080 from NousResearch/fix/faster-do…

fdb7f95

…cker ci: split docker-publish per-arch runners + cache-friendly dockerfile layers

daimon-nous Bot mentioned this pull request May 13, 2026

ci(docker): split :latest (releases only) from :main #25045

Merged

jsboige pushed a commit to jsboige/hermes-agent that referenced this pull request May 14, 2026

Merge pull request NousResearch#22080 from NousResearch/fix/faster-do…

c362b8a

…cker ci: split docker-publish per-arch runners + cache-friendly dockerfile layers

Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request May 25, 2026

Merge pull request NousResearch#22080 from NousResearch/fix/faster-do…

9afe094

…cker ci: split docker-publish per-arch runners + cache-friendly dockerfile layers

gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026

Merge pull request NousResearch#22080 from NousResearch/fix/faster-do…

a3d0efd

…cker ci: split docker-publish per-arch runners + cache-friendly dockerfile layers

Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026

Merge pull request NousResearch#22080 from NousResearch/fix/faster-do…

5c53969

…cker ci: split docker-publish per-arch runners + cache-friendly dockerfile layers

