
ci: build and publish validator images on merge to main #412

Merged
mchmarny merged 1 commit into NVIDIA:main from yuanchen8911:ci/build-images-on-merge
Mar 17, 2026

@yuanchen8911
Contributor

Summary

Add validator image builds to the on-push workflow so images stay testable from main without requiring a release tag.

Motivation / Context

Validator images are currently only published on tagged releases. This creates a gap between merging code and being able to test updated validators on a live cluster — e.g., #403 (GKE NCCL support) is merged but the performance validator image doesn't include it until a new release is cut.

Fixes: N/A
Related: #403, #387

Type of Change

  • Build/CI/tooling

Component(s) Affected

  • Other: .github/workflows/on-push.yaml

Implementation Notes

  • build-docker job: builds 3 validator images × 2 archs (6 parallel jobs), gated on push to main only (skipped for PRs)
  • docker-manifest job: creates multi-arch manifests with two tag types:
    • sha-<commit> — immutable per-commit tag for rollback and provenance
    • edge — mutable tag tracking latest main (not release-grade)
  • :latest is untouched — reserved for the on-tag release pipeline where vuln scan + attestation are enforced
  • Concurrency: PRs are cancelled on new pushes, but main merges always run to completion to guarantee immutable sha tags
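A minimal sketch of how the gating, build matrix, tagging, and concurrency split described above might fit together. This is not the actual contents of on-push.yaml: the image names, registry path (ghcr.io/example-org), Dockerfile locations, architecture list (amd64/arm64), and runner label are assumptions for illustration, and actions are referenced by version tag for brevity rather than pinned by commit SHA.

```yaml
# Illustrative sketch only; not the real .github/workflows/on-push.yaml.
name: on-push
on:
  push:
    branches: [main]
  pull_request:

# PR runs share one group per PR ref and cancel each other on new pushes.
# Each push to main gets its own group (keyed by commit SHA), so it is
# never cancelled and its sha-<commit> tag is always published.
concurrency:
  group: ${{ github.workflow }}-${{ github.event_name == 'pull_request' && github.ref || github.sha }}
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}

jobs:
  build-docker:
    # Skipped for PRs: only pushes to main publish images.
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    strategy:
      matrix:
        image: [validator-a, validator-b, validator-c]  # hypothetical names
        arch: [amd64, arm64]                            # assumed architectures
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push the per-arch image
        run: |
          # Dockerfile path is hypothetical.
          docker buildx build \
            --platform "linux/${{ matrix.arch }}" \
            --file "validators/${{ matrix.image }}/Dockerfile" \
            --tag "ghcr.io/example-org/${{ matrix.image }}:sha-${{ github.sha }}-${{ matrix.arch }}" \
            --push .

  docker-manifest:
    needs: build-docker
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    strategy:
      matrix:
        image: [validator-a, validator-b, validator-c]  # hypothetical names
    runs-on: ubuntu-latest
    permissions:
      packages: write
    steps:
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Create multi-arch manifests (sha-<commit> and edge; never :latest)
        run: |
          repo="ghcr.io/example-org/${{ matrix.image }}"
          sha_tag="sha-${{ github.sha }}"
          docker buildx imagetools create \
            --tag "${repo}:${sha_tag}" \
            --tag "${repo}:edge" \
            "${repo}:${sha_tag}-amd64" \
            "${repo}:${sha_tag}-arm64"
```

Keying the main-branch concurrency group on the commit SHA is one way to guarantee that merges never queue behind or cancel one another; a fixed group with cancel-in-progress disabled would still let a newer pending run replace an older pending one.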

Testing

# Workflow syntax validation
# No code changes — CI workflow only

Risk Assessment

  • Low — Isolated change, well-tested, easy to revert

Rollout notes: N/A — additive workflow change, no impact on existing on-tag release pipeline.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

Member

@mchmarny left a comment


Clean CI change — job structure mirrors on-tag nicely, action pins are solid, and the edge/sha tagging strategy is well thought out. One inline comment on the concurrency side effect. LGTM otherwise.

@xdu31
Contributor

xdu31 commented Mar 17, 2026

Reusable workflow opportunity. build-docker and docker-manifest are almost identical to on-tag — only the tag pattern and if guard differ. Worth extracting to a reusable workflow with inputs for tag strategy. Not a blocker, but reduces drift risk as these evolve.
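As a rough illustration of that extraction, the sketch below shows a reusable workflow parameterized by tag strategy, followed by an on-push caller. The file name build-images.yaml, the input names tag and mutable_tag, and the placeholder step are hypothetical; the real build-docker and docker-manifest steps would replace the echo.

```yaml
# File 1 (hypothetical): .github/workflows/build-images.yaml
name: build-images
on:
  workflow_call:
    inputs:
      tag:
        description: Immutable tag to publish, e.g. sha-<commit> or a release version
        type: string
        required: true
      mutable_tag:
        description: Optional moving tag, e.g. edge (on push) or latest (on tag)
        type: string
        required: false
        default: ""

jobs:
  docker-manifest:
    runs-on: ubuntu-latest
    steps:
      # Placeholder for the shared build and manifest steps.
      - name: Show the tags this run would publish
        run: |
          echo "immutable tag: ${{ inputs.tag }}"
          if [ -n "${{ inputs.mutable_tag }}" ]; then
            echo "moving tag: ${{ inputs.mutable_tag }}"
          fi
---
# File 2 (sketch): caller job inside the on-push workflow
name: on-push
on:
  push:
    branches: [main]
  pull_request:

jobs:
  images:
    # The if guard stays in the caller; only the tag strategy is passed in.
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    uses: ./.github/workflows/build-images.yaml
    with:
      tag: sha-${{ github.sha }}
      mutable_tag: edge
    permissions:
      contents: read
      packages: write
```

The two documents are separate files; they are shown together only for comparison. An on-tag caller would pass its release version as tag and latest as mutable_tag, keeping the vulnerability scan and attestation steps on that side.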

@yuanchen8911 force-pushed the ci/build-images-on-merge branch from d53ead8 to 025ae65 on March 17, 2026 19:09
Add build-docker and docker-manifest jobs to the on-push workflow,
gated on push to main (skipped for PRs). This keeps validator images
testable from main without requiring a release tag.

Tags produced per merge:
- sha-<commit>: immutable multi-arch tag for rollback and provenance
- edge: mutable tag tracking latest main (not release-grade)

:latest is reserved for the on-tag release pipeline where vuln scan
and attestation are enforced.

Concurrency is split so PRs are cancelled on new pushes but main
merges always run to completion for immutable tag guarantees.

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
@yuanchen8911 force-pushed the ci/build-images-on-merge branch from 025ae65 to ae635f4 on March 17, 2026 19:14
@yuanchen8911
Contributor Author

Summary of offline discussion:

  • PR #412 (ci: build and publish validator images on merge to main): Ready to merge. Concurrency is configured as intended for this scope: PR runs are cancelled on new pushes, while push runs on main are not auto-cancelled, so each merge can complete image publication. No blocking correctness issue for the PR's stated goal (sha-<commit> + edge, no :latest changes).
  • Follow-up 1: Extract build-docker / docker-manifest into a reusable workflow shared with release.
  • Follow-up 2: Add per-PR image builds with if: always() cleanup (reuse NVS cleanup pattern).

@mchmarny merged commit 9101d29 into NVIDIA:main on Mar 17, 2026
16 checks passed
xdu31 pushed a commit to xdu31/aicr that referenced this pull request Mar 24, 2026
Signed-off-by: Yuan Chen <yuanchen97@gmail.com>