feat: Major refactoring of the release-workflows #466
Merged
Two structural changes:

1. _release_library.yml gains a `validate-only` input. When true, every side-effecting step is inert: no temp branch push, no PR creation, no status-check wait, no merge, no GH release POST, no docs publish, no Slack notify. Per-PR callers can now exercise the full release pipeline without TestPyPI accounts, AWS/Akamai secrets, or admin permissions. Closes the gap where release-only paths broke for the first time at release time.

2. _build_test_publish_wheel.yml drops `dry-run` and TestPyPI entirely. Metadata compliance (README rendering, classifiers, MANIFEST, wheel contents) now runs in a new `validate-wheel` job via twine check, check-wheel-contents, and check-manifest, all run locally with no upload. The random-patch hack that existed only for TestPyPI uniqueness is gone. `publish-wheel` always targets real PyPI when `no-publish=false`.

Other touches:
- Delete `build-test-publish-wheel-dry-run` job; it is replaced by per-PR validate-only runs.
- Inner `_build_test_publish_wheel.yml` reference switched from a pin to `./.github/workflows/_build_test_publish_wheel.yml` so the two files always travel together.
- All previously-required release-time secrets are now `required: false` on `_release_library.yml` so PR-time callers from forks don't fail parse-time validation.

BREAKING CHANGE: `_build_test_publish_wheel.yml` no longer accepts the `dry-run` input. Callers that passed `dry-run: true` should remove the line. Callers that need to skip publishing should use `no-publish: true`.

Signed-off-by: oliver könig <okoenig@nvidia.com>
…ate-only

- check-wheel-contents now ignores W002 (same-content files, a false positive for empty __init__.py files) and W004 (empty files). The remaining warnings still catch bytecode, oversized wheels, and missing modules.
- build-docs runs when validate-only=true even if publish-docs=false, so PR rehearsals exercise the docs build path. publish-docs job is still gated on validate-only=false.
- Inner _build_test_publish_wheel.yml gains a suppress-failure-notify input. _release_library.yml passes validate-only into it so PR rehearsals don't post Slack alerts on transient build failures.
- build-docs `ref` uses the release-ref directly under validate-only (no tag exists yet) and the tag under real release.

Signed-off-by: oliver könig <okoenig@nvidia.com>
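The local metadata checks named in these commit messages could be chained roughly as follows. This is a minimal sketch, not the job's actual script: the function name `validate_wheel`, the `RUNNER` preview variable, and the `dist` default are assumptions made for illustration. Only the three tools and the W002/W004 ignore list come from the text above.

```shell
# Hypothetical helper: run the three local metadata checks against a dist directory.
# RUNNER is an illustrative knob (not from the PR): set RUNNER=echo to print the
# commands instead of executing them.
validate_wheel() {
  local dist_dir="${1:-dist}"
  local runner="${RUNNER:-}"

  # README rendering and core metadata of every built artifact.
  $runner twine check "$dist_dir"/*
  # Wheel layout checks, with the two codes the commit message says are ignored.
  $runner check-wheel-contents --ignore W002,W004 "$dist_dir"/*.whl
  # sdist contents compared against the version-controlled files.
  $runner check-manifest
}
```

Calling `RUNNER=echo validate_wheel dist` previews the three commands without needing any of the tools installed; nothing is uploaded in either mode.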
When neither the changelog builder ran nor an RC version triggers the prerelease text, the script previously called awk on CHANGELOG.md unconditionally — which fails repos that don't keep a CHANGELOG.md. Guard the read with `[[ -f CHANGELOG.md ]]` and fall back to a one-line placeholder so validate-only rehearsals (and any release path that opts out of changelog-builder) succeed regardless. Signed-off-by: oliver könig <okoenig@nvidia.com>
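The guard described in this commit can be sketched as below. The function name `release_notes`, the placeholder wording, and the awk extraction rule are assumptions; the source does not show the real awk program, and it writes the file test as `[[ -f CHANGELOG.md ]]`, for which the POSIX `[ -f … ]` form used here is equivalent.

```shell
# Hypothetical helper: print release notes from a changelog file,
# or a one-line placeholder when the repo does not keep one.
release_notes() {
  local changelog="${1:-CHANGELOG.md}"
  if [ -f "$changelog" ]; then
    # Assumed extraction rule: print the first "## " section (heading included)
    # and stop before the next "## " heading.
    awk '/^## /{n++} n==1' "$changelog"
  else
    # Assumed placeholder text; the actual wording is not given in the source.
    echo "No changelog available for this release."
  fi
}
```

With this shape, a validate-only rehearsal in a repo without CHANGELOG.md gets the placeholder instead of a failing awk call.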
Under dry-run=true, create-gh-release echoes the curl payload instead of POSTing — so the vX.Y.Z tag never exists. The previous build-docs ref calculation only treated validate-only as inert; dry-run runs would then fail at git checkout v0.5.0 (no such ref). Treat both validate-only and dry-run as inert, and check out release-ref directly. Signed-off-by: oliver könig <okoenig@nvidia.com>
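The ref selection described here amounts to a two-way choice. A minimal sketch, assuming a hypothetical helper `docs_checkout_ref` and positional arguments (the real workflow presumably computes this in a GitHub Actions expression, which the source does not show):

```shell
# Hypothetical helper: choose which ref build-docs should check out.
# Arguments: validate-only flag, dry-run flag, release ref, release version.
docs_checkout_ref() {
  local validate_only="$1" dry_run="$2" release_ref="$3" version="$4"
  if [ "$validate_only" = "true" ] || [ "$dry_run" = "true" ]; then
    # Inert run: no GH release was created, so the vX.Y.Z tag does not exist.
    printf '%s\n' "$release_ref"
  else
    # Real release: the tag was created by create-gh-release.
    printf 'v%s\n' "$version"
  fi
}
```

For example, `docs_checkout_ref false true release/0.5 0.5.0` yields the release ref, while a real release with both flags false yields `v0.5.0`.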
…flows
Decompose the monolithic _release_library.yml into three composable
reusable workflows so multi-wheel consumers (e.g. NVIDIA/Megatron-LM,
which publishes both megatron-core and megatron-fsdp from one repo with
a custom multi-arch matrix) can plug in their own wheel build between
the bump phase and the finalize phase.
New:
_release_bump.yml — admin gate + bump-next-version. Multi-target
capable via a JSON `bump-targets` input.
Outputs `release-version`.
_release_finalize.yml — create-gh-release + build-docs + publish-docs
+ notify. Takes `release-version` as input.
Refactored:
_release_library.yml — now a thin wrapper composing
bump → _build_test_publish_wheel → finalize.
External interface (inputs/secrets) preserved
— single-wheel consumers (Megatron-Bridge,
NeMo, NeMo-RL, NeMo-Curator) require no
change.
Multi-wheel consumers compose at the consumer level:
bump → uses _release_bump.yml
wheels → consumer-local matrix
finalize → uses _release_finalize.yml (gets release-version from bump)
Signed-off-by: oliver könig <okoenig@nvidia.com>
The conditional with format() produced a literal '\\n' in the rendered message instead of an actual newline, mangling the Slack output. pypi-name is always populated for canonical consumers, so the conditional is unnecessary. Signed-off-by: oliver könig <okoenig@nvidia.com>
MLM doesn't have a GitHub App configured — it uses a PAT for git push + PR creation. Make app-id optional (default ''); when empty, skip the app-token step and GPG import, use secrets.PAT for both checkout token and gh CLI auth, configure a github-actions[bot] identity in-place, and emit an unsigned (-s only) commit. Signed-off-by: oliver könig <okoenig@nvidia.com>
This was referenced May 5, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
04a3b77 to 5781799
setuptools projects without a tracked MANIFEST.in can't be validated by check-manifest — it exits non-zero with 'missing from sdist'. Skip with a notice instead of failing the wheel validation step. Signed-off-by: oliver könig <okoenig@nvidia.com>
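The skip-with-notice behaviour from this commit might look like the sketch below. The function name `run_check_manifest` and the `CHECK_MANIFEST` override are assumptions for illustration, and the sketch tests for the file on disk rather than checking that it is tracked in git, which is a simplification of what the commit describes.

```shell
# Hypothetical helper: run check-manifest only when the project has a MANIFEST.in.
# CHECK_MANIFEST is an illustrative override so the command can be stubbed out.
run_check_manifest() {
  local project_dir="${1:-.}"
  if [ -f "$project_dir/MANIFEST.in" ]; then
    (cd "$project_dir" && ${CHECK_MANIFEST:-check-manifest})
  else
    # GitHub Actions renders lines starting with ::notice:: as annotations.
    echo "::notice::No MANIFEST.in found in $project_dir; skipping check-manifest"
  fi
}
```

Later commits in this PR revisit the policy (advisory, then strict, then an explicit skip flag), so this reflects only the state at this commit.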
Single-wheel consumers without a GitHub App configured (Curator, RL) were getting 'startup_failure' when calling the wrapper because app-id was marked required. Make it optional with an empty default and forward to _release_bump.yml's PAT-fallback path. Signed-off-by: oliver könig <okoenig@nvidia.com>
App-less consumers (Curator, RL) rely on the bump workflow's PAT-fallback auth path; the wrapper wasn't passing PAT through, so gh pr create errored with 'set the GH_TOKEN environment variable'. Signed-off-by: oliver könig <okoenig@nvidia.com>
Pre-existing MANIFEST.in drift in consumer repos (cursor rules, .claude/ skills, etc.) shouldn't block the orchestration. Continue past failures and emit a warning instead. Signed-off-by: oliver könig <okoenig@nvidia.com>
For consumers whose wheel build requires infrastructure that's not available in the shared orchestration (CUDA-only deps, custom indices), skip the wheel phase entirely. Bump and finalize still run, so the release-version + GH-release path is exercised. Signed-off-by: oliver könig <okoenig@nvidia.com>
Failing fast surfaces real MANIFEST.in drift instead of papering over it. Consumers fix their manifests. Signed-off-by: oliver könig <okoenig@nvidia.com>
Repos with branch policies on the main env block pull-request/* mirror branches from deploying to main. validate-only doesn't actually need main-env protection (no secrets used, no side effects), so route it to the public env which is unrestricted by convention. Signed-off-by: oliver könig <okoenig@nvidia.com>
Repos with monorepo layouts or no root pyproject.toml fail _build_docs.yml because it expects to install the package from the workspace root. Tighten the gate so docs build only fires when the consumer actually publishes docs in real release; PR rehearsal is gated by the consumer setting publish-docs=true. Signed-off-by: oliver könig <okoenig@nvidia.com>
chtruong814 reviewed on May 5, 2026
    default: ""
    description: |
      GitHub App ID used to mint the bot token. Empty falls back to PAT
      (must be passed as the PAT secret) for git push and PR creation.
Contributor
Does the app path work? If so, and since we're building a new workflow, let's not allow the fallback?
Contributor
Author
okay, app is now mandatory for all - including MLM
chtruong814 reviewed on May 5, 2026
      BUMPED_FILES+=("$PACKAGE_INFO_FILE")

      TARGET_NEXT="$MAJOR.$MINOR.$NEXT_PATCH$NEXT_PRERELEASE"
    elif [[ "$PACKAGING" == "hatch" ]]; then
Contributor
we're not supporting hatch anymore right? Or nemo run still needs to be updated?
Contributor
Author
that was a glitch, thanks for finding it
Run was the only nominal hatch consumer and it actually uses setuptools dynamic versioning. No remaining consumer uses hatch. Remove the unsupported branch from _release_bump.yml's bump logic and update input descriptions to list only setuptools and uv. Signed-off-by: oliver könig <okoenig@nvidia.com>
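With hatch removed, the set of accepted packaging values is just setuptools and uv. One way to reject anything else up front is sketched below; `check_packaging` is a hypothetical helper, and the source does not say whether _release_bump.yml validates the input this way.

```shell
# Hypothetical helper: accept only the packaging backends still supported.
check_packaging() {
  case "$1" in
    setuptools|uv)
      return 0
      ;;
    *)
      echo "unsupported packaging '$1' (expected setuptools or uv)" >&2
      return 1
      ;;
  esac
}
```

A caller could run `check_packaging "$PACKAGING" || exit 1` before touching any version files, so a stale `hatch` value fails early with a readable message.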
Hybrid PAT/App auth doubled the surface area in _release_bump.yml. Make app-id and BOT_KEY required, always sign commits with GPG, and drop the PAT fallback path. Consumers must set vars.BOT_ID and secrets.BOT_KEY (org-level vars cover most NeMo repos already). Signed-off-by: oliver könig <okoenig@nvidia.com>
This was referenced May 6, 2026
Why: NeMo (and similar large repos) intentionally exclude dev tooling (.claude/, .codex/, docs/source/, docker/, etc.) from sdist via setuptools defaults. check-manifest then flags 1000+ 'missing from sdist' false positives. Surface a skip flag rather than forcing every repo to maintain a [tool.check-manifest] ignore block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
Why: NeMo's Sphinx conf.py lives at docs/source/conf.py (not docs/), and large docs trees may have unfixable Sphinx warnings. Surface both knobs through _release_library.yml and _release_finalize.yml into _build_docs.yml so consumers can override the defaults. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
Why: Sphinx linkcheck flags external 403/404 links as broken; without a curated broken_links_false_positives.json it fails the build. Surface a skip flag for repos that haven't curated their false-positives list yet (e.g. NeMo, Gym). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
Why: PR mode only matches PRs whose merge_commit_sha is in the release-branch commit list. For release branches built via cherry-pick, cp commits have different SHAs from the original PR merge commits — the action matches zero PRs and the changelog shows only the bump line. HYBRID mode also lists raw commits as fallback so the cp messages render. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
Why: the bump commit is a 1-line pyproject.toml edit that triggers the consumer's full cicd-main test suite, and the wait-for-status-checks step then polls until everything is green. That's wasteful (~10–30 min of unrelated CI per release) and brittle (any flaky cicd-main test fails the release). GitHub Actions natively respects [skip ci] in commit messages, so no workflow fires on the bump commit. The wait loop now detects the marker on the bump SHA and short-circuits immediately. Trade-off: non-Actions integrations (Codecov push hook, etc.) ignore the marker; our consumers run everything on GH Actions, so that is fine. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
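The marker detection that lets the wait loop short-circuit can be sketched as a plain substring test. `has_skip_ci_marker` is a hypothetical name, and the sketch checks only the `[skip ci]` spelling mentioned in the commit message.

```shell
# Hypothetical helper: succeed when a commit message carries the [skip ci] marker.
has_skip_ci_marker() {
  case "$1" in
    *"[skip ci]"*) return 0 ;;  # quoted pattern, so the brackets match literally
    *) return 1 ;;
  esac
}
```

In a wait loop this might be used as `has_skip_ci_marker "$(git log -1 --format=%B "$BUMP_SHA")" && exit 0`, where `BUMP_SHA` is an assumed variable holding the bump commit's SHA.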
Why: previously environment was tied to validate-only only. That meant workflow_dispatch dry-run=true used the main environment (secrets/policies hardened for real release) when posting to Slack, even though no real publish happened. Switch to inert = validate-only OR dry-run so only true real-release posts hit the main environment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
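The environment routing described here reduces to the same inert test. A minimal sketch, assuming a hypothetical helper `release_environment`; the environment names `public` and `main` come from the commit messages, while the real workflow presumably expresses this inline in the job's `environment:` field.

```shell
# Hypothetical helper: pick the deployment environment for the notify job.
# Arguments: validate-only flag, dry-run flag.
release_environment() {
  local validate_only="$1" dry_run="$2"
  if [ "$validate_only" = "true" ] || [ "$dry_run" = "true" ]; then
    echo "public"   # inert run: nothing is really published
  else
    echo "main"     # real release: hardened secrets and policies apply
  fi
}
```

Only the combination with both flags false resolves to `main`; every rehearsal path lands on `public`.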
…nv scope Why: previous wiring forwarded a single repo-level SLACK_WEBHOOK secret as a workflow_call input down the chain. That meant the finalize notify job always saw the same webhook regardless of its environment (public vs main). Drop the workflow_call passthrough so secrets.SLACK_WEBHOOK resolves at the job's environment scope — consumers configure SLACK_WEBHOOK separately per environment (public webhook for inert runs, main webhook for real release). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
Without inherit, the env tag on the called job cannot reach the env-scoped SLACK_WEBHOOK at the consumer's repo. Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
…rules)" This reverts commit fc25269. Signed-off-by: oliver könig <okoenig@nvidia.com>
ko3n1g added a commit to NVIDIA-NeMo/Automodel that referenced this pull request on May 11, 2026
* [ci] refactor: consolidate per-PR + release workflows; use validate-only mode See NVIDIA-NeMo/FW-CI-templates#466 for design discussion. - Delete build-test-publish-wheel.yml. - Rewrite release.{yml,yaml} as the single caller for both push and workflow_dispatch. validate-only derives from the trigger. - One pin to FW-CI-templates governs PR rehearsal and real release. Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to slack-notify gate Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to step-level webhook gate Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to manifest-skip + app-id optional Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to PAT-forwarding fix Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to skip-wheel-build + advisory check-manifest Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to strict check-manifest Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to env-conditional bump (validate-only -> public env) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin + default publish-docs=true for push-triggered rehearsal Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: default publish-docs=true to rehearse build-docs on PR push Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin (drop hatch support) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin (drop PAT support) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: drop PAT secret (now unused after FW-CI App-only refactor) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin (GPG optional) Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin; drop SSH_KEY/SSH_PWD Why: FW-CI-templates dropped GPG signing; SSH_KEY/SSH_PWD secrets no 
longer needed by the release pipeline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to ee3b849 for wheel-content-ignore Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to a129c51 (notify Slack-link fix) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to aacccb4 (publish-wheel always runs) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 5cca628 (notify/publish-docs/admin-check always on) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to a092192 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 6afcae2 (build-docs root-dir) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 2a43619 (docs-root-dir / docs-requirements-file) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 2f00056 (Slack only on dispatch) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci(changelog): bump FW-CI pin (HYBRID mode default) + add cp-title transformer Why: HYBRID mode renders raw commits when no PR matches by merge_commit_sha (helps release branches built via cherry-pick). The transformer cleans up cp titles to show the inner PR title only. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to b57ebf9 ([skip ci] on bump commit) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to cb5e93b (notify env public for dry-run) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 43d259e + drop SLACK_WEBHOOK passthrough Why: SLACK_WEBHOOK now resolves at the env scope (public/main) so the env-scoped secret value is used. No longer pass it as a workflow_call secret. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: bump FW-CI pin to d2f3dd3 + use secrets inherit Lets env-scoped SLACK_WEBHOOK reach the notify job in the called workflow. Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: strip orphan secret keys after secrets inherit Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: bump FW-CI pin to 64293f6 (slack render fix) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: pin FW-CI templates to v1.0.0 Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: pragma allowlist secret on 'secrets: inherit' lines Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: simplify release-summary if-block (always() was dead code) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: simplify release if-block (replace explicit success/skipped with !failure) Signed-off-by: oliver könig <okoenig@nvidia.com> --------- Signed-off-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ko3n1g added a commit to NVIDIA-NeMo/Run that referenced this pull request on May 11, 2026
* [ci] refactor: consolidate per-PR + release workflows; use validate-only mode See NVIDIA-NeMo/FW-CI-templates#466 for design discussion. - Delete build-test-publish-wheel.yml. - Rewrite release.{yml,yaml} as the single caller for both push and workflow_dispatch. validate-only derives from the trigger. - One pin to FW-CI-templates governs PR rehearsal and real release. Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] fix: bump FW-CI pin to slack-notify gate; correct packaging to setuptools Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to step-level webhook gate Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to manifest-skip + app-id optional Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to PAT-forwarding fix Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to skip-wheel-build + advisory check-manifest Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to strict check-manifest Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] fix: drop pull_request trigger; copy-pr-bot pushes to pull-request/<n> instead Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to env-conditional bump (validate-only -> public env) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin + default publish-docs=true for push-triggered rehearsal Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: default publish-docs=true to rehearse build-docs on PR push Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin (drop hatch support) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin (drop PAT support) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: drop PAT secret (now unused after FW-CI App-only refactor) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin (GPG optional) Signed-off-by: 
oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin; drop SSH_KEY/SSH_PWD Why: FW-CI-templates dropped GPG signing; SSH_KEY/SSH_PWD secrets no longer needed by the release pipeline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to ee3b849 for wheel-content-ignore Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to a129c51 (notify Slack-link fix) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to aacccb4 (publish-wheel always runs) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 5cca628 (notify/publish-docs/admin-check always on) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to a092192 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci(release): trigger release on pull-request/** + deploy-release/* Why: aligns with MBridge/Automodel/Curator/ExD/Eval/MLM — push pattern now covers copy-pr-bot mirror branches so the validate-only release rehearsal fires at PR time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci(release): use pull_request trigger for validate-only path Why: Run has no copy-pr-bot mirror, so push to ko3n1g/* never matched on:push.branches. The pull_request trigger fires directly on each PR push without needing a mirror, giving validate-only coverage at PR time. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 6afcae2 (build-docs root-dir) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 2a43619 (docs-root-dir / docs-requirements-file) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 2f00056 (Slack only on dispatch) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci(changelog): bump FW-CI pin (HYBRID mode default) + add cp-title transformer Why: HYBRID mode renders raw commits when no PR matches by merge_commit_sha (helps release branches built via cherry-pick). The transformer cleans up cp titles to show the inner PR title only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to b57ebf9 ([skip ci] on bump commit) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to cb5e93b (notify env public for dry-run) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 43d259e + drop SLACK_WEBHOOK passthrough Why: SLACK_WEBHOOK now resolves at the env scope (public/main) so the env-scoped secret value is used. No longer pass it as a workflow_call secret. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: bump FW-CI pin to d2f3dd3 + use secrets inherit Lets env-scoped SLACK_WEBHOOK reach the notify job in the called workflow. 
Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: strip orphan secret keys after secrets inherit Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: bump FW-CI pin to 64293f6 (slack render fix) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: pin FW-CI templates to v1.0.0 Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: pragma allowlist secret on 'secrets: inherit' lines Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: simplify release-summary if-block (always() was dead code) Signed-off-by: oliver könig <okoenig@nvidia.com> --------- Signed-off-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ko3n1g added a commit to NVIDIA-NeMo/Emerging-Optimizers that referenced this pull request on May 11, 2026
* ci: consolidate PR + release workflows; use validate-only Adopts the FW-CI-templates v1.0.0 pattern (NVIDIA-NeMo/FW-CI-templates#466): - single release.yaml caller for both push (validate-only) and workflow_dispatch (real release / dry-run) - no PyPI wheel publish (skip-wheel-build: true) — same pattern as RL - App-only auth (drops PAT/SSH_KEY/SSH_PWD) - pre-flight gate skips heavy work on deploy-release/* + docs_only - Slack webhook resolves at env scope (public for inert; main for real) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: bump FW-CI pin to d2f3dd3 + use secrets inherit Lets env-scoped SLACK_WEBHOOK reach the notify job in the called workflow. Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: enable docs publish for EO Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: docs-fail-on-warning false for EO (43 known warnings) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: build docs on PR push (validate-only) for EO Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: bump FW-CI pin to 64293f6 (slack render fix) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: pin FW-CI templates to v1.0.0 Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: pragma allowlist secret on 'secrets: inherit' lines Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: simplify release-summary if-block (always() was dead code) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: simplify release if-block (replace explicit success/skipped with !failure) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: drop docs-fail-on-warning: false (treat warnings as errors) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: test FW-CI@2ba61fb + docs-sync-all (verify before tagging) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: pin FW-CI templates to v1.1.0 Signed-off-by: oliver könig <okoenig@nvidia.com> --------- Signed-off-by: oliver könig <okoenig@nvidia.com> 
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ko3n1g added a commit to NVIDIA-NeMo/NeMo that referenced this pull request on May 11, 2026
* ci: consolidate PR + release workflows; use validate-only Adopts the FW-CI-templates v1.0.0 pattern (NVIDIA-NeMo/FW-CI-templates#466): - single release.yml caller for both push (validate-only) and workflow_dispatch (real release / dry-run) - one pin governs PR rehearsal and shipped release - drops PAT/SSH_KEY/SSH_PWD secrets (App-only auth) - bumps FW-CI pin from v0.80.3 to current SHA Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci(release): silence check-wheel-contents W005+W009 (common+multi toplevel) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to f5224c1; skip-check-manifest Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci(release): add pre-flight gate to skip on deploy-release/* + docs-only Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci(release): enable docs publish in release pipeline Why: previous workflow defaulted publish-docs to false. Validate-only runs skipped build-docs/publish-docs entirely, so the docs path was never exercised at PR time. Default to true and surface docs-target-path / publish-as-latest / run-on-version-tag-only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci(release): wire docs-directory=docs/source + docs-requirements-file Why: NeMo's Sphinx conf.py lives at docs/source/conf.py (not docs/); docs deps install via pip, not uv (requirements/requirements_docs.txt); fail-on-warning disabled because the existing docs tree has known warnings that can't all be resolved at once. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci(release): bump pin + docs-skip-linkcheck Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci(changelog): bump FW-CI pin (HYBRID mode default) + add cp-title transformer Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to b57ebf9 ([skip ci] on bump commit) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to cb5e93b Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 43d259e + drop SLACK_WEBHOOK passthrough Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: bump FW-CI pin to d2f3dd3 + use secrets inherit Lets env-scoped SLACK_WEBHOOK reach the notify job in the called workflow. 
Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: strip orphan secret keys after secrets inherit Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: bump FW-CI pin to 64293f6 (slack render fix) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: pin FW-CI templates to v1.0.0 Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: pragma allowlist secret on 'secrets: inherit' lines Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: simplify release-summary if-block (always() was dead code) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: simplify release if-block (replace explicit success/skipped with !failure) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: drop docs-fail-on-warning: false (treat warnings as errors) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: pass sync-all and no-extras to docs build Signed-off-by: oliver könig <okoenig@nvidia.com> * Revert "ci: pass sync-all and no-extras to docs build" This reverts commit 1a75e0b. _release_library.yml@v1.0.0 does not expose sync-all/no-extras inputs. Forwarding is added in NVIDIA-NeMo/FW-CI-templates#473; once that lands and gets tagged, bump the pin here and re-add the inputs as docs-sync-all / docs-no-extras. Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: bump _release_library to v1.2.0 and pass docs-sync-all / docs-no-extras v1.2.0 of NVIDIA-NeMo/FW-CI-templates#473 forwards docs-no-extras (and the existing docs-sync-all) to _build_docs.yml, so the release path can now match build-docs.yml (sync all groups, skip the cu12 extra). Signed-off-by: oliver könig <okoenig@nvidia.com> --------- Signed-off-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ko3n1g added a commit to NVIDIA-NeMo/Gym that referenced this pull request on May 11, 2026
## Why

See the design discussion in NVIDIA-NeMo/FW-CI-templates#466.

## What

- **Delete** `.github/workflows/build-test-publish-wheel.yml`.
- **Rewrite** `.github/workflows/release.yaml` as the single caller for both `push` and `workflow_dispatch`.

## Test plan

- [x] PR validate-only (push (mirror), sha 13efd60, 2026-05-07T11:27:47Z, success): https://github.com/NVIDIA-NeMo/Gym/actions/runs/25493005076
- [x] `workflow_dispatch dry-run=true` (sha 13efd60, 2026-05-07T11:28:53Z, success): https://github.com/NVIDIA-NeMo/Gym/actions/runs/25493052995
- [ ] Real release via `workflow_dispatch dry-run=false` on the next planned RC.

## Rollout

1. Land FW-CI-templates#466.
2. Cut FW-CI-templates `v1.0.0`.
3. Bump the SHA pin in this PR → tag.

---------

Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chtruong814
added a commit
to NVIDIA-NeMo/Curator
that referenced
this pull request
May 14, 2026
* [ci] refactor: consolidate per-PR + release workflows; use validate-only mode See NVIDIA-NeMo/FW-CI-templates#466 for design discussion. - Delete build-test-publish-wheel.yml. - Rewrite release.{yml,yaml} as the single caller for both push and workflow_dispatch. validate-only derives from the trigger. - One pin to FW-CI-templates governs PR rehearsal and real release. Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to slack-notify gate Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to step-level webhook gate Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to manifest-skip + app-id optional Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to PAT-forwarding fix Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] fix: bump python-version to 3.12 for Curator (requires-python>=3.11) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to skip-wheel-build + advisory check-manifest Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: configure check-manifest to ignore dev-only paths Establishes the canonical set of paths excluded from the sdist so the shared release pipeline's check-manifest step passes deterministically. Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to strict check-manifest Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: comprehensive check-manifest ignore for non-package paths Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] fix: include data files (CSV/JSON/etc) under nemo_curator in sdist check-manifest surfaced that nemo_curator/utils/code_meta.csv was tracked in git but missing from the sdist. Without this, the wheel ships broken for users that hit code_meta.csv at runtime. Add a recursive-include rule to MANIFEST.in covering common data extensions. 
Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to env-conditional bump (validate-only -> public env) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to build-docs gate fix Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin (drop hatch support) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] feat: switch to GitHub App auth (drop PAT) + bump FW-CI pin Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin (GPG optional) Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin; drop SSH_KEY/SSH_PWD Why: FW-CI-templates dropped GPG signing; SSH_KEY/SSH_PWD secrets no longer needed by the release pipeline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to ee3b849 for wheel-content-ignore Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to a129c51 (notify Slack-link fix) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to aacccb4 (publish-wheel always runs) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 5cca628 (notify/publish-docs/admin-check always on) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to a092192 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci(release): trigger release.yml on pull-request/** + deploy-release/* Why: aligns with MBridge/Automodel/ExD/Eval/MLM — push pattern now covers the copy-pr-bot mirror branches, 
so the validate-only release rehearsal fires at PR time instead of only on workflow_dispatch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 6afcae2 (build-docs root-dir) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 2a43619 (docs-root-dir / docs-requirements-file) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 2f00056 (Slack only on dispatch) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci(changelog): bump FW-CI pin (HYBRID mode default) + add cp-title transformer Why: HYBRID mode renders raw commits when no PR matches by merge_commit_sha (helps release branches built via cherry-pick). The transformer cleans up cp titles to show the inner PR title only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to b57ebf9 ([skip ci] on bump commit) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to cb5e93b (notify env public for dry-run) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 43d259e + drop SLACK_WEBHOOK passthrough Why: SLACK_WEBHOOK now resolves at the env scope (public/main) so the env-scoped secret value is used. No longer pass it as a workflow_call secret. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: bump FW-CI pin to d2f3dd3 + use secrets inherit Lets env-scoped SLACK_WEBHOOK reach the notify job in the called workflow. Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: bump FW-CI pin to 64293f6 (slack render fix) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: enable docs build (rehearsal) on PR + dispatch for Curator Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: docs-requirements-file for Curator (no docs group) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: docs-fail-on-warning false for Curator Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: pin FW-CI templates to v1.0.0 Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: pragma allowlist secret on 'secrets: inherit' lines Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: simplify release-summary if-block (always() was dead code) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: drop docs-fail-on-warning: false (treat warnings as errors) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: docs-fail-on-warning false for Curator (matches build-docs.yml) Signed-off-by: oliver könig <okoenig@nvidia.com> * Do not publish docs by default Signed-off-by: Charlie Truong <chtruong@nvidia.com> --------- Signed-off-by: oliver könig <okoenig@nvidia.com> Signed-off-by: Charlie Truong <chtruong@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Charlie Truong <chtruong@nvidia.com>
ko3n1g
added a commit
to NVIDIA-NeMo/Evaluator
that referenced
this pull request
May 22, 2026
## Why

See the design discussion in NVIDIA-NeMo/FW-CI-templates#466.

## What

- **Delete** `.github/workflows/build-test-publish-wheel.yml`.
- **Rewrite** `.github/workflows/release.yaml` as the single caller for both `push` and `workflow_dispatch`.

## Test plan

- [ ] PR validate-only (push (mirror), sha 7669129, 2026-05-07T11:27:40Z, success): https://github.com/NVIDIA-NeMo/Evaluator/actions/runs/25493000087
- [ ] `workflow_dispatch dry-run=true` (sha 7669129, 2026-05-07T11:28:40Z, success): https://github.com/NVIDIA-NeMo/Evaluator/actions/runs/25493044119
- [ ] Real release via `workflow_dispatch dry-run=false` on the next planned RC.

## Rollout

1. Land FW-CI-templates#466.
2. Cut FW-CI-templates `v1.0.0`.
3. Bump the SHA pin in this PR → tag.

---------

Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kashif
pushed a commit
to kashif/Automodel
that referenced
this pull request
May 25, 2026
* [ci] refactor: consolidate per-PR + release workflows; use validate-only mode See NVIDIA-NeMo/FW-CI-templates#466 for design discussion. - Delete build-test-publish-wheel.yml. - Rewrite release.{yml,yaml} as the single caller for both push and workflow_dispatch. validate-only derives from the trigger. - One pin to FW-CI-templates governs PR rehearsal and real release. Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to slack-notify gate Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to step-level webhook gate Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to manifest-skip + app-id optional Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to PAT-forwarding fix Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to skip-wheel-build + advisory check-manifest Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to strict check-manifest Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin to env-conditional bump (validate-only -> public env) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin + default publish-docs=true for push-triggered rehearsal Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: default publish-docs=true to rehearse build-docs on PR push Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin (drop hatch support) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin (drop PAT support) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: drop PAT secret (now unused after FW-CI App-only refactor) Signed-off-by: oliver könig <okoenig@nvidia.com> * [ci] chore: bump FW-CI pin (GPG optional) Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin; drop SSH_KEY/SSH_PWD Why: FW-CI-templates dropped GPG signing; SSH_KEY/SSH_PWD secrets no 
longer needed by the release pipeline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to ee3b849 for wheel-content-ignore Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to a129c51 (notify Slack-link fix) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to aacccb4 (publish-wheel always runs) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 5cca628 (notify/publish-docs/admin-check always on) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to a092192 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 6afcae2 (build-docs root-dir) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 2a43619 (docs-root-dir / docs-requirements-file) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 2f00056 (Slack only on dispatch) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci(changelog): bump FW-CI pin (HYBRID mode default) + add cp-title transformer Why: HYBRID mode renders raw commits when no PR matches by merge_commit_sha (helps release branches built via cherry-pick). The transformer cleans up cp titles to show the inner PR title only. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to b57ebf9 ([skip ci] on bump commit) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to cb5e93b (notify env public for dry-run) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * refactor(release): bump FW-CI pin to 43d259e + drop SLACK_WEBHOOK passthrough Why: SLACK_WEBHOOK now resolves at the env scope (public/main) so the env-scoped secret value is used. No longer pass it as a workflow_call secret. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: bump FW-CI pin to d2f3dd3 + use secrets inherit Lets env-scoped SLACK_WEBHOOK reach the notify job in the called workflow. Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: strip orphan secret keys after secrets inherit Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: bump FW-CI pin to 64293f6 (slack render fix) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: pin FW-CI templates to v1.0.0 Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: pragma allowlist secret on 'secrets: inherit' lines Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: simplify release-summary if-block (always() was dead code) Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: simplify release if-block (replace explicit success/skipped with !failure) Signed-off-by: oliver könig <okoenig@nvidia.com> --------- Signed-off-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
shuoyangd
pushed a commit
to shuoyangd/NeMo-Curator
that referenced
this pull request
Jun 12, 2026
Design issues addressed

- `validate-only` mode runs the full release pipeline inert (no PyPI, no PR push, no GH release POST, no docs publish, no Slack). PRs now rehearse what ships.
- `dry-run` uploaded to TestPyPI on every push to main / release branches. Quota fills up.
- […] `twine check` + `check-wheel-contents` + `check-manifest`.
- […] `build-test-publish-wheel.yml` for PRs, `release.yaml` for releases) each pinned FW-CI-templates independently. The version tested on PR was rarely the version that shipped.
- `validate-only` is derived from the trigger; one pin governs both PR rehearsal and real release.
- […] `_release_library.yml` couldn't accommodate this without forking ~500 lines of YAML.
- […] `_release_bump.yml` (multi-target capable), `_release_finalize.yml`, plus the existing `_build_test_publish_wheel.yml`. Single-wheel consumers keep using `_release_library.yml` (now a thin wrapper). Multi-wheel consumers compose at the consumer level.
- […] `_release_bump.yml` (PAT fallback, conditional GPG signing, hybrid commit identity) and obscured which identity the bump commits ran under.
- `_release_bump.yml` requires `app-id` + `BOT_KEY`; auth is always the GitHub App bot token. Consumers must configure `vars.BOT_ID` and `secrets.BOT_KEY` (org-level `BOT_ID` already covers most repos).
- […] `SSH_KEY`, `SSH_PWD`) that nothing else in the pipeline used. Setting them up correctly was a recurring onboarding tax with little payoff: the bump commit is a bot-authored mechanical edit, not something a human reviews for cryptographic provenance.
- […] `_release_bump.yml`. The bump commit is now committed under the `github-actions[bot]` identity with `-s` (signoff) only; the App token authorizes the push. `SSH_KEY` and `SSH_PWD` are no longer accepted as secrets; consumers should remove them from their `release.yaml` `secrets:` blocks at the next pin bump.

Workflow composition
Single-wheel consumer (Megatron-Bridge, NeMo, NeMo-RL, NeMo-Curator):

```mermaid
flowchart LR
    T[push or workflow_dispatch] --> PF[pre-flight]
    PF --> W["_release_library.yml<br/>(wrapper)"]
    subgraph W [ ]
        direction LR
        B["_release_bump.yml"] --> WH["_build_test_publish_wheel.yml"] --> F["_release_finalize.yml"]
    end
    W --> S[release-summary]
```

- `validate-only` is set from the trigger (`true` on push, `false` on workflow_dispatch).
- External interface unchanged: existing consumers need no code change beyond the pin bump.
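The trigger-to-flag mapping described above can be sketched in shell. This is only an illustration: the function name `derive_validate_only` is hypothetical, and treating every event other than `workflow_dispatch` as inert is an assumption rather than something the templates are known to do. `GITHUB_EVENT_NAME` is the default variable GitHub Actions sets for the triggering event.

```shell
#!/usr/bin/env bash
# Sketch only: map the triggering event to the validate-only flag.
# derive_validate_only is a hypothetical name, not part of FW-CI-templates.
derive_validate_only() {
  case "$1" in
    workflow_dispatch) echo "false" ;;  # manual dispatch: side effects allowed
    *)                 echo "true"  ;;  # push (assumption: any other event too) stays inert
  esac
}

# Inside a workflow step the event name comes from GITHUB_EVENT_NAME;
# the fallback to "push" only keeps this sketch runnable outside Actions.
validate_only="$(derive_validate_only "${GITHUB_EVENT_NAME:-push}")"
echo "validate-only=${validate_only}"
```

Defaulting to the inert value means an unexpected trigger cannot publish anything, which matches the intent that only an explicit dispatch performs a real release.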
Multi-wheel consumer (Megatron-LM): two PyPI projects (`megatron-core`, `megatron_fsdp`), three wheel cells (core arm64+amd64, fsdp amd64), each producing cp311+cp312+cp313 manylinux wheels. Composed at the consumer level:

```mermaid
flowchart LR
    T[push or workflow_dispatch] --> PF[pre-flight push only]
    PF --> B["FW-CI-templates<br/>_release_bump.yml<br/>(multi-target)"]
    B --> WH["consumer-local<br/>_build_test_publish_wheel.yml<br/>(manylinux matrix)"]
    WH --> F["FW-CI-templates<br/>_release_finalize.yml"]
    F --> PD["consumer-local<br/>release-docs.yml"]
    PD --> S[release-summary]
```

- The `bump-targets` JSON input lets one bump commit cover both packages.
- `_release_finalize.yml` takes `release-version` from the bump output, so the consumer can sandwich its own wheel and docs jobs between the two phases.

Technical drilldown
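To make the `bump-targets` shape concrete, here is a minimal shell sketch that assembles a JSON array of `{python-package, src-dir}` objects. The two key names come from this PR description; the helper names (`bump_target`, `join_bump_targets`) and the package and directory values are placeholders, and the values are assumed to need no JSON escaping.

```shell
#!/usr/bin/env bash
# Sketch only: build a bump-targets JSON value from package/src-dir pairs.
# Helper names and example values are hypothetical.

# Emit one target object. Assumes the values contain no quotes or backslashes.
bump_target() {
  printf '{"python-package":"%s","src-dir":"%s"}' "$1" "$2"
}

# Join already-rendered target objects into a JSON array.
join_bump_targets() {
  local IFS=,
  printf '[%s]\n' "$*"
}

# Multi-target: one bump commit covering two packages.
join_bump_targets "$(bump_target pkg_a src/pkg_a)" "$(bump_target pkg_b src/pkg_b)"

# Single-target: the same shape with a one-element array.
join_bump_targets "$(bump_target pkg_a src/pkg_a)"
```

Keeping the single-target case in the same array shape is what would let one code path handle both kinds of consumer.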
`_release_bump.yml` (new)

```mermaid
flowchart LR
    A["check-admin-permission<br/><i>advisory if validate-only</i>"] --> B[bump-next-version]
    B --> B1["compute<br/>release-version<br/>+ next-version"]
    B1 --> B2["push tmp branch<br/>+ open PR"]
    B2 --> B3[wait for status checks]
    B3 --> B4["merge into<br/>version-bump-branch"]
    B4 --> B5[delete tmp branch]
    B1 -.->|validate-only| Out[done — write to step summary]
    B4 -.->|validate-only OR dry-run| EchoMerge[echo push command]
```

- […] `bump-targets: [{python-package, src-dir}, ...]`. Single-target consumers keep using `python-package` + `has-src-dir`; that path falls through to a one-element array internally.
- […] `actions/create-github-app-token`, requires `app-id` + `BOT_KEY`). The bump commit is unsigned, authored by `github-actions[bot]`, signed off with `-s`. The App token authorizes the push and the PR creation.
- […] `release-version` (current, what's being released) and `next-version` (post-bump, used in PR title and summary).
- `validate-only` skips everything past the version compute (the branch-cycle is the only thing that distinguishes PR rehearsal from `workflow_dispatch dry-run=true`). `dry-run` (`validate-only=false`) still pushes the tmp branch + PR + status wait + cleanup, but echoes the final merge.

`_release_finalize.yml` (new)

```mermaid
flowchart LR
    C[create-gh-release] --> D[build-docs]
    C --> N[notify]
    D --> P[publish-docs]
    C -.->|validate-only OR dry-run| EchoCurl[echo curl payload]
    D -.->|publish-docs=false| SkipD[skipped]
    P -.->|validate-only OR dry-run| EchoSync[aws s3 sync --dryrun]
    N -.->|validate-only| Summary[render to step summary]
```

- `release-version` and `pypi-name` come from upstream jobs (bump + wheel); finalize is agnostic to how the wheel was built. This decoupling is what enables consumer-local wheel matrices.
- `IS_INERT = validate-only OR dry-run`. `create-gh-release` echoes the curl POST under `IS_INERT`.
- `build-docs` and `publish-docs` always run when `publish-docs=true`. The underlying `publish-docs` action's `dry-run` flag is wired to `IS_INERT`, so `aws s3 sync` is skipped on PR rehearsal. Consumers with monorepo layouts can pass `docs-root-dir` (where the docs `pyproject.toml` lives) and `docs-requirements-file` (pip-install style instead of `uv sync --only-group docs`).
- […] `workflow_dispatch` (dry-run prefix when `dry-run=true`). Validate-only renders the assembled message to `$GITHUB_STEP_SUMMARY` so we still validate the build-step's shell logic without spamming the channel.
- […] `required: false`; the CHANGELOG.md read is guarded by `[[ -f CHANGELOG.md ]]` with a placeholder fallback.

`_release_library.yml` (refactored as wrapper)

```mermaid
flowchart LR
    B[bump<br/>_release_bump.yml] --> W[build-test-publish-wheel<br/>_build_test_publish_wheel.yml]
    W --> F[finalize<br/>_release_finalize.yml]
    F --> Out["release-version<br/>= bump.outputs.release-version<br/>OR wheel.outputs.version"]
```

- […] `./.github/workflows/...` so they always travel together at the same FW-CI-templates ref.
- […] `with:` / `secrets:` blocks stay byte-identical.
- The `build-test-publish-wheel-dry-run` job (old TestPyPI smoke test) is replaced by per-PR validate-only: same code path, every push, instead of once per release.

Breaking change
`_build_test_publish_wheel.yml` no longer accepts `dry-run`. To skip publishing, use `no-publish: true`.

Companion PRs
- […] `megatron-core` + `megatron_fsdp`)
- […] `skip-wheel-build: true`)
- […] `pull_request:` trigger used (no copy-pr-bot mirror)
- […] `wheel-content-ignore: W009`)
- […] `nemo-evaluator` + `nemo-evaluator-launcher`); monorepo (`docs-root-dir: "."`, `docs-requirements-file`)
- […] `skip-wheel-build: true`)

Test plan

Rollout

- […] `v1.0.0` (drops `dry-run` from `_build_test_publish_wheel.yml`).