ci: Major refactor of release-workflows #176
Adopts the FW-CI-templates v1.0.0 pattern (NVIDIA-NeMo/FW-CI-templates#466):
- single release.yaml caller for both push (validate-only) and workflow_dispatch (real release / dry-run)
- no PyPI wheel publish (skip-wheel-build: true), same pattern as RL
- App-only auth (drops PAT/SSH_KEY/SSH_PWD)
- pre-flight gate skips heavy work on deploy-release/* + docs_only
- Slack webhook resolves at env scope (public for inert; main for real)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Greptile Summary
This PR refactors the CI release pipeline for Emerging-Optimizers, replacing the standalone
Confidence Score: 4/5
The workflow logic for the push and workflow_dispatch paths is sound, but several open review threads flag real defects in how user-provided boolean inputs are forwarded to the reusable workflow. .github/workflows/release.yaml needs attention, specifically the boolean input forwarding expressions and the hardcoded publish-docs value. The undeleted .github/workflows/build-test-publish-wheel.yml also warrants attention.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A([Trigger]) --> B{Event type?}
B -->|push to main or r-branches| C[pre-flight job]
B -->|workflow_dispatch| D[pre-flight SKIPPED]
C --> E{docs_only or deployment?}
E -->|yes| F[release SKIPPED]
E -->|no| G[release job\nvalidate-only=true]
D --> H[release job\nvalidate-only=false]
F --> I[release-summary job]
G --> I
H --> I
I --> J{Failed jobs?}
J -->|none| K([exit 0])
J -->|found| L([exit 1])
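The trigger routing in the flowchart can be sketched as a caller workflow. This is a minimal sketch, not the PR's actual release.yaml: the input names and the `with:` expressions are taken from the diff under review, while the input types, the defaults, the branch globs, and the workflow name are assumptions. The pre-flight job and any other inputs or secrets the template requires are omitted.

```yaml
# Hypothetical sketch of the caller's trigger routing; not the actual release.yaml.
name: Release            # assumed name
on:
  push:
    branches:
      - main
      - "r**"            # assumed glob for the release branches
  workflow_dispatch:
    inputs:
      dry-run:
        type: boolean    # assumed type
        default: true    # assumed default
      version-bump-branch:
        type: string     # assumed type
        required: false
      create-gh-release:
        type: boolean    # assumed type
        default: true    # assumed default

jobs:
  release:
    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_release_library.yml@v1.0.0
    with:
      # push events only validate; a manual dispatch runs the release or dry-run
      validate-only: ${{ github.event_name != 'workflow_dispatch' }}
      # inputs are empty on push, so these fall back to the right-hand operand
      dry-run: ${{ inputs.dry-run || false }}
      version-bump-branch: ${{ inputs.version-bump-branch || github.ref_name }}
```

Keeping both events in one caller means the push path exercises the same template version that a later manual release will use.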
Reviews (12): Last reviewed commit: "ci: pin FW-CI templates to v1.1.0"
validate-only: ${{ github.event_name != 'workflow_dispatch' }}
dry-run: ${{ inputs.dry-run || false }}
version-bump-branch: ${{ inputs.version-bump-branch || github.ref_name }}
create-gh-release: ${{ inputs.create-gh-release || true }}
When workflow_dispatch fires and the user explicitly sets create-gh-release: false, the expression inputs.create-gh-release || true evaluates to false || true = true, silently overriding the user's intent and always creating a GH release. This is the only boolean input that uses a truthy fallback (|| true); the others use || false, which is safe.
- create-gh-release: ${{ inputs.create-gh-release || true }}
+ create-gh-release: ${{ inputs.create-gh-release != '' && inputs.create-gh-release || true }}
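One way to keep a default of true without overriding an explicit false is to key the fallback on the event type rather than on the input's truthiness. Any expression of the form `x && y || true` still yields true when `y` is false, so guarding on the input value alone does not help. A minimal sketch, assuming the workflow_dispatch input is declared with `type: boolean` so that the `inputs` context carries a real boolean:

```yaml
# Hypothetical sketch; only the create-gh-release line differs from the diff above.
jobs:
  release:
    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_release_library.yml@v1.0.0
    with:
      # On push the left operand is true, so the default (true) applies.
      # On workflow_dispatch the left operand is false, so the user's value
      # is forwarded unchanged, including an explicit false.
      create-gh-release: ${{ github.event_name != 'workflow_dispatch' || inputs.create-gh-release }}
```

The same shape works for any boolean input whose default should be true, because the fallback no longer depends on the input being truthy.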
if: |
  (
    needs.pre-flight.outputs.docs_only == 'true'
    || needs.pre-flight.outputs.is_deployment_workflow == 'true'
    || always()
  )
  && !cancelled()
Because always() sits inside the OR group, the entire parenthesised expression is true regardless of the docs_only and is_deployment_workflow checks, so those checks have no effect. The effective condition is just !cancelled(). The dead checks add noise and may mislead future maintainers about when this job is supposed to be skipped.
- if: |
-   (
-     needs.pre-flight.outputs.docs_only == 'true'
-     || needs.pre-flight.outputs.is_deployment_workflow == 'true'
-     || always()
-   )
-   && !cancelled()
+ if: always() && !cancelled()
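Since always() is always true, `always() && !cancelled()` reduces to `!cancelled()`, and a status function in the condition already replaces the implicit success() check. A summary job along the lines of the flowchart's release-summary step could then look like the sketch below. The job names come from the flowchart; the runner label, the step name, and the failure check are assumptions, and the pre-flight and release jobs it depends on are omitted.

```yaml
# Hypothetical sketch of a summary job; not the PR's actual implementation.
jobs:
  release-summary:
    needs: [pre-flight, release]
    # Runs after upstream jobs that succeeded, failed, or were skipped,
    # but not when the workflow run was cancelled.
    if: ${{ !cancelled() }}
    runs-on: ubuntu-latest   # assumed runner
    steps:
      - name: Fail if any upstream job failed
        env:
          # true when at least one job listed in needs ended with result 'failure'
          HAS_FAILURE: ${{ contains(needs.*.result, 'failure') }}
        run: |
          if [ "$HAS_FAILURE" = "true" ]; then
            echo "At least one upstream job failed"
            exit 1
          fi
          echo "No failed jobs"
```

Skipped jobs report the result `skipped`, so the docs_only and deployment paths exit 0 here.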
Codecov Report
All modified and coverable lines are covered by tests.
Lets env-scoped SLACK_WEBHOOK reach the notify job in the called workflow. Signed-off-by: oliver könig <okoenig@nvidia.com>
/ok to test eb723b5
/ok to test 6a8f8e7
…!failure) Signed-off-by: oliver könig <okoenig@nvidia.com>
  release:
-   uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_release_library.yml@v0.57.0
    needs: [pre-flight]
    if: |
      !cancelled() && !failure()
      && !(needs.pre-flight.outputs.docs_only == 'true'
      || needs.pre-flight.outputs.is_deployment_workflow == 'true')
+   uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_release_library.yml@v1.0.0
Missing deletion of build-test-publish-wheel.yml
The PR description explicitly lists "Delete .github/workflows/build-test-publish-wheel.yml" as part of this change, but the file was not removed and remains active. It still triggers on push to main and r** branches — the same branches the new release.yaml now covers. After this PR lands, every push to main or an r** branch will fire both the old _build_test_publish_wheel.yml@v0.57.0 workflow and the new _release_library.yml@v1.0.0 workflow concurrently, resulting in duplicate release-pipeline runs. The BUILD_TEST_PUBLISH_WHEEL == 'true' guard reduces but does not eliminate the risk (the variable may be set in the repository).
/ok to test 7c1e8d4
Why
See the design discussion in NVIDIA-NeMo/FW-CI-templates#466.
What
- Delete .github/workflows/build-test-publish-wheel.yml.
- .github/workflows/release.yaml as the single caller for both push and workflow_dispatch.
Test plan
- workflow_dispatch dry-run=true (sha 66de54d, 2026-05-07T11:28:32Z, success): https://github.com/NVIDIA-NeMo/Emerging-Optimizers/actions/runs/25493037909
- workflow_dispatch dry-run=false on the next planned RC.
Rollout
v1.0.0.