Skip to content

ci: Major refactor of release-workflows#176

Merged
ko3n1g merged 13 commits into
mainfrom
ko3n1g/refactor/validate-only-release
May 11, 2026
Merged

ci: Major refactor of release-workflows#176
ko3n1g merged 13 commits into
mainfrom
ko3n1g/refactor/validate-only-release

Conversation

@ko3n1g

@ko3n1g ko3n1g commented May 7, 2026

Copy link
Copy Markdown
Contributor

Why

See the design discussion in NVIDIA-NeMo/FW-CI-templates#466.

What

  • Delete .github/workflows/build-test-publish-wheel.yml.
  • Rewrite .github/workflows/release.yaml as the single caller for both push and workflow_dispatch.

Test plan

Rollout

  1. Land FW-CI-templates#466.
  2. Cut FW-CI-templates v1.0.0.
  3. Bump the SHA pin in this PR → tag.

Adopts the FW-CI-templates v1.0.0 pattern (NVIDIA-NeMo/FW-CI-templates#466):
- single release.yaml caller for both push (validate-only) and
  workflow_dispatch (real release / dry-run)
- no PyPI wheel publish (skip-wheel-build: true) — same pattern as RL
- App-only auth (drops PAT/SSH_KEY/SSH_PWD)
- pre-flight gate skips heavy work on deploy-release/* + docs_only
- Slack webhook resolves at env scope (public for inert; main for real)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
@greptile-apps

greptile-apps Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR refactors the CI release pipeline for Emerging-Optimizers, replacing the standalone build-test-publish-wheel.yml with a unified release.yaml that handles both push-triggered validation and workflow_dispatch-triggered releases through a single caller workflow referencing _release_library.yml@v1.1.0.

  • New trigger model: push to main, r**, pull-request/**, and deploy-release/* branches runs a pre-flight check and validation-only release pass; workflow_dispatch skips pre-flight and runs the full release pipeline.
  • release-summary job: A new terminal job queries the GitHub API with gh run view to surface any failed jobs and fail the workflow run accordingly.
  • Open issues from existing review threads: create-gh-release: ${{ inputs.create-gh-release || true }} always evaluates to true regardless of user input; publish-docs: true is hardcoded and ignores the publish-docs workflow input; and build-test-publish-wheel.yml was not deleted despite being listed in the PR description, leaving the old workflow active on the same branch triggers.

Confidence Score: 4/5

The workflow logic for push and workflow_dispatch paths is sound, but several open review threads flag real defects in how user-provided boolean inputs are forwarded to the reusable workflow.

The create-gh-release: ${{ inputs.create-gh-release || true }} expression always produces true, making it impossible to suppress GH release creation via workflow_dispatch. Similarly, publish-docs is hardcoded to true and cannot be overridden by callers. Both inputs are surfaced to operators in the dispatch UI but silently ignored at runtime. Additionally, build-test-publish-wheel.yml was not removed, leaving the old workflow active on the same branch triggers and risking duplicate release-pipeline executions.

.github/workflows/release.yaml — specifically the boolean input forwarding expressions and the hardcoded publish-docs value. The undeleted .github/workflows/build-test-publish-wheel.yml also warrants attention.

Important Files Changed

Filename Overview
.github/workflows/release.yaml Complete rewrite of the release workflow to unify push-triggered validation and workflow_dispatch-triggered releases; contains open issues around boolean input overrides, hardcoded publish-docs, and a missing file deletion.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([Trigger]) --> B{Event type?}
    B -->|push to main or r-branches| C[pre-flight job]
    B -->|workflow_dispatch| D[pre-flight SKIPPED]

    C --> E{docs_only or deployment?}
    E -->|yes| F[release SKIPPED]
    E -->|no| G[release job\nvalidate-only=true]

    D --> H[release job\nvalidate-only=false]

    F --> I[release-summary job]
    G --> I
    H --> I

    I --> J{Failed jobs?}
    J -->|none| K([exit 0])
    J -->|found| L([exit 1])
Loading

Reviews (12): Last reviewed commit: "ci: pin FW-CI templates to v1.1.0" | Re-trigger Greptile

validate-only: ${{ github.event_name != 'workflow_dispatch' }}
dry-run: ${{ inputs.dry-run || false }}
version-bump-branch: ${{ inputs.version-bump-branch || github.ref_name }}
create-gh-release: ${{ inputs.create-gh-release || true }}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 When workflow_dispatch fires and the user explicitly sets create-gh-release: false, the expression inputs.create-gh-release || true evaluates to false || true = true, silently overriding the user's intent and always creating a GH release. This is the only boolean input that uses a truthy fallback (|| true); the others use || false which is safe.

Suggested change
create-gh-release: ${{ inputs.create-gh-release || true }}
create-gh-release: ${{ inputs.create-gh-release != '' && inputs.create-gh-release || true }}

Comment thread .github/workflows/release.yaml Outdated
Comment on lines +106 to +112
if: |
(
needs.pre-flight.outputs.docs_only == 'true'
|| needs.pre-flight.outputs.is_deployment_workflow == 'true'
|| always()
)
&& !cancelled()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 always() placed inside the OR group short-circuits the entire parenthesised expression to true regardless of the docs_only and is_deployment_workflow checks — they are never evaluated. The effective condition is just !cancelled(). The dead checks add noise and may mislead future maintainers about when this job is supposed to be skipped.

Suggested change
if: |
(
needs.pre-flight.outputs.docs_only == 'true'
|| needs.pre-flight.outputs.is_deployment_workflow == 'true'
|| always()
)
&& !cancelled()
if: always() && !cancelled()

@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown

Test Results

   50 files  ±0    110 suites  ±0   1m 47s ⏱️ +27s
1 031 tests ±0  1 031 ✅ ±0  0 💤 ±0  0 ❌ ±0 
2 313 runs  ±0  2 313 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit fc0e75b. ± Comparison against base commit 04ee932.

♻️ This comment has been updated with latest results.

@codecov

codecov Bot commented May 7, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Lets env-scoped SLACK_WEBHOOK reach the notify job in the called workflow.

Signed-off-by: oliver könig <okoenig@nvidia.com>
@ko3n1g

ko3n1g commented May 7, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test eb723b5

Signed-off-by: oliver könig <okoenig@nvidia.com>
@ko3n1g

ko3n1g commented May 7, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test 6a8f8e7

@ko3n1g ko3n1g changed the title ci: consolidate PR + release workflows; use validate-only ci: Major refactor of release-workflows May 7, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
chtruong814
chtruong814 previously approved these changes May 7, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
…!failure)

Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Comment thread .github/workflows/release.yaml Outdated
Comment on lines +77 to +83
release:
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_release_library.yml@v0.57.0
needs: [pre-flight]
if: |
!cancelled() && !failure()
&& !(needs.pre-flight.outputs.docs_only == 'true'
|| needs.pre-flight.outputs.is_deployment_workflow == 'true')
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_release_library.yml@v1.0.0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Missing deletion of build-test-publish-wheel.yml

The PR description explicitly lists "Delete .github/workflows/build-test-publish-wheel.yml" as part of this change, but the file was not removed and remains active. It still triggers on push to main and r** branches — the same branches the new release.yaml now covers. After this PR lands, every push to main or an r** branch will fire both the old _build_test_publish_wheel.yml@v0.57.0 workflow and the new _release_library.yml@v1.0.0 workflow concurrently, resulting in duplicate release-pipeline runs. The BUILD_TEST_PUBLISH_WHEEL == 'true' guard reduces but does not eliminate the risk (the variable may be set in the repository).

Signed-off-by: oliver könig <okoenig@nvidia.com>
@ko3n1g

ko3n1g commented May 8, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test 7c1e8d4

Signed-off-by: oliver könig <okoenig@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants