ci: Major refactor of release-workflows #4602

Merged
ko3n1g merged 42 commits into NVIDIA:main from ko3n1g:ko3n1g/refactor/validate-only-release
May 11, 2026

ko3n1g commented May 4, 2026

Why

See the design discussion in NVIDIA-NeMo/FW-CI-templates#466.

What

  • Delete .github/workflows/build-test-publish-wheel.yml.
  • Rewrite .github/workflows/release.yaml as the single caller for both push and workflow_dispatch.

Test plan

Rollout

  1. Land FW-CI-templates#466.
  2. Cut FW-CI-templates v1.0.0.
  3. Switch the FW-CI-templates pin in this PR from the commit SHA to the v1.0.0 tag.
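
Step 3 comes down to changing the ref at the end of each `uses:` line. A minimal sketch, assuming the reusable workflows sit under `.github/workflows/` in FW-CI-templates; the job id `bump` and the `<commit-sha>` placeholder are illustrative and not taken from this PR:

```yaml
jobs:
  bump:
    # Before: pinned to a commit from the FW-CI-templates#466 work (placeholder SHA).
    # uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_release_bump.yml@<commit-sha>
    # After v1.0.0 is cut: pinned to the release tag instead.
    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_release_bump.yml@v1.0.0
```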

Switch the release.yaml orchestration to use NVIDIA-NeMo/FW-CI-templates'
new composable workflows:
- _release_bump.yml — multi-target bump for both megatron-core and
  megatron_fsdp via a JSON `bump-targets` input.
- _release_finalize.yml — GH release + Slack notify, taking
  release-version as an input from the bump output.

Megatron-LM keeps:
- Its own _build_test_publish_wheel.yml (multi-arch matrix manylinux
  build for megatron-core arm64+amd64 and megatron-fsdp amd64), wired
  in as a sibling job between bump and finalize.
- Its own release-docs.yml (custom docs flow), invoked after finalize.

This removes the local _release_library.yml (now superseded) and aligns
release orchestration with the rest of the NeMo framework so future
upstream improvements (validate-only PR rehearsal, etc.) become
available without copying YAML.

Signed-off-by: oliver könig <okoenig@nvidia.com>
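
The job ordering described in this commit could look roughly like the outline below. It is a sketch only: the job ids, the `<ref>` placeholder, the shape of the `bump-targets` JSON, and the name of the bump output are assumptions, not the contents of the actual release.yaml.

```yaml
# Hypothetical outline of release.yaml (illustrative names throughout).
jobs:
  bump:
    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_release_bump.yml@<ref>
    with:
      # JSON list of packages to bump; the real schema may differ.
      bump-targets: '["megatron-core", "megatron_fsdp"]'

  build-test-publish-wheels:
    # Local multi-arch wheel build, wired in between bump and finalize.
    needs: [bump]
    uses: ./.github/workflows/_build_test_publish_wheel.yml

  finalize:
    needs: [bump, build-test-publish-wheels]
    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_release_finalize.yml@<ref>
    with:
      # Assumes the bump workflow exposes an output named release-version.
      release-version: ${{ needs.bump.outputs.release-version }}

  release-docs:
    # Local custom docs flow, invoked after finalize.
    needs: [finalize]
    uses: ./.github/workflows/release-docs.yml
```

Keeping the wheel build and docs jobs local while calling the shared bump and finalize workflows is what lets upstream changes arrive through a single pin update.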

copy-pr-bot Bot commented May 4, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

ko3n1g commented May 4, 2026

/ok to test

Merge build-test-publish-wheel.yml (push-triggered, wheel-only) into
release.yaml (workflow_dispatch-triggered, full release) so a single
file with one FW-CI-templates pin governs both per-PR rehearsal and
real release. Mirrors the Megatron-Bridge consolidation.

On push (PR/main/deploy-release/merge_group): validate-only=true; the
full pipeline rehearses (bump computes only, wheels build without
publish, GH release payload echoed, docs publish skipped, Slack
suppressed).

On workflow_dispatch: validate-only=false; existing dry-run knob still
controls whether wheel publish + GH release POST + docs publish fire
or stay inert.

The push trigger now exercises the bump and finalize paths on every PR,
not just the wheel build.

Signed-off-by: oliver könig <okoenig@nvidia.com>
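
One way to derive `validate-only` from the triggering event is sketched below. The branch filters, the `<ref>` placeholder, and the assumption that the called workflow accepts `validate-only` and `dry-run` as boolean inputs are illustrative; only the event names and the two knob names come from the commit message.

```yaml
on:
  push:
    branches:            # assumed filters for main, release and PR mirror branches
      - main
      - "deploy-release/*"
      - "pull-request/[0-9]+"
  merge_group:
  workflow_dispatch:
    inputs:
      dry-run:
        description: Keep publish steps inert when true
        type: boolean
        default: true

jobs:
  bump:
    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_release_bump.yml@<ref>
    with:
      # Any non-dispatch event is a rehearsal; a manual dispatch is a real run.
      validate-only: ${{ github.event_name != 'workflow_dispatch' }}
      # Only meaningful on dispatch; falls back to false on other events.
      dry-run: ${{ inputs.dry-run || false }}
```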

ko3n1g commented May 4, 2026

/ok to test 729bb50

Signed-off-by: oliver könig <okoenig@nvidia.com>

ko3n1g commented May 4, 2026

/ok to test ed1d2fe

When dispatching against a copy-pr-bot mirror branch (pull-request/<id>),
pre-flight's 'Get PR info' step matches startsWith(github.ref, 'refs/heads/pull-request/')
and tries to look up a PR — but the event_name is workflow_dispatch, not
pull_request, so the lookup fails. Skip pre-flight entirely on dispatch
events; downstream jobs already short-circuit their pre-flight-output
checks when github.event_name == 'workflow_dispatch'.

Signed-off-by: oliver könig <okoenig@nvidia.com>
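
The pre-flight skip described above could be expressed as a job-level condition. This is a sketch under assumptions: the runner label and the placeholder step body are invented for illustration, while the step name and the `startsWith` check are quoted from the commit message.

```yaml
jobs:
  pre-flight:
    # Skip the whole job on manual dispatch: there is no pull_request
    # event to look a PR up from, even on a pull-request/<id> branch.
    if: github.event_name != 'workflow_dispatch'
    runs-on: ubuntu-latest
    steps:
      - name: Get PR info
        if: startsWith(github.ref, 'refs/heads/pull-request/')
        run: echo "PR lookup would happen here (placeholder)"
```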

ko3n1g commented May 4, 2026

/ok to test b8919ca

ko3n1g commented May 4, 2026

/ok to test ed1d2fe

By default a job is skipped when a job it needs was skipped, so bump and
wheels would not run once pre-flight is skipped. Switch their if: to
!cancelled() && (pre-flight.result is success or skipped), keeping the
auto-skip-on-failure semantics intact.

Signed-off-by: oliver könig <okoenig@nvidia.com>
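
Written out in workflow syntax, the condition from this commit would look something like the following. The job id `bump` and the `<ref>` placeholder are assumptions; `needs.<job>.result` and `cancelled()` are standard GitHub Actions expression features.

```yaml
jobs:
  bump:
    needs: [pre-flight]
    # Run when pre-flight succeeded or was skipped; a failed pre-flight
    # still blocks this job, and a cancelled run stops it.
    if: ${{ !cancelled() && (needs.pre-flight.result == 'success' || needs.pre-flight.result == 'skipped') }}
    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_release_bump.yml@<ref>
```

Because `!cancelled()` overrides the implicit success check, the explicit result comparison is what preserves the skip-on-failure behavior.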

ko3n1g commented May 4, 2026

/ok to test f273553

Signed-off-by: oliver könig <okoenig@nvidia.com>

ko3n1g commented May 4, 2026

/ok to test 4292889

ko3n1g commented May 6, 2026

/ok to test 3fbbf8c

ko3n1g commented May 6, 2026

/ok to test 3fbbf8c

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>

ko3n1g commented May 6, 2026

/ok to test ee9b3f9

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>

ko3n1g commented May 7, 2026

/ok to test d5c90a9

ko3n1g commented May 7, 2026

/ok to test d5c90a9

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>

ko3n1g commented May 7, 2026

/ok to test be26a40

ko3n1g and others added 3 commits May 7, 2026 08:44
…sthrough

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Lets env-scoped SLACK_WEBHOOK reach the notify job in the called workflow.

Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
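
For context on the SLACK_WEBHOOK commit above: environment-scoped secrets resolve only in jobs that declare that environment. One way a called workflow can receive the environment name is sketched below. This is an assumption about the general mechanism, not a copy of the FW-CI-templates workflow; the input name `environment`, the job id `notify`, and the message payload are hypothetical.

```yaml
# Hypothetical excerpt of a reusable workflow (the callee side).
on:
  workflow_call:
    inputs:
      environment:
        description: Environment whose secrets the notify job should see
        type: string
        required: false

jobs:
  notify:
    runs-on: ubuntu-latest
    # Binding the job to the environment makes its scoped secrets available.
    environment: ${{ inputs.environment }}
    steps:
      - name: Notify Slack
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
        run: |
          curl -sS -X POST -H 'Content-type: application/json' \
            --data '{"text": "release finished"}' "$SLACK_WEBHOOK"
```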

ko3n1g commented May 7, 2026

/ok to test a03fd03

Signed-off-by: oliver könig <okoenig@nvidia.com>

ko3n1g commented May 7, 2026

/ok to test f54a6a1

Signed-off-by: oliver könig <okoenig@nvidia.com>

ko3n1g commented May 7, 2026

/ok to test 1717260

ko3n1g added 6 commits May 7, 2026 20:14
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
…!failure)

Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>

# Conflicts:
#	.github/workflows/build-test-publish-wheel.yml
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/25669939636

@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/25671734703

@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/25678362086

@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/25685644892

Labels: Approved, complexity: high
