Support GEMM + Swiglu fused MLP #3890
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
/ok to test 86d4097
❌ Cherry-pick to main failed. The cherry-pick encountered conflicts and could not be completed automatically.
/claude review
Signed-off-by: ksivamani <ksivamani@nvidia.com>
Can you add unit tests for the numerics of the fusion and for the remapping of parameter keys?
@yaox12 Unit tests for both the checkpoint loading and the fusion numerics are included in NVIDIA/TransformerEngine#2769, so I think adding them separately here would be duplication. Unless you mean an e2e test?
/ok to test 14f63ad
❌ Cherry-pick to main failed. The cherry-pick encountered conflicts and could not be completed automatically.
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23320590254
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23324389643
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23325909832
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by: ksivamani <ksivamani@nvidia.com>
…r hunk
The `_cached_param_buffer_shards_grad_enabled` field, its read site in `start_param_sync()`, and the `with torch.no_grad()` wrap around the coalescing manager all originated in NVIDIA#3890 on the dev branch. The dev sync merge `79aeecfe0` (Mar 25 2026) explicitly removed the read site and the no_grad wrap during conflict resolution when it pulled in the layerwise-optimizer code from main; only the field init survived, as an orphan in `__init__`.
The active logic was deliberately dropped, no regression was reported on dev or main in the intervening months, and zhongbozhu flagged this exact block on this PR (r3211212707), noting it was removed in dev. For a PR targeting main, resurrecting a hunk that was specifically dropped during a merge, without a fresh repro proving main needs it, is the wrong default.
Remove all three pieces (the orphan init, the read site, the no_grad wrap) so this file matches main's shape except for the changes that are genuinely part of this PR's scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What does this PR do?
This PR adds support for a fused GEMM + SwiGLU MLP via Transformer Engine sequential ops.
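For readers unfamiliar with the pattern, the following is a minimal pure-Python sketch of the math that a GEMM + SwiGLU fusion has to preserve. It is not the Transformer Engine implementation or its API. The function names, the tiny example weights, and the `[gate | up]` ordering of the first projection's output are assumptions made for illustration only.

```python
import math

def silu(v: float) -> float:
    # SiLU (swish) activation: v * sigmoid(v).
    return v / (1.0 + math.exp(-v))

def matvec(w, x):
    # Dense matrix-vector product; w is a list of rows.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def swiglu_mlp_unfused(x, w_fc1, w_fc2):
    # Step 1: the first GEMM produces [gate | up] (assumed ordering).
    h = matvec(w_fc1, x)
    ffn = len(h) // 2
    gate, up = h[:ffn], h[ffn:]
    # Step 2: a separate activation pass over the intermediate vector.
    act = [silu(g) * u for g, u in zip(gate, up)]
    # Step 3: the second GEMM projects back to the hidden size.
    return matvec(w_fc2, act)

def swiglu_mlp_fused(x, w_fc1, w_fc2):
    # Same math, but the activation is applied as each gate/up pair is
    # produced, so the full [gate | up] vector is never materialized.
    ffn = len(w_fc1) // 2
    act = []
    for j in range(ffn):
        g = sum(wi * xi for wi, xi in zip(w_fc1[j], x))
        u = sum(wi * xi for wi, xi in zip(w_fc1[ffn + j], x))
        act.append(silu(g) * u)
    return matvec(w_fc2, act)

# Hypothetical toy sizes: hidden = 3, ffn = 2.
x = [0.5, -1.0, 2.0]
w_fc1 = [[0.1, 0.2, 0.3], [-0.4, 0.5, 0.6], [0.7, -0.8, 0.9], [1.0, 0.0, -1.0]]
w_fc2 = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]

ref = swiglu_mlp_unfused(x, w_fc1, w_fc2)
out = swiglu_mlp_fused(x, w_fc1, w_fc2)
assert all(math.isclose(a, b, rel_tol=1e-12) for a, b in zip(ref, out))
```

A numerics test for a real fused kernel would follow the same shape: compute the unfused reference, compute the fused result, and compare within a tolerance appropriate to the data type.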
Contribution process
Pre-checks
Code review
Feel free to message @mcore-oncall or mention them in a comment to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.
Step 1: Mark PR as "Ready for Review"
.github/CODEOWNERS.
Final Review might get declined if these requirements are not fulfilled.
Step 2: Final Review
For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.
For PRs outside megatron/core, this step is skipped.
Step 3: Approved
Once all required reviewers have approved, the Approved label is applied automatically.
Merge
Any member of mcore-engineers will be able to merge your PR.
For MRs into `dev` branch
The proposed review process for the `dev` branch is under active discussion.
MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.