Refactor emerging optimizer integration #4113

Merged
skyw merged 11 commits into NVIDIA:main from skyw:emerging-optimizer-refactor
Apr 3, 2026

skyw commented Apr 2, 2026 (Contributor)

What does this PR do?

Cherry-picks three related PRs from dev that refactor the optimizer infrastructure to generalize beyond Muon and support additional emerging optimizers (SOAP, Lion, etc.):

These are combined rather than cherry-picked one by one because some intermediate changes among them are no longer needed.

⚠️ For major changes (either in lines of code or in its impact), please make sure to first share a design doc with the team. If you're unsure what's the best way to do so, contact the @mcore-oncall.

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code Typing guidelines
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.

Step 1: Mark PR as "Ready for Review"

  1. When your PR is ready, click Ready for Review.
  2. An oncall reviewer is auto-assigned and expert reviewers are notified based on your changes.
    • Some PRs may jump straight to step 2. This is determined by .github/CODEOWNERS.

⚠️ Only mark as ready once merge-conflicts are resolved and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

Step 2: Final Review

For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.

For PRs outside megatron/core, this step is skipped.

Step 3: Approved

Once all required reviewers have approved, the Approved label is applied automatically.

Merge

Any member of mcore-engineers will be able to merge your PR.

For MRs into the `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

@skyw skyw requested review from a team as code owners April 2, 2026 20:03
copy-pr-bot Bot commented Apr 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions Bot commented Apr 2, 2026 (Contributor)

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

  1. Add the oncall reviewer (optional reviewer)
  2. Add required review teams based on your changes

See the contribution guide for more details.

@svcnvidia-nemo-ci svcnvidia-nemo-ci marked this pull request as draft April 2, 2026 20:03
@skyw skyw marked this pull request as ready for review April 2, 2026 20:09
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team April 2, 2026 20:09
@skyw skyw requested a review from deepakn94 April 2, 2026 20:13
skyw commented Apr 2, 2026 (Contributor Author)

/ok to test 5275694

Handles both standard optimizers (Adam, SGD) and emerging optimizers (e.g. Muon).
We use separate optimizers for expert parameters and non-expert parameters.
For emerging optimizers with ``config.use_layer_wise_distributed_optimizer=True``,
the optimizer is automatically wrapped with :class:`LayerWiseDistributedOptimizer`.

Contributor

I think we should eventually re-name DistributedOptimizer to something else (maybe ParameterWiseDistributedOptimizer)?

Contributor

(can be done in a separate PR)

Contributor Author

I've been using "element-wise". We can debate the name when we actually do it.
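
The docstring quoted in this review thread describes three behaviors: one code path for standard and emerging optimizers, separate optimizers for expert and non-expert parameters, and layer-wise wrapping when `use_layer_wise_distributed_optimizer` is set for an emerging optimizer. A minimal sketch of that dispatch follows. Everything except the config flag name is hypothetical: `ToyConfig`, `ToyOptimizer`, `ToyLayerWiseWrapper`, `build_optimizer`, and the `EMERGING_OPTIMIZERS` set are illustrative stand-ins, not Megatron-LM APIs, and the sketch assumes the flag has no effect on standard optimizers.

```python
from dataclasses import dataclass

# Assumption: an illustrative set of "emerging" optimizer names, taken from
# the optimizers mentioned in this PR. Not a real Megatron-LM registry.
EMERGING_OPTIMIZERS = {"muon", "soap", "lion"}


@dataclass
class ToyConfig:
    optimizer: str = "adam"
    use_layer_wise_distributed_optimizer: bool = False


@dataclass
class ToyOptimizer:
    """Hypothetical optimizer that only records its name and parameter group."""
    name: str
    params: list


@dataclass
class ToyLayerWiseWrapper:
    """Hypothetical stand-in for LayerWiseDistributedOptimizer."""
    optimizers: list


def build_optimizer(config, named_params):
    """Build optimizers from a list of (name, is_expert) pairs."""
    dense = [name for name, is_expert in named_params if not is_expert]
    expert = [name for name, is_expert in named_params if is_expert]
    # One optimizer per non-empty group: non-expert first, then expert.
    optimizers = [ToyOptimizer(config.optimizer, g) for g in (dense, expert) if g]

    is_emerging = config.optimizer in EMERGING_OPTIMIZERS
    if is_emerging and config.use_layer_wise_distributed_optimizer:
        return ToyLayerWiseWrapper(optimizers)
    # Assumption: in this sketch the flag has no effect on standard optimizers.
    return optimizers


params = [("attention.weight", False), ("experts.0.weight", True)]
muon = build_optimizer(ToyConfig("muon", True), params)
adam = build_optimizer(ToyConfig("adam", True), params)
```

In this sketch `muon` is a single wrapper holding two optimizers, while `adam` stays a plain list of two optimizers.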

…LayerWise wrapping

The test was manually extracting base optimizers from Float16OptimizerWithFloat16Params
wrappers and re-wrapping them in LayerWiseDistributedOptimizer, but the fp32 master
param updates never propagated back to the bf16 model params. Use
use_layer_wise_distributed_optimizer=True in the config instead, which lets
_get_megatron_emerging_optimizer handle wrapping correctly.
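
The failure mode described in this commit message can be reproduced with a toy model of mixed-precision training: if the base optimizer is pulled out of the wrapper that owns the copy-back step, the master params change but the low-precision model params do not. The sketch below assumes that this copy-back lives in the mixed-precision wrapper. `ToySGD`, `ToyMixedPrecisionWrapper`, and `to_bf16` are hypothetical helpers, not Megatron-LM classes, and bf16 storage is simulated by truncating a float32 bit pattern.

```python
import struct


def to_bf16(x: float) -> float:
    """Simulate bf16 storage by keeping only the top 16 bits of a float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


class ToySGD:
    """Hypothetical base optimizer that updates master params in place."""

    def __init__(self, master, lr):
        self.master, self.lr = master, lr

    def step(self, grads):
        for i, g in enumerate(grads):
            self.master[i] -= self.lr * g


class ToyMixedPrecisionWrapper:
    """Hypothetical wrapper that owns the master copy and the copy-back step."""

    def __init__(self, model_params, lr):
        self.model_params = model_params   # low-precision values seen by the model
        self.master = list(model_params)   # higher-precision master copy
        self.base = ToySGD(self.master, lr)

    def step(self, grads):
        self.base.step(grads)
        # Copy-back: without this loop the model never sees the update.
        for i, m in enumerate(self.master):
            self.model_params[i] = to_bf16(m)


grads = [1.0, -2.0]

# Stepping through the wrapper updates the model params.
model_ok = [to_bf16(0.5), to_bf16(1.5)]
ToyMixedPrecisionWrapper(model_ok, lr=0.25).step(grads)

# Extracting the base optimizer and stepping it directly skips the copy-back.
model_stale = [to_bf16(0.5), to_bf16(1.5)]
wrapper = ToyMixedPrecisionWrapper(model_stale, lr=0.25)
wrapper.base.step(grads)
```

After running this, `model_ok` holds the updated values, `wrapper.master` holds the same updated values, and `model_stale` is unchanged, which mirrors the stale bf16 params the test was seeing.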

skyw commented Apr 2, 2026 (Contributor Author)

/ok to test c74dd1a

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the "Approved" label (All necessary approvals have been made) Apr 2, 2026
skyw commented Apr 2, 2026 (Contributor Author)

/ok to test 90c5747

skyw commented Apr 2, 2026 (Contributor Author)

/ok to test 16f2e53

@skyw skyw added this pull request to the merge queue Apr 2, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23927338034

@skyw skyw changed the title from "Refactor emerging optimizer integrate" to "Refactor emerging optimizer integration" Apr 3, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23928037174

Merged via the queue into NVIDIA:main with commit 5b512b4 Apr 3, 2026
63 checks passed
@skyw skyw deleted the emerging-optimizer-refactor branch April 3, 2026 01:46
ko3n1g added a commit to ko3n1g/Megatron-LM that referenced this pull request Apr 6, 2026
This reverts commit 5b512b4.

Signed-off-by: oliver könig <okoenig@nvidia.com>
ko3n1g added a commit to ko3n1g/Megatron-LM that referenced this pull request Apr 6, 2026
This reverts commit 5b512b4.

Signed-off-by: oliver könig <okoenig@nvidia.com>
yangbofun pushed a commit to xlm-research/Megatron-LM that referenced this pull request May 22, 2026
Signed-off-by: Hao Wu <skyw@nvidia.com>

Labels

Approved (All necessary approvals have been made), complexity: high

4 participants