Skip to content

Support for modelopt with MoE QAT#3866

Merged
yaoyu-33 merged 2 commits into
NVIDIA-NeMo:mainfrom
HollowMan6:qat_patch
May 20, 2026
Merged

Support for modelopt with MoE QAT#3866
yaoyu-33 merged 2 commits into
NVIDIA-NeMo:mainfrom
HollowMan6:qat_patch

Conversation

@HollowMan6

@HollowMan6 HollowMan6 commented May 17, 2026

Copy link
Copy Markdown
Member

What does this PR do ?

Move all the patches for supporting modelopt based QAT on verl side to Megatron Bridge.

Changelog

GitHub Actions CI

See the CI sectionin the Contributing doc for how to trigger the CI. A Nvidia developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open "Draft" PR.

Additional Information

  • Related to # (issue)
image

Copilot AI review requested due to automatic review settings May 17, 2026 21:25

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support paths for ModelOpt/QAT MoE conversion behavior in Megatron Bridge, especially around local expert naming, quantized module detection, and related unit coverage.

Changes:

  • Extends expert-number extraction and sorting to handle local_experts.<N> names.
  • Updates conversion task filtering and EP name-globalization paths for adapter/quantizer handling.
  • Registers quantized parallel module type detection and adds QAT bridge support tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/unit_tests/models/test_qat_bridge_support.py Adds unit coverage for local expert naming, quantized mappings, adapter filtering, and ModelOpt layer spec exposure.
src/megatron/bridge/utils/common_utils.py Extends expert number extraction for local expert parameter names.
src/megatron/bridge/models/conversion/utils.py Extends deterministic sort keys for SequentialMLP local experts.
src/megatron/bridge/models/conversion/peft_bridge.py Adds shared base-weight export skip logic for adapters and quantizer params.
src/megatron/bridge/models/conversion/param_mapping.py Registers quantized parallel linear module names and generalizes fused LayerNorm-column detection.
src/megatron/bridge/models/conversion/model_bridge.py Updates local-to-global expert name handling and applies the new skip predicate in conversion task construction.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/megatron/bridge/models/conversion/peft_bridge.py Outdated
Comment thread src/megatron/bridge/models/conversion/model_bridge.py Outdated
@HollowMan6

Copy link
Copy Markdown
Member Author

/claude review

Comment thread src/megatron/bridge/models/conversion/model_bridge.py Outdated
@claude

claude Bot commented May 17, 2026

Copy link
Copy Markdown
Contributor

Review — Support for modelopt with MoE QAT

Overall this looks solid. The separation of grouped-expert (weight N / bias N) vs. local-expert (local_experts.N) naming is clean and well-tested.

What I checked

  • model_bridge.py: The new is_local_expert_param / is_grouped_expert_param split correctly routes EP name globalization. The regex avoids matching quantizer params like weight_quantizer._amax. The replace(..., 1) count-limit on the local_experts branch prevents accidental double-replacement.
  • param_mapping.py: QuantColumnParallelLinear / QuantRowParallelLinear registry additions are straightforward. The substring-based _FUSED_LAYER_NORM_COLUMN_PARALLEL_SUBSTRING check is a good generalization.
  • utils.py: extract_sort_key correctly falls back to local_experts.N when no (weight|bias)N suffix is found.
  • common_utils.py: Pattern ordering in extract_expert_number_from_param is correct — local_experts is most specific and checked first.
  • Test file: 11 focused unit tests cover the key paths including adapter exclusion, quantizer param passthrough, EP gather with local experts, and the new parallelism detection for Quant modules.

Minor inline comment

  • One readability nit on operator precedence (not x in y -> x not in y) posted inline.

Suggested test cases

No perf tests impacted.

Refer to verl-project/verl@b96c8fb

Signed-off-by: Hollow Man <hollowman@opensuse.org>
@HollowMan6

Copy link
Copy Markdown
Member Author

/claude review

@claude

claude Bot commented May 17, 2026

Copy link
Copy Markdown
Contributor

Code Review: Support for modelopt with MoE QAT

The changes cleanly extend expert param handling to support SequentialMLP local_experts.<N> naming alongside the existing grouped expert weight<N>/bias<N> naming, and add Quant*ParallelLinear types to the module registry. The regex work in _megatron_local_name_to_global is careful about avoiding false positives on quantizer params like weight_quantizer._amax.

Observations

  1. Test coverage is solid. The new test file covers all the key code paths: local expert mapping, grouped expert mapping, adapter exclusion, quantizer exclusion, EP gather, parallelism type detection, and registry contents.

  2. _FUSED_LAYER_NORM_COLUMN_PARALLEL_SUBSTRING is a good generalization. Switching from an exact class-name set to a substring check future-proofs against any *LayerNormColumnParallelLinear variant without needing registry updates.

  3. Minor: _detect_parallelism_type test only covers the non-modelopt path. The test at line 159 creates a plain type(...) module, so is_modelopt_dynamic_module returns False. In the real modelopt scenario, the module is dynamic and get_original_cls_by_level(level=0).__name__ is used. The substring check works either way, but a test exercising the dynamic-module branch would strengthen confidence. Not a blocker.

  4. gather_from_ep_ranks regex note (param_mapping.py:769-770). The pre-existing re.sub(r"experts\.(\d+)", ...) on HF param names works because HF params use experts.N (not local_experts.N). Worth noting this would need updating if any HF model family adopts local_experts naming.

Suggested test cases

No perf tests impacted.

Signed-off-by: Hollow Man <hollowman@opensuse.org>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@yaoyu-33 yaoyu-33 added area:quant Quantization (PTQ, QAT, FP8 recipes) feature New capabilities, enhancements, or enablement work needs-review PR is ready for code review and waiting on a reviewer labels May 17, 2026
@yaoyu-33 yaoyu-33 merged commit a75738a into NVIDIA-NeMo:main May 20, 2026
101 checks passed
@HollowMan6 HollowMan6 deleted the qat_patch branch May 20, 2026 01:44
wuxibin89 pushed a commit to verl-project/verl that referenced this pull request May 21, 2026
### What does this PR do?

Refactor to use Megatron-Bridge new helper APIs:
- NVIDIA-NeMo/Megatron-Bridge#3813
- NVIDIA-NeMo/Megatron-Bridge#3866

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `veomni`, `sglang`, `vllm`,
`rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`,
`deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`,
`model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`,
`fully_async`, `one_step_off`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/verl-project/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/verl-project/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/verl-project/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/verl-project/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
- [X] If your PR is related to the `recipe` submodule, please also
update the reference to the submodule commit via `git submodule update
--remote` or `cd recipe && git pull origin main`.

---------

Signed-off-by: Hollow Man <hollowman@opensuse.org>
vasunvidia pushed a commit to vasunvidia/Megatron-Bridge that referenced this pull request Jun 10, 2026
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:quant Quantization (PTQ, QAT, FP8 recipes) feature New capabilities, enhancements, or enablement work needs-review PR is ready for code review and waiting on a reviewer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants