Support for modelopt with MoE QAT#3866
Conversation
There was a problem hiding this comment.
Pull request overview
Adds support paths for ModelOpt/QAT MoE conversion behavior in Megatron Bridge, especially around local expert naming, quantized module detection, and related unit coverage.
Changes:
- Extends expert-number extraction and sorting to handle
local_experts.<N>names. - Updates conversion task filtering and EP name-globalization paths for adapter/quantizer handling.
- Registers quantized parallel module type detection and adds QAT bridge support tests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
tests/unit_tests/models/test_qat_bridge_support.py |
Adds unit coverage for local expert naming, quantized mappings, adapter filtering, and ModelOpt layer spec exposure. |
src/megatron/bridge/utils/common_utils.py |
Extends expert number extraction for local expert parameter names. |
src/megatron/bridge/models/conversion/utils.py |
Extends deterministic sort keys for SequentialMLP local experts. |
src/megatron/bridge/models/conversion/peft_bridge.py |
Adds shared base-weight export skip logic for adapters and quantizer params. |
src/megatron/bridge/models/conversion/param_mapping.py |
Registers quantized parallel linear module names and generalizes fused LayerNorm-column detection. |
src/megatron/bridge/models/conversion/model_bridge.py |
Updates local-to-global expert name handling and applies the new skip predicate in conversion task construction. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/claude review |
Review — Support for modelopt with MoE QATOverall this looks solid. The separation of grouped-expert (weight N / bias N) vs. local-expert (local_experts.N) naming is clean and well-tested. What I checked
Minor inline comment
Suggested test cases No perf tests impacted. |
Refer to verl-project/verl@b96c8fb Signed-off-by: Hollow Man <hollowman@opensuse.org>
|
/claude review |
Code Review: Support for modelopt with MoE QATThe changes cleanly extend expert param handling to support SequentialMLP Observations
Suggested test casesNo perf tests impacted. |
Signed-off-by: Hollow Man <hollowman@opensuse.org>
### What does this PR do? Refactor to use Megatron-Bridge new helper APIs: - NVIDIA-NeMo/Megatron-Bridge#3813 - NVIDIA-NeMo/Megatron-Bridge#3866 ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `veomni`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`, `fully_async`, `one_step_off` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/verl-project/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/verl-project/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/verl-project/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/verl-project/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) - [X] If your PR is related to the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`. --------- Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: Hollow Man <hollowman@opensuse.org> Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
What does this PR do ?
Move all the patches for supporting modelopt based QAT on verl side to Megatron Bridge.
Changelog
GitHub Actions CI
See the CI sectionin the Contributing doc for how to trigger the CI. A Nvidia developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information