[PEFT, ckpts] feat: modelopt for LoRA & deepseek arch#3612
Conversation
|
/claude review |
There was a problem hiding this comment.
Pull request overview
This PR extends Megatron-Bridge’s PEFT (LoRA) and DeepSeek checkpoint-conversion support to better handle ModelOpt and DeepSeek MoE naming differences (notably when moe_grouped_gemm is disabled).
Changes:
- Add detection + adapter-attribute handling for ModelOpt’s local Megatron
Linearso it can be LoRA-wrapped correctly. - Route ModelOpt
Linearmodules away from thenn.Linearfast-path in both LoRA and CanonicalLoRA transforms. - Extend DeepSeek parameter mappings to cover
local_expertsnaming and fix MTP mapping wildcard replacement to only target the intended wildcard groups.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/megatron/bridge/peft/utils.py |
Adds is_modelopt_linear() and a ModelOpt-specific AdapterAttributes return path. |
src/megatron/bridge/peft/lora.py |
Ensures ModelOpt Linear does not go through the nn.Linear/TE adapter path. |
src/megatron/bridge/peft/canonical_lora.py |
Same exclusion for CanonicalLoRA’s nn.Linear fast-path. |
src/megatron/bridge/models/deepseek/common.py |
Adds DeepSeek local_experts mappings and corrects MTP wildcard replacement behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/claude review #3612 |
| def is_modelopt_linear(m: nn.Module) -> bool: | ||
| """Return whether a module is ModelOpt's local Megatron Linear.""" | ||
| cls = type(m) | ||
| return cls.__name__ == "Linear" and cls.__module__ == "megatron.core.post_training.modelopt.layers" |
There was a problem hiding this comment.
this looks a bit fragile?
you can put import inside check or guard it up
from megatron.core.post_training.modelopt.layers import Linear as ModelOptLinear
def is_modelopt_linear(m: nn.Module) -> bool:
return isinstance(m, ModelOptLinear)
There was a problem hiding this comment.
Thanks for reviewing, I just changed this into safe_import_from similar to those on top (i.e. TEColumnParallelLinear)
As MoE with modelopt sets `moe_grouped_gemm` disabled, https://github.com/NVIDIA/Megatron-LM/blob/12f18dafbf9ea1a947f06c7aecde0208c0ada161/megatron/core/post_training/modelopt/gpt/model_specs.py#L146 additional mappings are needed here. Also, modelopt linear layers should be correctly recognized for lora. Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: Hollow Man <hollowman@opensuse.org>
|
/ok to test dfd1c12 |
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: Hollow Man <hollowman@opensuse.org> Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
What does this PR do ?
As MoE with modelopt sets
moe_grouped_gemmdisabled, https://github.com/NVIDIA/Megatron-LM/blob/12f18dafbf9ea1a947f06c7aecde0208c0ada161/megatron/core/post_training/modelopt/gpt/model_specs.py#L146 additional mappings are needed here. Also, modelopt linear layers should be correctly recognized for lora.Changelog
GitHub Actions CI
See the CI sectionin the Contributing doc for how to trigger the CI. A Nvidia developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information