fix mfsdp unwrap stuck at MegatronFSDP [dev] #4273
Merged
BestJuly
approved these changes
Apr 13, 2026
Contributor
Can we add a unit test to guard this?
Member
Agree w/ @BestJuly. Also, several files use this helper function; can we double-check / justify that the Megatron-FSDP model logic surrounding each use of this utility is still valid?
Member
/ok to test ff0ed7e
Member
Needs linting!
Guard that unwrap_model correctly peels through both DDP and Megatron-FSDP wrapping hierarchies to reach the underlying GPTModel.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ff0ed7e to 3c2e66d
Member
Author
/ok to test 67077c9
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/24432478244
yaoyu-33 added a commit to NVIDIA-NeMo/Megatron-Bridge that referenced this pull request on Apr 20, 2026
…reprocessing
MCore's unwrap_model now strips the MegatronFSDP layer (added in NVIDIA/Megatron-LM#4273), so preprocess_fsdp_dtensor_state_dict receives a fully unwrapped GPTModel. The downstream MCore functions (handle_swiglu_in_state_dict, handle_gdn_in_state_dict) call model.get_parameter("module.{key}"), which requires a .module wrapper. Re-wrap the model when it arrives without one.
Fixes: AttributeError: GPTModel has no attribute `module`
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
wplf added a commit to wplf/Megatron-LM that referenced this pull request on Apr 20, 2026
This reverts commit 9a7c5dd.
yaox12 pushed a commit to Phlip79/Megatron-LM that referenced this pull request on Apr 21, 2026
3 tasks
What does this PR do?
main PR: #4274
Problem: unwrap_model() in megatron/core/utils.py gets stuck when unwrapping a model wrapped with Megatron-FSDP. The wrapping hierarchy is:
FullyShardedDataParallel (mcore adapter)
└── .module → MegatronFSDP (core FSDP impl)
    └── .module → actual model (e.g., GPTModel)
The old code only knew how to peel through DDP, torch_FSDP, megatron_FSDP (the adapter), and Float16Module. It would unwrap the outer FullyShardedDataParallel, then hit the inner MegatronFSDP and stop, returning MegatronFSDP instead of the actual model.
Fix: a one-line change that adds MegatronFSDP (from megatron.core.distributed.fsdp.src.megatron_fsdp.megatron_fsdp) to the default module_instances tuple, so the while isinstance(...) loop can peel through both wrapper layers.
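To make the stuck-unwrap behaviour concrete, here is a minimal sketch that uses stand-in classes rather than the real Megatron-LM ones. The class bodies and the `unwrap` helper are assumptions written for illustration only; they mimic the `while isinstance(...)` loop described above and are not the actual code in megatron/core/utils.py.

```python
class Wrapper:
    """Hypothetical base for wrappers that keep the wrapped object in `.module`."""

    def __init__(self, module):
        self.module = module


class FullyShardedDataParallel(Wrapper):
    """Stand-in for the mcore adapter (outer wrapper)."""


class MegatronFSDP(Wrapper):
    """Stand-in for the core FSDP implementation (inner wrapper)."""


class GPTModel:
    """Stand-in for the actual model."""


def unwrap(model, module_instances):
    """Peel `.module` for as long as the object is a known wrapper type."""
    while isinstance(model, module_instances):
        model = model.module
    return model


# Build the two-layer hierarchy from the description above.
wrapped = FullyShardedDataParallel(MegatronFSDP(GPTModel()))

# Before the fix: only the adapter is a known wrapper, so the loop stops early.
old_result = unwrap(wrapped, (FullyShardedDataParallel,))

# After the fix: MegatronFSDP is also listed, so the loop reaches the model.
new_result = unwrap(wrapped, (FullyShardedDataParallel, MegatronFSDP))

print(type(old_result).__name__)  # MegatronFSDP
print(type(new_result).__name__)  # GPTModel
```

In this sketch the only difference between the two calls is the contents of the tuple passed to `isinstance`, which mirrors the one-line nature of the change.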
You can use the script below to see what happens.
result