Support saving custom megatron model (e.g. pruned variant) back to Hugging Face format

**Is your feature request related to a problem? Please describe.**
Currently HF can be converted to megatron but when saving the megatron model back to HF, it expects the model architecture to be exactly same and only allows weights to be changed.

When we use ModelOpt to prune a GPT/Mamba model, it will result in a model with less number of layers, lower hidden/ffn/attentions/etc. While we can easily save this as a megatron ckpt, we want to be able to save this as a hugging face checkpoint as well 

**Describe the solution you'd like**

Natively supported in `bridge.save_hf_pretrained()` API

**Describe alternatives you've considered**

```python
bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen3-0.6B")
provider = bridge.to_megatron_provider()
provider.finalize()
model = provider.provide_distributed_model(wrap_with_ddp=False)
unwrapped_model = model[0].module

kept_layer_nums = range(1,25)  # 1-indexed
modelopt.prune(model, num_layers=24, hidden_size=1024, kept_layer_nums=kept_layer_nums)

# Save artifacts, overwrite config, create dummy weights, use new bridge to save pruned model weights
bridge.hf_pretrained.save_artifacts(output_hf_path)
hf_cfg = AutoConfig.from_pretrained(output_hf_path)
mcore_cfg = unwrapped_model.config

hf_cfg.hidden_size = mcore_cfg.hidden_size
hf_cfg.intermediate_size = mcore_cfg.ffn_hidden_size
hf_cfg.num_attention_heads = mcore_cfg.num_attention_heads
hf_cfg.head_dim = mcore_cfg.kv_channels
hf_cfg.num_key_value_heads = mcore_cfg.num_query_groups
if hasattr(hf_cfg, "mamba_num_heads"):
    hf_cfg.mamba_num_heads = mcore_cfg.mamba_num_heads
if hasattr(hf_cfg, "mamba_head_dim"):
    hf_cfg.mamba_head_dim = mcore_cfg.mamba_head_dim
if hasattr(hf_cfg, "moe_intermediate_size"):
    hf_cfg.moe_intermediate_size = mcore_cfg.moe_ffn_hidden_size
if hasattr(hf_cfg, "moe_shared_expert_intermediate_size"):
    hf_cfg.moe_shared_expert_intermediate_size = (
        mcore_cfg.moe_shared_expert_intermediate_size
    )
if hasattr(hf_cfg, "num_experts"):
    hf_cfg.num_experts = mcore_cfg.num_moe_experts
if hasattr(hf_cfg, "n_routed_experts"):
    hf_cfg.n_routed_experts = mcore_cfg.num_moe_experts
if hasattr(hf_cfg, "n_shared_experts"):
    hf_cfg.n_shared_experts = (
        mcore_cfg.moe_shared_expert_intermediate_size // mcore_cfg.moe_ffn_hidden_size
    )
if hasattr(hf_cfg, "layer_types"):
    hf_cfg.layer_types = [
        lt for i, lt in enumerate(hf_cfg.layer_types) if i + 1 in kept_layer_nums
    ]
hf_cfg.num_hidden_layers = mcore_cfg.num_layers

# Save dummy pruned HF model to get the correct bridge for saving pruned weights
AutoModelForCausalLM.from_config(hf_cfg).save_pretrained(output_hf_path)
pruned_bridge = AutoBridge.from_hf_pretrained(output_hf_path)
pruned_bridge.save_hf_weights(model, output_hf_path)
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support saving custom megatron model (e.g. pruned variant) back to Hugging Face format #2036

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Support saving custom megatron model (e.g. pruned variant) back to Hugging Face format #2036

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions