[model] feat: Model Support for Ling MoE V2 Model #2028
This commit adds support for the Bailing MoE V2 model, including:
- Initial implementation of bailing_moe2_bridge and bailing_moe2_provider
- Fix layer spec configuration
- Fix MoE config and param dtype handling
- Fix MTP bug and QKV mapping
- Fix linting issues
Signed-off-by: Changlong <changlyu@amazon.com>
|
will add |
|
Hi @ccclyu, thanks for contributing. Did you verify this on any downstream RL workflow already? |
|
@ahxt has run an SFT experiment on Ling-Mini-Base 2.0 using the slime Megatron trainer, and the training curve and grad norm look fine. We also tested on trained models. For SFT, the Megatron support looks good, but we have not had a chance to try RL.
|
Signed-off-by: Changlong <changlyu@amazon.com>
📝 Walkthrough
This PR introduces support for Bailing MoE v2 models and three Ling model variants (LingMini2, LingFlash2, Ling1T) through a new bridge implementation that converts HuggingFace models to Megatron GPTModel format, complete with MoE-specific parameter mappings, provider configurations, and comprehensive functional test coverage.
Sequence Diagram(s)
sequenceDiagram
participant HF as HuggingFace Model
participant Bridge as BailingMoeV2Bridge
participant Provider as BailingMoeV2ModelProvider
participant Tasks as Conversion Tasks
participant Registry as Mapping Registry
participant Megatron as Megatron GPTModel
HF->>Bridge: provider_bridge(hf_pretrained)
activate Bridge
Bridge->>Provider: Create with HF config<br/>(dims, heads, MoE settings)
activate Provider
Provider-->>Bridge: BailingMoeV2ModelProvider
deactivate Provider
Bridge-->>HF: Return provider
deactivate Bridge
HF->>Bridge: build_conversion_tasks(hf_pretrained,<br/>megatron_model)
activate Bridge
Bridge->>Tasks: Initialize with HF config
activate Tasks
Tasks-->>Bridge: Conversion tasks ready
deactivate Tasks
Bridge-->>HF: Tasks configured
deactivate Bridge
HF->>Bridge: mapping_registry()
activate Bridge
Bridge->>Registry: Build parameter mappings<br/>(embeddings, attention, MLP,<br/>ConcatenatedQKV, GatedMLP)
activate Registry
alt Has MTP Support
Registry->>Registry: Add per-layer MTP mappings<br/>(transformer_layer, attention, MLP)
end
Registry-->>Bridge: MegatronMappingRegistry
deactivate Registry
Bridge-->>HF: Complete registry
deactivate Bridge
HF->>Megatron: Convert via tasks & registry
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@src/megatron/bridge/models/__init__.py`:
- Around lines 16-22: Add the missing export for BailingMoeV2Bridge to the
module's public API: update the __all__ list in
src/megatron/bridge/models/__init__.py to include "BailingMoeV2Bridge"
(alongside the already-exported GLM45Bridge, OlMoEBridge, NemotronBridge, and
the provider names Ling1TModelProvider, LingFlash2ModelProvider,
LingMini2ModelProvider) so the imported symbol is actually exported and the
Flake8 F401 warning is resolved.
In `@src/megatron/bridge/models/bailing/bailing_moe2_bridge.py`:
- Around lines 56-62: Fix the typo in the module docstring: change "AutoBrige" to
"AutoBridge" in the example block of bailing_moe2_bridge.py so the usage snippet
references the actual AutoBridge class.
In `@src/megatron/bridge/models/bailing/bailing_moe2_provider.py`:
- Line 58: The variable init_method_std is annotated as int but assigned 0.02;
update its type hint to float (e.g., change init_method_std: int = 0.02 to
init_method_std: float = 0.02) in the BailingMoe2Provider (or the scope where
init_method_std is defined) and run a quick search for other uses or annotations
referencing init_method_std to ensure they expect a float and adjust any
docstrings or tests accordingly.
- Around lines 59-73: Remove the duplicate attention_dropout field in the
dataclass: locate both occurrences of the attention_dropout attribute (the one
near the top with other config fields and the second one just above kv_channels)
and delete one so the dataclass only declares attention_dropout once (retain the
intended value, e.g., 0.0). Ensure there are no other duplicate config fields
introduced nearby (e.g., kv_channels) and run a quick lint/type-check to confirm
no references rely on the removed duplicate.
In `@tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py`:
- Around lines 160-164: The test currently overwrites the config.json written by
model.save_pretrained(), losing model-generated metadata like auto_map and the
serialized torch_dtype; instead, load the file that model.save_pretrained()
produced (config_path), parse it, merge in only the missing/desired fields from
HF_BAILING_MOE2_TOY_MODEL_CONFIG (or skip the overwrite entirely), and write the
merged config back so fields such as auto_map and the model's serialized
torch_dtype remain intact; reference model.save_pretrained, config_to_save,
HF_BAILING_MOE2_TOY_MODEL_CONFIG and config_path to locate and change the logic.
- Around lines 130-133: Remove the debug print in the loop that inspects the
model dtype (the for name, param in model.named_parameters(): print(...) break
block); either delete the print entirely or replace it with the standardized
non-duplicating logger (use print_rank_0 or logging.debug) so tests don't emit
raw prints across ranks. Locate the snippet referencing model.named_parameters()
in the test and swap the print for
print_rank_0(f"Before save - {name}: {param.dtype}") or remove the block.
- Around lines 108-126: The fallback in the except block currently calls
AutoModelForCausalLM.from_pretrained("inclusionAI/Ling-mini-2.0", ...) which
downloads a huge model to register the class; replace that heavy fallback by
either calling AutoConfig.from_pretrained("inclusionAI/Ling-mini-2.0",
trust_remote_code=True) to fetch only the config and remote code (so
AutoModelForCausalLM.from_config(config, trust_remote_code=True) can succeed) or
skip the test immediately with pytest.skip if class registration fails; update
the block around AutoModelForCausalLM.from_pretrained,
AutoModelForCausalLM.from_config, and pytest.skip to use
AutoConfig.from_pretrained or direct pytest.skip instead of downloading the full
model.
- Around lines 311-314: The test currently hardcodes /opt/Megatron-Bridge/ in the
coverage args which breaks non-CI environments; update the arguments to derive
the paths from the repo root already computed as cwd (or from Path(__file__)),
e.g. replace "--data-file=/opt/Megatron-Bridge/.coverage" with "--data-file=" +
str(cwd / ".coverage") and replace "--source=/opt/Megatron-Bridge/" with
"--source=" + str(cwd) so the coverage data file and source path are computed
dynamically (refer to the cwd variable computed in this test file).
🧹 Nitpick comments (8)
src/megatron/bridge/models/bailing/__init__.py (1)
25-31: `__all__` is not sorted alphabetically.
Ruff flags RUF022. Sorting `__all__` keeps the public API surface consistent and easier to maintain.
♻️ Suggested sort
 __all__ = [
     "BailingMoeV2Bridge",
     "BailingMoeV2ModelProvider",
-    "LingMini2ModelProvider",
-    "LingFlash2ModelProvider",
     "Ling1TModelProvider",
+    "LingFlash2ModelProvider",
+    "LingMini2ModelProvider",
 ]
src/megatron/bridge/models/bailing/bailing_moe2_provider.py (2)
78-78: Commented-out code without explanation.
As per coding guidelines, commented-out code should include a comment describing its usage and why it is commented out; otherwise it should be removed before merging. Consider adding a brief rationale or removing this line.
28-33: Unused `noqa` directive on line 29.
Ruff flags RUF100: the `# noqa: F401` is unnecessary here since the import is used (to set `HAVE_TE`). Remove the stale directive.
♻️ Cleanup
-    import transformer_engine  # type: ignore # noqa: F401
+    import transformer_engine  # type: ignore
src/megatron/bridge/models/bailing/bailing_moe2_bridge.py (3)
90-94: Missing type hints on `build_conversion_tasks`.
Per coding guidelines, function arguments and return types should have type hints. The override should match the parent signature.
♻️ Add type hints
-    def build_conversion_tasks(self, hf_pretrained, megatron_model):
+    def build_conversion_tasks(self, hf_pretrained: PreTrainedCausalLM, megatron_model: GPTModel):
154-170: MTP megatron-param rewriting with chained `str.replace` is subtle and fragile.
The logic assumes `".*"` appears exactly once in every `layer_specific_mappings` key. Today that holds, but a future mapping with an additional wildcard (e.g., `decoder.layers.*.mlp.experts.*.some_param`) would be silently corrupted because `str.replace` replaces every occurrence by default.
Consider passing a count of 1 to each step (`str.replace(old, new, 1)`; the `count=` keyword form is only accepted on Python 3.13+) to make the assumption explicit, or refactor to a regex/split-based approach.
♻️ Safer minimal change
 megatron_param = (
-    megatron_param.replace(".*", ".*.transformer_layer")
-    .replace("decoder", "mtp")
-    .replace(".*", f".{mtp_layer}")
+    megatron_param.replace(".*", ".*.transformer_layer", 1)
+    .replace("decoder", "mtp", 1)
+    .replace(".*", f".{mtp_layer}", 1)
 )
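To see why the count matters, here is a small stdlib comparison of the current global rewrite and the suggested first-match rewrite. The two-wildcard key is hypothetical (the comment notes no such mapping exists today), and both function names are illustrative rather than the bridge's code.

```python
def to_mtp_param(decoder_param: str, mtp_layer: int) -> str:
    """Suggested rewrite: each step touches only the first match."""
    return (
        decoder_param.replace(".*", ".*.transformer_layer", 1)  # insert transformer_layer after the layer index
        .replace("decoder", "mtp", 1)
        .replace(".*", f".{mtp_layer}", 1)  # pin the layer index; later wildcards survive
    )


def to_mtp_param_global(decoder_param: str, mtp_layer: int) -> str:
    """Current rewrite: every replace is global."""
    return (
        decoder_param.replace(".*", ".*.transformer_layer")
        .replace("decoder", "mtp")
        .replace(".*", f".{mtp_layer}")
    )


single = "decoder.layers.*.self_attention.linear_proj.weight"
nested = "decoder.layers.*.mlp.experts.*.weight"  # hypothetical two-wildcard key

print(to_mtp_param(single, 0))         # -> mtp.layers.0.transformer_layer.self_attention.linear_proj.weight
print(to_mtp_param_global(nested, 0))  # -> mtp.layers.0.transformer_layer.mlp.experts.0.transformer_layer.weight
print(to_mtp_param(nested, 0))         # -> mtp.layers.0.transformer_layer.mlp.experts.*.weight
```

Both versions agree on today's single-wildcard keys; only the global one mangles the expert wildcard.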
96-218: `mapping_registry` lacks a docstring explaining the mapping structure.
This method builds a complex registry with conditional MTP paths and many-to-one mappings in which two Megatron parameters share one HF key (e.g., both `input_layernorm` and `linear_qkv.layer_norm_weight` map to the same HF key). A brief docstring covering the overall strategy would help future maintainers.
tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py (2)
275-275: Use `pytest.fail()` instead of `assert False`.
`assert False` statements are stripped when Python runs with `-O` (optimized). Use `pytest.fail(msg)`, which is unconditional and idiomatic in pytest.
♻️ Fix both occurrences
Line 275:
-    assert False, f"Failed to load created toy MoE model: {e}"
+    pytest.fail(f"Failed to load created toy MoE model: {e}")
Line 339:
-    assert False, f"Bailing MoE V2 {test_name} conversion failed with return code {result.returncode}"
+    pytest.fail(f"Bailing MoE V2 {test_name} conversion failed with return code {result.returncode}")
Also applies to: 339-339
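A quick way to confirm the `-O` behaviour is to run a failing `assert` in a child interpreter with and without the flag. This sketch uses only the standard library (it does not import pytest), and `run_snippet` is a hypothetical helper.

```python
import os
import subprocess
import sys

# A failing assert followed by a line that should never run.
SNIPPET = "assert False, 'should have failed'\nprint('assert was skipped')"


def run_snippet(*flags: str) -> tuple[int, str]:
    """Run SNIPPET in a fresh interpreter and return (exit code, stdout)."""
    # Drop PYTHONOPTIMIZE so only the explicit flags decide whether asserts run.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    result = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
        env=env,
    )
    return result.returncode, result.stdout.strip()


normal = run_snippet()         # AssertionError: non-zero exit code, nothing printed
optimized = run_snippet("-O")  # assert compiled away: the print runs, exit code 0
print(normal, optimized)
```

`pytest.fail()` is an ordinary function call that raises, so it fires regardless of `-O`.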
277-285: GPU test lacks hardware requirement documentation.
Per coding guidelines: "Document hardware requirements for GPU tests." The parameterized tests each require `nproc_per_node=2`, so at least 2 GPUs are needed. Add a brief comment or docstring noting this.
except Exception as e:
    # If that fails, try loading a minimal model to register the class
    try:
        # Load a tiny model just to register the class
        _ = AutoModelForCausalLM.from_pretrained(
            "inclusionAI/Ling-mini-2.0",
            trust_remote_code=True,
            torch_dtype=torch.bfloat16,
            device_map="cpu",  # Don't use GPU for this
        )
        # Now try again
        model = AutoModelForCausalLM.from_config(
            config, trust_remote_code=True
        )
    except Exception as e2:
        pytest.skip(
            f"Could not create Bailing MoE V2 model: {e}. "
            f"Fallback also failed: {e2}. Model class may require custom code."
        )
Inner fallback downloads the full Ling-mini-2.0 model (~16B params) just to register the class.
If from_config fails, the fallback on lines 112-117 calls from_pretrained("inclusionAI/Ling-mini-2.0"), which downloads the entire multi-billion parameter model to CPU. This makes the test extremely slow and resource-intensive, and can cause OOM on CI machines. Consider using AutoConfig.from_pretrained(..., trust_remote_code=True) (which only downloads config + code files) to register the class, or calling pytest.skip directly instead of this heavy fallback.
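A minimal sketch of the lighter fallback this comment proposes, written with injected callables so the control flow can be exercised without network access or transformers installed. `create_toy_model` and its parameter names are hypothetical; whether `AutoConfig.from_pretrained(..., trust_remote_code=True)` alone is enough for `from_config` to succeed is the reviewer's assumption and is not verified here.

```python
from typing import Any, Callable, NoReturn


def create_toy_model(
    config: Any,
    from_config: Callable[[Any], Any],
    fetch_remote_config: Callable[[], Any],
    skip: Callable[[str], NoReturn],
) -> Any:
    """Build a model from a toy config; on failure fetch only remote config/code, then retry or skip."""
    try:
        return from_config(config)
    except Exception as first_error:
        try:
            # Lightweight fallback: config.json and modeling code only, no weights.
            fetch_remote_config()
            return from_config(config)
        except Exception as second_error:
            skip(
                f"Could not create Bailing MoE V2 model: {first_error}. "
                f"Fallback also failed: {second_error}."
            )


# Wiring in the real test might look roughly like this (not executed here):
#   create_toy_model(
#       config,
#       lambda cfg: AutoModelForCausalLM.from_config(cfg, trust_remote_code=True),
#       lambda: AutoConfig.from_pretrained("inclusionAI/Ling-mini-2.0", trust_remote_code=True),
#       pytest.skip,
#   )
```

The broad `except Exception` mirrors the original test and will still trigger Ruff's BLE001; narrowing it is a separate decision.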
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 108-108: Do not catch blind exception: `Exception` (BLE001)
[warning] 122-122: Do not catch blind exception: `Exception` (BLE001)
"coverage",
"run",
"--data-file=/opt/Megatron-Bridge/.coverage",
"--source=/opt/Megatron-Bridge/",
🧩 Analysis chain
🏁 Script executed:
# Search for hardcoded /opt/Megatron-Bridge paths in test files
rg -n '/opt/Megatron-Bridge' --type=py -g 'tests/**'
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 19996
🏁 Script executed:
# Examine the specific test file to understand context
wc -l tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 144
🏁 Script executed:
# Look at lines around 311-332 to see the context, especially the cwd argument mentioned on line 332
sed -n '305,335p' tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1063
Hardcoded /opt/Megatron-Bridge/ paths in coverage configuration are CI-specific and will fail in other environments.
The --data-file and --source arguments assume the repo is deployed at /opt/Megatron-Bridge/. This pattern appears across many test files and should be refactored to derive paths from the repo root (using cwd already computed on line 332 or Path(__file__)).
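One way to build the coverage arguments from the checkout instead of a fixed /opt path is sketched below. `coverage_prefix` is a hypothetical helper, and the `parents[4]` depth assumes the test lives at the path quoted in this review; reusing the test's existing `cwd`, if it is the repo root, works the same way.

```python
from pathlib import Path


def coverage_prefix(repo_root: Path) -> list[str]:
    """Coverage arguments derived from the checkout location, not /opt/Megatron-Bridge/."""
    return [
        "coverage",
        "run",
        f"--data-file={repo_root / '.coverage'}",
        f"--source={repo_root}",
    ]


# For tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py the repo
# root is four directories above the file (assumption based on that path):
#   repo_root = Path(__file__).resolve().parents[4]
print(coverage_prefix(Path("/workspace/Megatron-Bridge")))
```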
|
/ok to test b9d0b2e |
Signed-off-by: Changlong <changlyu@amazon.com>
|
/ok to test d3f9cfc |
Signed-off-by: Changlong <changlyu@amazon.com>
|
/ok to test c5d5e07 |
Made-with: Cursor
# Conflicts:
#   src/megatron/bridge/models/__init__.py
|
It's okay, we can finish the last mile. There are some cleanups; I can just run your PR. |
|
OK, thanks so much! If you run into any issues when running solely on this PR, please let me know. |
…ean up bridge
- Remove redundant build_conversion_tasks override; use self.hf_config already set by the bridge dispatch system
- Add toy-model conversion test under test_groups/models/bailing/ matching the MiniMax M2 / DeepSeek style
- Add L0_Launch_models_bailing.sh for CI auto-discovery
- Remove unused provider file and provider test
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
|
I am doing a few more tests locally. |
|
/claude review |
…les, remove stale test
- Fix bridge: set moe_router_score_function="sigmoid" (required when moe_router_enable_expert_bias=True; without it, model init raised a ValueError)
- Add examples/models/bailing/ with conversion.sh, inference.sh, README.md for Ling-flash-2.0 (verified round-trip and inference on an 8-GPU node)
- Remove stale tests/functional_tests/models/bailing/ duplicate (the authoritative test is in test_groups/models/bailing/)
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
|
/ok to test 972f875 |
|
@ccclyu Done updating. Feel free to review the new code heuristic for model additions. |
|
@yaoyu-33 Thanks, the current code structure looks great! For the CI/CD run L0_Launch_models_bailing, the |
|
/ok to test 5853894 |
… custom arch dispatch
- Add BailingMoeV2Bridge, BailingMoeV2Config, BailingMoeV2ForCausalLM for Ling MoE2 models
- Register bailing_moe_v2 with AutoConfig/AutoModelForCausalLM at import time so AutoConfig.from_pretrained works without hub access in offline CI environments
- Fix _causal_lm_architecture in AutoBridge to fall back to the class-name string when a custom arch (e.g. BailingMoeV2ForCausalLM) is not in standard transformers, enabling bridge dispatch for models registered via AutoConfig.register
- Add expert_bias to IGNORE_PRECISION_PARAMS in the roundtrip script: MoE gate expert bias is stored as float32 in Megatron but bfloat16 in HF
- Add functional tests for TP/PP/EP conversion of the toy BailingMoeV2 model
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
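The `expert_bias` exemption follows from how little precision bfloat16 keeps: 8 significant bits versus float32's 24, so a float32 bias does not survive a trip through a bfloat16 checkpoint bit for bit. The stdlib-only sketch below emulates float32 to bfloat16 rounding (round-to-nearest-even, NaN handling omitted) to show the size of the gap; `to_bfloat16` and `to_float32` are illustrative helpers, not part of the roundtrip script.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float to the nearest float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bfloat16(x: float) -> float:
    """Round a float32 value to bfloat16 and return it as a Python float (finite values only)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the low 16 bits that bfloat16 drops.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding_bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


bias_fp32 = to_float32(0.1)         # value kept in float32 (Megatron side)
bias_bf16 = to_bfloat16(bias_fp32)  # value after storing as bfloat16 (HF side)
print(bias_fp32, bias_bf16, abs(bias_fp32 - bias_bf16))
```

A strict equality check between the two sides would flag a difference of roughly 1e-4 here even though the conversion is correct, which is why a precision exemption is needed for this parameter.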
|
/ok to test 0d2c83a |
Adapted vendor modeling files don't require docstrings on every class/function. Add modeling_*.py pattern to per-file-ignores in ruff.toml.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
|
/ok to test 564ef52 |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
|
/ok to test 2c2fa1c |
…s for custom model fallback
Custom models registered via AutoConfig.register (e.g. BailingMoeV2ForCausalLM) are not in standard transformers but are valid; _causal_lm_architecture now returns the class name as a string for bridge dispatch instead of raising ValueError.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
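A hypothetical, stdlib-only sketch of the fallback described in this commit: look the architecture up on a module-like namespace and, if it is missing, hand back the class-name string so a name-keyed dispatch table can still find a bridge. The `endswith("ForCausalLM")` filter, the `BRIDGES` table, and `fake_transformers` (a stand-in for the transformers module) are assumptions for illustration, not Megatron-Bridge's actual `_causal_lm_architecture`.

```python
from types import SimpleNamespace
from typing import Sequence, Union


def causal_lm_architecture(architectures: Sequence[str], namespace: object) -> Union[type, str]:
    """Return the *ForCausalLM class from `namespace`, or its name if the class is not there."""
    for arch in architectures:
        if arch.endswith("ForCausalLM"):
            cls = getattr(namespace, arch, None)
            # Custom architectures registered at runtime are not attributes of the
            # transformers module, so fall back to the class-name string.
            return cls if isinstance(cls, type) else arch
    raise ValueError(f"No *ForCausalLM architecture in {list(architectures)}")


# Hypothetical name-keyed dispatch table (illustrative names only).
BRIDGES = {"LlamaForCausalLM": "LlamaBridge", "BailingMoeV2ForCausalLM": "BailingMoeV2Bridge"}


def dispatch(arch: Union[type, str]) -> str:
    """Resolve a bridge from either a class or a class-name string."""
    name = arch if isinstance(arch, str) else arch.__name__
    return BRIDGES[name]


fake_transformers = SimpleNamespace(LlamaForCausalLM=type("LlamaForCausalLM", (), {}))
print(dispatch(causal_lm_architecture(["LlamaForCausalLM"], fake_transformers)))         # -> LlamaBridge
print(dispatch(causal_lm_architecture(["BailingMoeV2ForCausalLM"], fake_transformers)))  # -> BailingMoeV2Bridge
```

Keying dispatch on the name means a class and its string fallback resolve to the same bridge.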
|
/ok to test e0c1b30 |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
|
/ok to test 500f9f8 |

What does this PR do?
Adds support for the Ling MoE V2 models (Ling-mini, Ling-flash, Ling-1T) from the collection at https://huggingface.co/collections/inclusionAI/ling-v2
Changelog
Following the instructions at https://docs.nvidia.com/nemo/megatron-bridge/latest/adding-new-models.html, this PR adds model support for Ling MoE V2.
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre-checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information
Summary by CodeRabbit
Release Notes
New Features
Tests