[model] feat: Model Support for Ling MoE V2 Model by ccclyu · Pull Request #2028 · NVIDIA-NeMo/Megatron-Bridge

ccclyu · 2026-01-22T11:25:06Z

What does this PR do ?

Support Ling MoE V2 Model (Ling-mini, Ling-flash, Ling-1T) in https://huggingface.co/collections/inclusionAI/ling-v2

Changelog

Following the instructions at https://docs.nvidia.com/nemo/megatron-bridge/latest/adding-new-models.html, add model support for Ling MoE V2.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open "Draft" PR.

Additional Information

Related to # (issue)

Summary by CodeRabbit

Release Notes

New Features
- Added support for Bailing MoE V2 models including Ling1T, LingFlash2, and LingMini2 variants.
- Models now support conversion workflows with multiple parallelism configurations (tensor, pipeline, and expert parallelism).
Tests
- Added functional tests for Bailing MoE V2 model conversion and provider configuration validation.

This commit adds support for the Bailing MoE V2 model, including:
- Initial implementation of bailing_moe2_bridge and bailing_moe2_provider
- Fix layer spec configuration
- Fix MoE config and param dtype handling
- Fix MTP bug and QKV mapping
- Fix linting issues

Signed-off-by: Changlong <changlyu@amazon.com>

copy-pr-bot · 2026-01-22T11:25:10Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

ccclyu · 2026-01-23T10:12:47Z

will add docs and tests soon.

yaoyu-33 · 2026-01-28T17:21:01Z

Hi @ccclyu thanks for contributing. did you verify this on any downstream RL workflow already?

ccclyu · 2026-01-29T04:02:06Z

@ahxt has run an SFT experiment on Ling-Mini-Base 2.0 using the slime Megatron trainer, and the training curve and grad norm look fine. We also tested on trained models. For SFT, the Megatron support looks good, but we have not had a chance to try RL.

Signed-off-by: Changlong <changlyu@amazon.com>

coderabbitai · 2026-02-06T08:50:34Z

📝 Walkthrough

Walkthrough

This PR introduces support for Bailing MoE v2 models and three Ling model variants (LingMini2, LingFlash2, Ling1T) through a new bridge implementation that converts HuggingFace models to Megatron GPTModel format, complete with MoE-specific parameter mappings, provider configurations, and comprehensive functional test coverage.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Package API Extensions `src/megatron/bridge/models/__init__.py`, `src/megatron/bridge/models/bailing/__init__.py` | Added imports and exports for BailingMoeV2Bridge, BailingMoeV2ModelProvider, and three Ling model provider classes (Ling1TModelProvider, LingFlash2ModelProvider, LingMini2ModelProvider) to expose them as part of the public API surface. |
| Bridge Implementation `src/megatron/bridge/models/bailing/bailing_moe2_bridge.py` | Introduced BailingMoeV2Bridge class bridging BailingMoeV2ForCausalLM to GPTModel. Includes provider creation logic, parameter mapping registry with specialized handlers (ConcatenatedQKVMapping, GatedMLPMapping), and optional MTP support with dynamic per-layer mappings. |
| Model Providers `src/megatron/bridge/models/bailing/bailing_moe2_provider.py` | Added base BailingMoeV2ModelProvider and three cascading Ling-specific providers with configurable MoE geometry (layer counts, hidden sizes, expert counts, MoE frequencies, intermediate sizes) and MTP parameters. |
| Conversion Functional Tests `tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py` | New test module validating BailingMoeV2 conversion across parallelism configurations. Includes toy model fixture with fallback logic for custom model registration, toy model structure validation, and multi-GPU conversion testing via distributed run. |
| Provider Equivalence Tests `tests/functional_tests/models/bailing/test_bailing_moe2_provider.py` | Added parameterized test comparing bridge-generated provider configs against predefined Ling providers using AutoBridge and configuration comparison utilities. |

Sequence Diagram(s)

sequenceDiagram
    participant HF as HuggingFace Model
    participant Bridge as BailingMoeV2Bridge
    participant Provider as BailingMoeV2ModelProvider
    participant Tasks as Conversion Tasks
    participant Registry as Mapping Registry
    participant Megatron as Megatron GPTModel

    HF->>Bridge: provider_bridge(hf_pretrained)
    activate Bridge
    Bridge->>Provider: Create with HF config<br/>(dims, heads, MoE settings)
    activate Provider
    Provider-->>Bridge: BailingMoeV2ModelProvider
    deactivate Provider
    Bridge-->>HF: Return provider
    deactivate Bridge

    HF->>Bridge: build_conversion_tasks(hf_pretrained,<br/>megatron_model)
    activate Bridge
    Bridge->>Tasks: Initialize with HF config
    activate Tasks
    Tasks-->>Bridge: Conversion tasks ready
    deactivate Tasks
    Bridge-->>HF: Tasks configured
    deactivate Bridge

    HF->>Bridge: mapping_registry()
    activate Bridge
    Bridge->>Registry: Build parameter mappings<br/>(embeddings, attention, MLP,<br/>ConcatenatedQKV, GatedMLP)
    activate Registry
    alt Has MTP Support
        Registry->>Registry: Add per-layer MTP mappings<br/>(transformer_layer, attention, MLP)
    end
    Registry-->>Bridge: MegatronMappingRegistry
    deactivate Registry
    Bridge-->>HF: Complete registry
    deactivate Bridge

    HF->>Megatron: Convert via tasks & registry

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Nemotron-3-Nano Model Support #1914: Modifies the same src/megatron/bridge/models/__init__.py file to extend model provider imports and public exports.

Suggested labels

Run CICD

Suggested reviewers

yaoyu-33
cuichenx
ananthsub

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Results For Major Changes | ⚠️ Warning | PR adds Ling MoE V2 support with new functional tests but lacks formal documentation of test results, convergence validation, and performance metrics in the PR description. | Document test execution results, quantitative SFT metrics, RL validation status, and confirmation of no numeric regressions in PR description. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title '[model] feat: Model Support for Ling MoE V2 Model' clearly describes the main change: adding support for Ling MoE V2 models. It directly aligns with the substantial additions of bridge implementations, provider classes, and test coverage for these models. |


coderabbitai

Actionable comments posted: 8

🤖 Fix all issues with AI agents

In `@src/megatron/bridge/models/__init__.py`:
- Around line 16-22: Add the missing export for BailingMoeV2Bridge to the
module's public API: update the __all__ list in
src/megatron/bridge/models/__init__.py to include "BailingMoeV2Bridge"
(alongside the already-exported GLM45Bridge, OlMoEBridge, NemotronBridge, and
the provider names Ling1TModelProvider, LingFlash2ModelProvider,
LingMini2ModelProvider) so the imported symbol is actually exported and the
Flake8 F401 is resolved.

In `@src/megatron/bridge/models/bailing/bailing_moe2_bridge.py`:
- Around line 56-62: Fix the typo in the module docstring: change "AutoBrige" to
"AutoBridge" in the example block so the class name matches the actual symbol;
update the docstring in bailing_moe2_bridge.py (the example that references
AutoBrige -> AutoBridge) to ensure the usage snippet correctly references
AutoBridge.

In `@src/megatron/bridge/models/bailing/bailing_moe2_provider.py`:
- Line 58: The variable init_method_std is annotated as int but assigned 0.02;
update its type hint to float (e.g., change init_method_std: int = 0.02 to
init_method_std: float = 0.02) in the BailingMoe2Provider (or the scope where
init_method_std is defined) and run a quick search for other uses or annotations
referencing init_method_std to ensure they expect a float and adjust any
docstrings or tests accordingly.
- Around line 59-73: Remove the duplicate attention_dropout field in the
dataclass: locate both occurrences of the attention_dropout attribute (the one
near the top with other config fields and the second one just above kv_channels)
and delete one so the dataclass only declares attention_dropout once (retain the
intended value, e.g., 0.0). Ensure there are no other duplicate config fields
introduced nearby (e.g., kv_channels) and run a quick lint/type-check to confirm
no references rely on the removed duplicate.

In `@tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py`:
- Around line 160-164: The test currently overwrites the config.json written by
model.save_pretrained(), losing model-generated metadata like auto_map and the
serialized torch_dtype; instead, load the file that model.save_pretrained()
produced (config_path), parse it, merge in only the missing/desired fields from
HF_BAILING_MOE2_TOY_MODEL_CONFIG (or skip the overwrite entirely), and write the
merged config back so fields such as auto_map and the model's serialized
torch_dtype remain intact; reference model.save_pretrained, config_to_save,
HF_BAILING_MOE2_TOY_MODEL_CONFIG and config_path to locate and change the logic.
- Around line 130-133: Remove the debug print in the loop that inspects the
model dtype (the for name, param in model.named_parameters(): print(...) break
block); either delete the print entirely or replace it with the standardized
non-duplicating logger (use print_rank_0 or logging.debug) so tests don't emit
raw prints across ranks—locate the snippet referencing model.named_parameters()
in the test and swap the print for print_rank_0(f"Before save - {name}:
{param.dtype}") or remove the block.
- Around line 108-126: The fallback in the except block currently calls
AutoModelForCausalLM.from_pretrained("inclusionAI/Ling-mini-2.0", ...) which
downloads a huge model to register the class; replace that heavy fallback by
either calling AutoConfig.from_pretrained("inclusionAI/Ling-mini-2.0",
trust_remote_code=True) to fetch only the config and remote code (so
AutoModelForCausalLM.from_config(config, trust_remote_code=True) can succeed) or
skip the test immediately with pytest.skip if class registration fails; update
the block around AutoModelForCausalLM.from_pretrained,
AutoModelForCausalLM.from_config, and pytest.skip to use
AutoConfig.from_pretrained or direct pytest.skip instead of downloading the full
model.
- Around line 311-314: The test currently hardcodes /opt/Megatron-Bridge/ in the
coverage args which breaks non-CI environments; update the arguments to derive
the paths from the repo root already computed as cwd (or from Path(__file__)),
e.g. replace "--data-file=/opt/Megatron-Bridge/.coverage" with "--data-file=" +
str(cwd / ".coverage") and replace "--source=/opt/Megatron-Bridge/" with
"--source=" + str(cwd) so the coverage data file and source path are computed
dynamically (refer to the cwd variable computed in this test file).
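For the `config.json` item in the list above, the suggested merge can be sketched as follows. This is a minimal illustration, not the test's actual code: the helper name `merge_missing_config_fields`, the `num_experts` key, and the `auto_map` value are hypothetical stand-ins, and a temporary file plays the role of the file written by `model.save_pretrained()`.

```python
import json
import tempfile
from pathlib import Path


def merge_missing_config_fields(config_path: Path, defaults: dict) -> dict:
    """Add keys from `defaults` that the saved config lacks; never overwrite saved values."""
    saved = json.loads(config_path.read_text())
    merged = {**defaults, **saved}  # on conflict, the value already on disk wins
    config_path.write_text(json.dumps(merged, indent=2))
    return merged


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    # Stand-in for what save_pretrained() might have written (illustrative values).
    config_path.write_text(
        json.dumps(
            {
                "auto_map": {"AutoConfig": "configuration_example.ExampleConfig"},
                "torch_dtype": "bfloat16",
            }
        )
    )
    # Stand-in for HF_BAILING_MOE2_TOY_MODEL_CONFIG (hypothetical contents).
    toy_defaults = {"torch_dtype": "float32", "num_experts": 4}
    merged = merge_missing_config_fields(config_path, toy_defaults)
```

With this ordering, `auto_map` and the serialized `torch_dtype` survive, and only fields absent from the saved file are filled in from the toy config.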

🧹 Nitpick comments (8)

src/megatron/bridge/models/bailing/__init__.py (1)
25-31: __all__ is not sorted alphabetically.

Ruff flags RUF022. Sorting __all__ keeps the public API surface consistent and easier to maintain.
♻️ Suggested sort
 __all__ = [
     "BailingMoeV2Bridge",
     "BailingMoeV2ModelProvider",
-    "LingMini2ModelProvider",
-    "LingFlash2ModelProvider",
     "Ling1TModelProvider",
+    "LingFlash2ModelProvider",
+    "LingMini2ModelProvider",
 ]
src/megatron/bridge/models/bailing/bailing_moe2_provider.py (2)
78-78: Commented-out code without explanation.

As per coding guidelines, commented-out code should include a comment describing its usage and why it is commented out; otherwise it should be removed before merging. Consider adding a brief rationale or removing this line.

28-33: Unused noqa directive on line 29.

Ruff flags RUF100: the # noqa: F401 is unnecessary here since the import is used (to set HAVE_TE). Remove the stale directive.
♻️ Cleanup
-    import transformer_engine  # type: ignore  # noqa: F401
+    import transformer_engine  # type: ignore
src/megatron/bridge/models/bailing/bailing_moe2_bridge.py (3)
90-94: Missing type hints on build_conversion_tasks.

Per coding guidelines, function arguments and return types should have type hints. The override should match the parent signature.
♻️ Add type hints
-    def build_conversion_tasks(self, hf_pretrained, megatron_model):
+    def build_conversion_tasks(self, hf_pretrained: PreTrainedCausalLM, megatron_model: GPTModel):
154-170: MTP megatron-param rewriting with chained str.replace is subtle and fragile.

The logic assumes ".*" appears exactly once in every layer_specific_mappings key. Today that holds, but a future mapping with an additional wildcard (e.g., decoder.layers.*.mlp.experts.*.some_param) would be silently corrupted because str.replace is global.

Consider using str.replace(old, new, count=1) for each step to make the assumption explicit, or refactor to a regex/split-based approach.
♻️ Safer minimal change
             megatron_param = (
-                megatron_param.replace(".*", ".*.transformer_layer")
-                .replace("decoder", "mtp")
-                .replace(".*", f".{mtp_layer}")
+                megatron_param.replace(".*", ".*.transformer_layer", 1)
+                .replace("decoder", "mtp", 1)
+                .replace(".*", f".{mtp_layer}", 1)
             )
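To make the failure mode concrete, here is a small standalone sketch comparing the global chain with the count-limited chain. The two helper functions are hypothetical names written for this illustration; the second key is the reviewer's own hypothetical two-wildcard example, not a mapping that exists in the bridge today.

```python
def to_mtp_param_global(megatron_param: str, mtp_layer: int) -> str:
    """Chained str.replace without a count: every '.*' is rewritten."""
    return (
        megatron_param.replace(".*", ".*.transformer_layer")
        .replace("decoder", "mtp")
        .replace(".*", f".{mtp_layer}")
    )


def to_mtp_param_first_only(megatron_param: str, mtp_layer: int) -> str:
    """Same chain with count=1: only the first match is rewritten at each step."""
    return (
        megatron_param.replace(".*", ".*.transformer_layer", 1)
        .replace("decoder", "mtp", 1)
        .replace(".*", f".{mtp_layer}", 1)
    )


single = "decoder.layers.*.self_attention.linear_qkv.weight"
double = "decoder.layers.*.mlp.experts.*.some_param"

# With one wildcard the two variants agree.
same = to_mtp_param_global(single, 0) == to_mtp_param_first_only(single, 0)

# With two wildcards the global chain also rewrites the expert wildcard.
corrupted = to_mtp_param_global(double, 0)
preserved = to_mtp_param_first_only(double, 0)
```

The count is passed positionally here, matching the suggested diff; the keyword form `count=1` is only accepted by newer Python versions.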
96-218: mapping_registry lacks a return type docstring explaining the mapping structure.

This method builds a complex registry with conditional MTP paths and two-to-one HF→Megatron mappings (e.g., both input_layernorm and linear_qkv.layer_norm_weight map to the same HF key). A brief docstring covering the overall strategy would help future maintainers.
tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py (2)
275-275: Use pytest.fail() instead of assert False.

assert False statements are stripped when Python runs with -O (optimized). Use pytest.fail(msg) which is unconditional and idiomatic in pytest.
♻️ Fix both occurrences

Line 275:
-                assert False, f"Failed to load created toy MoE model: {e}"
+                pytest.fail(f"Failed to load created toy MoE model: {e}")
Line 339:
-                assert False, f"Bailing MoE V2 {test_name} conversion failed with return code {result.returncode}"
+                pytest.fail(f"Bailing MoE V2 {test_name} conversion failed with return code {result.returncode}")
Also applies to: 339-339
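The `-O` behavior can be checked with a short standalone script. This is only a sketch to illustrate the point: it runs the same one-line `assert False` in a child interpreter with and without `-O` and compares exit codes; the snippet text and helper name are made up for the demonstration.

```python
import subprocess
import sys

SNIPPET = "assert False, 'conversion failed'"


def exit_code(*flags: str) -> int:
    """Run SNIPPET in a fresh interpreter with the given flags and return its exit code."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
    )
    return result.returncode


normal_rc = exit_code()  # assertion fires, interpreter exits with an error
optimized_rc = exit_code("-O")  # assert statements are compiled away
```

`pytest.fail(msg)` raises regardless of optimization flags, which is why it is the safer choice inside test bodies.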

277-285: GPU test lacks hardware requirement documentation.

Per coding guidelines: "Document hardware requirements for GPU tests." The parameterized tests each require nproc_per_node=2, so at minimum 2 GPUs are needed. Add a brief comment or docstring noting this.

coderabbitai · 2026-02-06T08:50:38Z

+        except Exception as e:
+            # If that fails, try loading a minimal model to register the class
+            try:
+                # Load a tiny model just to register the class
+                _ = AutoModelForCausalLM.from_pretrained(
+                    "inclusionAI/Ling-mini-2.0",
+                    trust_remote_code=True,
+                    torch_dtype=torch.bfloat16,
+                    device_map="cpu",  # Don't use GPU for this
+                )
+                # Now try again
+                model = AutoModelForCausalLM.from_config(
+                    config, trust_remote_code=True
+                )
+            except Exception as e2:
+                pytest.skip(
+                    f"Could not create Bailing MoE V2 model: {e}. "
+                    f"Fallback also failed: {e2}. Model class may require custom code."
+                )


⚠️ Potential issue | 🟠 Major

Inner fallback downloads the full Ling-mini-2.0 model (~16B params) just to register the class.

If from_config fails, the fallback on lines 112-117 calls from_pretrained("inclusionAI/Ling-mini-2.0"), which downloads the entire multi-billion parameter model to CPU. This makes the test extremely slow and resource-intensive, and can cause OOM on CI machines. Consider using AutoConfig.from_pretrained(..., trust_remote_code=True) (which only downloads config + code files) to register the class, or calling pytest.skip directly instead of this heavy fallback.

🧰 Tools

🪛 Ruff (0.14.14)

[warning] 108-108: Do not catch blind exception: Exception

(BLE001)

[warning] 122-122: Do not catch blind exception: Exception

(BLE001)


coderabbitai · 2026-02-06T08:50:38Z

+            "coverage",
+            "run",
+            "--data-file=/opt/Megatron-Bridge/.coverage",
+            "--source=/opt/Megatron-Bridge/",


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for hardcoded /opt/Megatron-Bridge paths in test files
rg -n '/opt/Megatron-Bridge' --type=py -g 'tests/**'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 19996

🏁 Script executed:

# Examine the specific test file to understand context
wc -l tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 144

🏁 Script executed:

# Look at lines around 311-332 to see the context, especially the cwd argument mentioned on line 332
sed -n '305,335p' tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1063

Hardcoded /opt/Megatron-Bridge/ paths in coverage configuration are CI-specific and will fail in other environments.

The --data-file and --source arguments assume the repo is deployed at /opt/Megatron-Bridge/. This pattern appears across many test files and should be refactored to derive paths from the repo root (using cwd already computed on line 332 or Path(__file__)).
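One way to derive the coverage arguments from a repo root is sketched below. The helper name `coverage_run_args` and the example path are assumptions for illustration; in the test itself the root would come from the existing `cwd` value or from `Path(__file__)`.

```python
from pathlib import Path


def coverage_run_args(repo_root: Path) -> list[str]:
    """Build the leading `coverage run` arguments relative to a given repo root."""
    return [
        "coverage",
        "run",
        f"--data-file={repo_root / '.coverage'}",
        f"--source={repo_root}",
    ]


# Example root only; any checkout location works the same way.
example_root = Path("/workspace/Megatron-Bridge")
args = coverage_run_args(example_root)
```

Note that `--source` loses the trailing slash of the hardcoded version, which should not matter for a directory path.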


ISEEKYAN · 2026-02-06T08:51:05Z

/ok to test b9d0b2e

Signed-off-by: Changlong <changlyu@amazon.com>

ccclyu · 2026-02-09T23:24:02Z

@ISEEKYAN @yaoyu-33 fixed the minor issues raised by CodeRabbit. Can you review again and trigger the CI/CD? Thanks so much!

ISEEKYAN · 2026-02-10T06:58:30Z

/ok to test d3f9cfc

Signed-off-by: Changlong <changlyu@amazon.com>

ISEEKYAN · 2026-02-11T02:16:59Z

/ok to test c5d5e07

Made-with: Cursor # Conflicts: # src/megatron/bridge/models/__init__.py

yaoyu-33 · 2026-03-25T06:13:38Z

it's okay, we can finish the last one mile. There are some clean ups, I can just run your pr.

ccclyu · 2026-03-25T06:20:37Z

ok. thks so much! if you meet some issue when running solely on this PR, please let me know.

…ean up bridge
- Remove redundant build_conversion_tasks override; use self.hf_config already set by the bridge dispatch system
- Add toy-model conversion test under test_groups/models/bailing/ matching the MiniMax M2 / DeepSeek style
- Add L0_Launch_models_bailing.sh for CI auto-discovery
- Remove unused provider file and provider test

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Made-with: Cursor

yaoyu-33 · 2026-03-25T06:35:51Z

i am doing a little more tests locally

yaoyu-33 · 2026-03-25T06:35:57Z

/claude review

…les, remove stale test
- Fix bridge: set moe_router_score_function="sigmoid" (required when moe_router_enable_expert_bias=True, was causing ValueError on model init)
- Add examples/models/bailing/ with conversion.sh, inference.sh, README.md for Ling-flash-2.0 (verified round-trip and inference on 8-GPU node)
- Remove stale tests/functional_tests/models/bailing/ duplicate (the authoritative test is in test_groups/models/bailing/)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
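The constraint named in this commit (expert bias requires a sigmoid score function) can be illustrated with a standalone sketch. `RouterConfigSketch` is a hypothetical stand-in written for this example, not the Megatron config class; only the two field names and the ValueError behavior come from the commit message, and the `"softmax"` default is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RouterConfigSketch:
    """Hypothetical stand-in that mirrors the validation described in the commit."""

    moe_router_score_function: str = "softmax"  # assumed default for illustration
    moe_router_enable_expert_bias: bool = False

    def __post_init__(self) -> None:
        if self.moe_router_enable_expert_bias and self.moe_router_score_function != "sigmoid":
            raise ValueError(
                "moe_router_enable_expert_bias=True requires moe_router_score_function='sigmoid'"
            )


# Expert bias with a non-sigmoid score function is rejected at init time.
try:
    RouterConfigSketch(moe_router_enable_expert_bias=True)
    rejected = False
except ValueError:
    rejected = True

# Setting the score function explicitly, as the bridge fix does, passes validation.
fixed = RouterConfigSketch(
    moe_router_score_function="sigmoid",
    moe_router_enable_expert_bias=True,
)
```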

yaoyu-33 · 2026-03-25T06:53:50Z

/ok to test 972f875

yaoyu-33 · 2026-03-25T06:55:48Z

/ok to test 972f875

yaoyu-33 · 2026-03-25T06:57:08Z

@ccclyu done updating, Feel free to review the new code heuristic for model additions.

ccclyu · 2026-03-30T08:32:35Z

@yaoyu-33 thanks and current code structure looks great! For the CI/CD run L0_Launch_models_bailing, the inclusionAI/Ling-mini-2.0 model is not precached in /home/TestData/HF_HOME so it failed. Could you please help trigger the workflow Cache HuggingFace Model for this model?

yaoyu-33 · 2026-04-09T02:08:19Z

/ok to test 5853894

… custom arch dispatch
- Add BailingMoeV2Bridge, BailingMoeV2Config, BailingMoeV2ForCausalLM for Ling MoE2 models
- Register bailing_moe_v2 with AutoConfig/AutoModelForCausalLM at import time so AutoConfig.from_pretrained works without hub access in offline CI environments
- Fix _causal_lm_architecture in AutoBridge to fall back to class-name string when a custom arch (e.g. BailingMoeV2ForCausalLM) is not in standard transformers, enabling bridge dispatch for models registered via AutoConfig.register
- Add expert_bias to IGNORE_PRECISION_PARAMS in roundtrip script: MoE gate expert bias is stored as float32 in Megatron but bfloat16 in HF
- Add functional tests for TP/PP/EP conversion of toy BailingMoeV2 model

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-04-09T05:47:06Z

/ok to test 0d2c83a

Adapted vendor modeling files don't require docstrings on every class/function. Add modeling_*.py pattern to per-file-ignores in ruff.toml. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-04-09T16:36:31Z

/ok to test 564ef52

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-04-09T19:29:03Z

/ok to test 2c2fa1c

…s for custom model fallback

Custom models registered via AutoConfig.register (e.g. BailingMoeV2ForCausalLM) are not in standard transformers but are valid; _causal_lm_architecture now returns the class name as a string for bridge dispatch instead of raising ValueError.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-04-09T20:43:39Z

/ok to test e0c1b30

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-04-09T21:40:57Z

/ok to test 500f9f8

ccclyu and others added 2 commits January 22, 2026 11:17

Merge branch 'NVIDIA-NeMo:main' into bailing_moe

11d1ebd

github-actions Bot added the community-request label Jan 22, 2026

ccclyu changed the title ~~Model Support for Ling Series MoE V2 Model~~ Model Support for Ling MoE V2 Model Jan 22, 2026

ccclyu changed the title ~~Model Support for Ling MoE V2 Model~~ [model] feat: Model Support for Ling MoE V2 Model Jan 23, 2026

add copy right and functional test cases

bba6e30

Signed-off-by: Changlong <changlyu@amazon.com>

ISEEKYAN previously approved these changes Feb 6, 2026


Merge branch 'main' into bailing_moe

b9d0b2e

coderabbitai Bot reviewed Feb 6, 2026


fix minor issues from coderabbit

dfa8bcd

Signed-off-by: Changlong <changlyu@amazon.com>

ccclyu dismissed ISEEKYAN’s stale review via dfa8bcd February 9, 2026 23:20

Merge branch 'main' into bailing_moe

d3f9cfc

copy-pr-bot Bot temporarily deployed to nemo-ci February 10, 2026 06:58 Inactive

ccclyu and others added 2 commits February 10, 2026 23:46

fix: lint format fix

1bcf127

Signed-off-by: Changlong <changlyu@amazon.com>

Merge branch 'main' into bailing_moe

c5d5e07

copy-pr-bot Bot temporarily deployed to nemo-ci February 11, 2026 02:17 Inactive

copy-pr-bot Bot temporarily deployed to test February 11, 2026 02:17 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci February 11, 2026 02:26 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci February 11, 2026 02:34 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci February 11, 2026 02:47 Inactive

Merge remote-tracking branch 'origin/main' into bailing_moe

459041f

Made-with: Cursor # Conflicts: # src/megatron/bridge/models/__init__.py

yaoyu-33 added 2 commits March 24, 2026 23:25

revert: Undo unintended 3rdparty/Megatron-LM submodule change

c214f8f

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Made-with: Cursor

claude Bot reviewed Mar 25, 2026


Comment thread tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py Outdated

ccclyu added 3 commits March 30, 2026 01:34

Merge branch 'main' into bailing_moe

d0d8382

Merge branch 'main' into bailing_moe

5d933da

Merge branch 'main' into bailing_moe

5853894

yaoyu-33 previously approved these changes Apr 9, 2026


yaoyu-33 and others added 2 commits April 8, 2026 21:21

Merge remote-tracking branch 'origin/main' into bailing_moe

6180261

[build] fix: apply pre-commit formatting to modeling_bailing_moe_v2

2c2fa1c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

Merge remote-tracking branch 'origin/main' into bailing_moe

500f9f8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

cuichenx mentioned this pull request May 8, 2026

[NeMo FW 26.06 Release] MBridge v0.5.0 Roadmap #3754

Open
