
[model] feat: Model Support for Ling MoE V2 Model#2028

Merged
yaoyu-33 merged 26 commits into
NVIDIA-NeMo:mainfrom
ccclyu:bailing_moe
Apr 10, 2026

Conversation

@ccclyu

@ccclyu ccclyu commented Jan 22, 2026

Contributor

What does this PR do?

Adds support for the Ling MoE V2 models (Ling-mini, Ling-flash, Ling-1T) from https://huggingface.co/collections/inclusionAI/ling-v2.

Changelog

Following the instructions at https://docs.nvidia.com/nemo/megatron-bridge/latest/adding-new-models.html, this PR adds model support for Ling MoE V2.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for Bailing MoE V2 models including Ling1T, LingFlash2, and LingMini2 variants.
    • Models now support conversion workflows with multiple parallelism configurations (tensor, pipeline, and expert parallelism).
  • Tests

    • Added functional tests for Bailing MoE V2 model conversion and provider configuration validation.

ccclyu and others added 2 commits January 22, 2026 11:17
This commit adds support for Bailing MoE V2 model including:

- Initial implementation of bailing_moe2_bridge and bailing_moe2_provider

- Fix layer spec configuration

- Fix MoE config and param dtype handling

- Fix MTP bug and QKV mapping

- Fix linting issues

Signed-off-by: Changlong <changlyu@amazon.com>
@copy-pr-bot

copy-pr-bot Bot commented Jan 22, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ccclyu ccclyu changed the title Model Support for Ling Series MoE V2 Model Model Support for Ling MoE V2 Model Jan 22, 2026
@ccclyu ccclyu changed the title Model Support for Ling MoE V2 Model [model] feat: Model Support for Ling MoE V2 Model Jan 23, 2026
@ccclyu

ccclyu commented Jan 23, 2026

Contributor Author

Will add docs and tests soon.

@yaoyu-33

Contributor

Hi @ccclyu, thanks for contributing. Did you verify this on any downstream RL workflow already?

@ccclyu

ccclyu commented Jan 29, 2026

Contributor Author

@ahxt ran an SFT experiment on Ling-Mini-Base 2.0 using the slime Megatron trainer, and the training curve and grad norm look fine. We also tested on trained models. For SFT, the Megatron support looks good, but we have not had a chance to try RL.


Signed-off-by: Changlong <changlyu@amazon.com>
ISEEKYAN
ISEEKYAN previously approved these changes Feb 6, 2026
@coderabbitai

coderabbitai Bot commented Feb 6, 2026

Contributor
📝 Walkthrough

Walkthrough

This PR introduces support for Bailing MoE v2 models and three Ling model variants (LingMini2, LingFlash2, Ling1T) through a new bridge implementation that converts HuggingFace models to Megatron GPTModel format, complete with MoE-specific parameter mappings, provider configurations, and comprehensive functional test coverage.

Changes

Cohort / File(s) Summary
Package API Extensions
src/megatron/bridge/models/__init__.py, src/megatron/bridge/models/bailing/__init__.py
Added imports and exports for BailingMoeV2Bridge, BailingMoeV2ModelProvider, and three Ling model provider classes (Ling1TModelProvider, LingFlash2ModelProvider, LingMini2ModelProvider) to expose them as part of the public API surface.
Bridge Implementation
src/megatron/bridge/models/bailing/bailing_moe2_bridge.py
Introduced BailingMoeV2Bridge class bridging BailingMoeV2ForCausalLM to GPTModel. Includes provider creation logic, parameter mapping registry with specialized handlers (ConcatenatedQKVMapping, GatedMLPMapping), and optional MTP support with dynamic per-layer mappings.
Model Providers
src/megatron/bridge/models/bailing/bailing_moe2_provider.py
Added base BailingMoeV2ModelProvider and three cascading Ling-specific providers with configurable MoE geometry (layer counts, hidden sizes, expert counts, MoE frequencies, intermediate sizes) and MTP parameters.
Conversion Functional Tests
tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py
New test module validating BailingMoeV2 conversion across parallelism configurations. Includes toy model fixture with fallback logic for custom model registration, toy model structure validation, and multi-GPU conversion testing via distributed run.
Provider Equivalence Tests
tests/functional_tests/models/bailing/test_bailing_moe2_provider.py
Added parameterized test comparing bridge-generated provider configs against predefined Ling providers using AutoBridge and configuration comparison utilities.
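The ConcatenatedQKVMapping entry above converts between a fused HF attention projection and Megatron's grouped QKV layout. The sketch below only illustrates the general reordering idea, assuming the HF tensor stacks all query rows, then all key rows, then all value rows, and that Megatron places each query group's Q rows next to that group's K and V rows. Rows are modeled as list items instead of tensors, and `hf_fused_to_megatron_qkv` is a hypothetical helper, not the bridge's actual code.

```python
from typing import List, Sequence, TypeVar

Row = TypeVar("Row")


def hf_fused_to_megatron_qkv(
    fused: Sequence[Row], num_heads: int, num_query_groups: int, head_dim: int
) -> List[Row]:
    """Reorder a fused [Q; K; V] row list into a per-query-group [q..., k, v] layout.

    Hypothetical sketch: each list element stands for one output row of the
    attention projection weight.
    """
    if num_heads % num_query_groups != 0:
        raise ValueError("num_heads must be divisible by num_query_groups")
    q_size = num_heads * head_dim
    kv_size = num_query_groups * head_dim
    if len(fused) != q_size + 2 * kv_size:
        raise ValueError("fused weight has an unexpected number of rows")

    # Assumed HF layout: all Q rows, then all K rows, then all V rows.
    q = fused[:q_size]
    k = fused[q_size:q_size + kv_size]
    v = fused[q_size + kv_size:]

    heads_per_group = num_heads // num_query_groups
    out: List[Row] = []
    for g in range(num_query_groups):
        # Q rows for the heads that share this group's K/V head.
        q_start = g * heads_per_group * head_dim
        out.extend(q[q_start:q_start + heads_per_group * head_dim])
        # This group's K rows, then its V rows.
        out.extend(k[g * head_dim:(g + 1) * head_dim])
        out.extend(v[g * head_dim:(g + 1) * head_dim])
    return out


# 4 query heads sharing 2 K/V groups; head_dim=1 so each head is a single row.
fused = ["q0", "q1", "q2", "q3", "k0", "k1", "v0", "v1"]
print(hf_fused_to_megatron_qkv(fused, num_heads=4, num_query_groups=2, head_dim=1))
# ['q0', 'q1', 'k0', 'v0', 'q2', 'q3', 'k1', 'v1']
```

Exporting from Megatron back to HF would apply the inverse reordering.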

Sequence Diagram(s)

sequenceDiagram
    participant HF as HuggingFace Model
    participant Bridge as BailingMoeV2Bridge
    participant Provider as BailingMoeV2ModelProvider
    participant Tasks as Conversion Tasks
    participant Registry as Mapping Registry
    participant Megatron as Megatron GPTModel

    HF->>Bridge: provider_bridge(hf_pretrained)
    activate Bridge
    Bridge->>Provider: Create with HF config<br/>(dims, heads, MoE settings)
    activate Provider
    Provider-->>Bridge: BailingMoeV2ModelProvider
    deactivate Provider
    Bridge-->>HF: Return provider
    deactivate Bridge

    HF->>Bridge: build_conversion_tasks(hf_pretrained,<br/>megatron_model)
    activate Bridge
    Bridge->>Tasks: Initialize with HF config
    activate Tasks
    Tasks-->>Bridge: Conversion tasks ready
    deactivate Tasks
    Bridge-->>HF: Tasks configured
    deactivate Bridge

    HF->>Bridge: mapping_registry()
    activate Bridge
    Bridge->>Registry: Build parameter mappings<br/>(embeddings, attention, MLP,<br/>ConcatenatedQKV, GatedMLP)
    activate Registry
    alt Has MTP Support
        Registry->>Registry: Add per-layer MTP mappings<br/>(transformer_layer, attention, MLP)
    end
    Registry-->>Bridge: MegatronMappingRegistry
    deactivate Registry
    Bridge-->>HF: Complete registry
    deactivate Bridge

    HF->>Megatron: Convert via tasks & registry

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

Run CICD

Suggested reviewers

  • yaoyu-33
  • cuichenx
  • ananthsub
🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes | ⚠️ Warning | PR adds Ling MoE V2 support with new functional tests but lacks formal documentation of test results, convergence validation, and performance metrics in the PR description. | Document test execution results, quantitative SFT metrics, RL validation status, and confirmation of no numeric regressions in PR description.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title '[model] feat: Model Support for Ling MoE V2 Model' clearly describes the main change: adding support for Ling MoE V2 models. It directly aligns with the substantial additions of bridge implementations, provider classes, and test coverage for these models.


@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@src/megatron/bridge/models/__init__.py`:
- Around line 16-22: Add the missing export for BailingMoeV2Bridge to the
module's public API: update the __all__ list in
src/megatron/bridge/models/__init__.py to include "BailingMoeV2Bridge"
(alongside the already-exported GLM45Bridge, OlMoEBridge, NemotronBridge, and
the provider names Ling1TModelProvider, LingFlash2ModelProvider,
LingMini2ModelProvider) so the imported symbol is actually exported and the
Flake8 F401 is resolved.

In `@src/megatron/bridge/models/bailing/bailing_moe2_bridge.py`:
- Around line 56-62: Fix the typo in the module docstring: change "AutoBrige" to
"AutoBridge" in the example block so the class name matches the actual symbol;
update the docstring in bailing_moe2_bridge.py (the example that references
AutoBrige -> AutoBridge) to ensure the usage snippet correctly references
AutoBridge.

In `@src/megatron/bridge/models/bailing/bailing_moe2_provider.py`:
- Line 58: The variable init_method_std is annotated as int but assigned 0.02;
update its type hint to float (e.g., change init_method_std: int = 0.02 to
init_method_std: float = 0.02) in the BailingMoe2Provider (or the scope where
init_method_std is defined) and run a quick search for other uses or annotations
referencing init_method_std to ensure they expect a float and adjust any
docstrings or tests accordingly.
- Around line 59-73: Remove the duplicate attention_dropout field in the
dataclass: locate both occurrences of the attention_dropout attribute (the one
near the top with other config fields and the second one just above kv_channels)
and delete one so the dataclass only declares attention_dropout once (retain the
intended value, e.g., 0.0). Ensure there are no other duplicate config fields
introduced nearby (e.g., kv_channels) and run a quick lint/type-check to confirm
no references rely on the removed duplicate.
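The duplicate attention_dropout declaration is easy to miss because Python does not reject it: the class body simply rebinds the name, so the dataclass ends up with a single field whose default is whichever value was assigned last. A minimal illustration with a hypothetical `ToyProvider` (not the real provider class):

```python
from dataclasses import dataclass, fields


@dataclass
class ToyProvider:
    # First declaration: its default (0.1) is silently discarded.
    attention_dropout: float = 0.1
    hidden_size: int = 16
    # Second declaration rebinds the same name; dataclasses sees one field.
    attention_dropout: float = 0.0


names = [f.name for f in fields(ToyProvider)]
print(names)  # ['attention_dropout', 'hidden_size']
print(ToyProvider().attention_dropout)  # 0.0
```

The same mechanism explains why `init_method_std: int = 0.02` does not raise: dataclasses do not enforce annotations at runtime, so the float annotation is a correctness and readability fix rather than a behavioral one.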

In `@tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py`:
- Around line 160-164: The test currently overwrites the config.json written by
model.save_pretrained(), losing model-generated metadata like auto_map and the
serialized torch_dtype; instead, load the file that model.save_pretrained()
produced (config_path), parse it, merge in only the missing/desired fields from
HF_BAILING_MOE2_TOY_MODEL_CONFIG (or skip the overwrite entirely), and write the
merged config back so fields such as auto_map and the model's serialized
torch_dtype remain intact; reference model.save_pretrained, config_to_save,
HF_BAILING_MOE2_TOY_MODEL_CONFIG and config_path to locate and change the logic.
- Around line 130-133: Remove the debug print in the loop that inspects the
model dtype (the for name, param in model.named_parameters(): print(...) break
block); either delete the print entirely or replace it with the standardized
non-duplicating logger (use print_rank_0 or logging.debug) so tests don't emit
raw prints across ranks—locate the snippet referencing model.named_parameters()
in the test and swap the print for print_rank_0(f"Before save - {name}:
{param.dtype}") or remove the block.
- Around line 108-126: The fallback in the except block currently calls
AutoModelForCausalLM.from_pretrained("inclusionAI/Ling-mini-2.0", ...) which
downloads a huge model to register the class; replace that heavy fallback by
either calling AutoConfig.from_pretrained("inclusionAI/Ling-mini-2.0",
trust_remote_code=True) to fetch only the config and remote code (so
AutoModelForCausalLM.from_config(config, trust_remote_code=True) can succeed) or
skip the test immediately with pytest.skip if class registration fails; update
the block around AutoModelForCausalLM.from_pretrained,
AutoModelForCausalLM.from_config, and pytest.skip to use
AutoConfig.from_pretrained or direct pytest.skip instead of downloading the full
model.
- Around line 311-314: The test currently hardcodes /opt/Megatron-Bridge/ in the
coverage args which breaks non-CI environments; update the arguments to derive
the paths from the repo root already computed as cwd (or from Path(__file__)),
e.g. replace "--data-file=/opt/Megatron-Bridge/.coverage" with "--data-file=" +
str(cwd / ".coverage") and replace "--source=/opt/Megatron-Bridge/" with
"--source=" + str(cwd) so the coverage data file and source path are computed
dynamically (refer to the cwd variable computed in this test file).
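For the config.json item, a merge can keep everything save_pretrained() wrote and only fill in keys that are missing. A minimal standard-library sketch, assuming the file is plain JSON; `merge_missing_config_keys` and the toy dictionaries are hypothetical stand-ins for the test's HF_BAILING_MOE2_TOY_MODEL_CONFIG logic:

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def merge_missing_config_keys(config_path: Path, defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Add keys from `defaults` that are absent in config_path, never overwriting existing ones."""
    saved = json.loads(config_path.read_text())
    for key, value in defaults.items():
        # setdefault keeps model-generated values (e.g. auto_map, torch_dtype) when present.
        saved.setdefault(key, value)
    config_path.write_text(json.dumps(saved, indent=2))
    return saved


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    # Simulates a file written by save_pretrained(); the auto_map value is made up.
    path.write_text(json.dumps({
        "auto_map": {"AutoConfig": "configuration_toy.ToyConfig"},
        "torch_dtype": "bfloat16",
    }))
    merged = merge_missing_config_keys(path, {"torch_dtype": "float32", "num_experts": 4})
    print(merged)
```

Whether the toy config or the saved file should win on conflicting keys is a test-design choice; `setdefault` encodes "the saved file wins".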
🧹 Nitpick comments (8)
src/megatron/bridge/models/bailing/__init__.py (1)

25-31: __all__ is not sorted alphabetically.

Ruff flags RUF022. Sorting __all__ keeps the public API surface consistent and easier to maintain.

♻️ Suggested sort
 __all__ = [
     "BailingMoeV2Bridge",
     "BailingMoeV2ModelProvider",
-    "LingMini2ModelProvider",
-    "LingFlash2ModelProvider",
     "Ling1TModelProvider",
+    "LingFlash2ModelProvider",
+    "LingMini2ModelProvider",
 ]
src/megatron/bridge/models/bailing/bailing_moe2_provider.py (2)

78-78: Commented-out code without explanation.

As per coding guidelines, commented-out code should include a comment describing its usage and why it is commented out; otherwise it should be removed before merging. Consider adding a brief rationale or removing this line.


28-33: Unused noqa directive on line 29.

Ruff flags RUF100: the # noqa: F401 is unnecessary here since the import is used (to set HAVE_TE). Remove the stale directive.

♻️ Cleanup
-    import transformer_engine  # type: ignore  # noqa: F401
+    import transformer_engine  # type: ignore
src/megatron/bridge/models/bailing/bailing_moe2_bridge.py (3)

90-94: Missing type hints on build_conversion_tasks.

Per coding guidelines, function arguments and return types should have type hints. The override should match the parent signature.

♻️ Add type hints
-    def build_conversion_tasks(self, hf_pretrained, megatron_model):
+    def build_conversion_tasks(self, hf_pretrained: PreTrainedCausalLM, megatron_model: GPTModel):

154-170: MTP megatron-param rewriting with chained str.replace is subtle and fragile.

The logic assumes ".*" appears exactly once in every layer_specific_mappings key. Today that holds, but a future mapping with an additional wildcard (e.g., decoder.layers.*.mlp.experts.*.some_param) would be silently corrupted because str.replace is global.

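Consider passing a count of 1 to each str.replace call to make the assumption explicit (positionally, as in str.replace(old, new, 1); the count= keyword form is only accepted from Python 3.13 onward), or refactor to a regex/split-based approach.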

♻️ Safer minimal change
             megatron_param = (
-                megatron_param.replace(".*", ".*.transformer_layer")
-                .replace("decoder", "mtp")
-                .replace(".*", f".{mtp_layer}")
+                megatron_param.replace(".*", ".*.transformer_layer", 1)
+                .replace("decoder", "mtp", 1)
+                .replace(".*", f".{mtp_layer}", 1)
             )
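To see the failure mode concretely, the sketch below applies both the global and the count-limited chains to a single-wildcard key and to a hypothetical two-wildcard key. The parameter names are illustrative; only the replace chains mirror the diff above.

```python
def to_mtp_param_global(megatron_param: str, mtp_layer: int) -> str:
    # Original chain: every str.replace rewrites all occurrences.
    return (
        megatron_param.replace(".*", ".*.transformer_layer")
        .replace("decoder", "mtp")
        .replace(".*", f".{mtp_layer}")
    )


def to_mtp_param_first_only(megatron_param: str, mtp_layer: int) -> str:
    # Suggested chain: a positional count of 1 rewrites only the first match.
    return (
        megatron_param.replace(".*", ".*.transformer_layer", 1)
        .replace("decoder", "mtp", 1)
        .replace(".*", f".{mtp_layer}", 1)
    )


single = "decoder.layers.*.mlp.linear_fc1.weight"
double = "decoder.layers.*.mlp.experts.*.weight"  # hypothetical two-wildcard key

print(to_mtp_param_global(single, 0))      # mtp.layers.0.transformer_layer.mlp.linear_fc1.weight
print(to_mtp_param_first_only(single, 0))  # mtp.layers.0.transformer_layer.mlp.linear_fc1.weight
print(to_mtp_param_global(double, 0))      # mtp.layers.0.transformer_layer.mlp.experts.0.transformer_layer.weight
print(to_mtp_param_first_only(double, 0))  # mtp.layers.0.transformer_layer.mlp.experts.*.weight
```

For single-wildcard keys both chains agree; they only diverge once a second wildcard appears, which is the case the review warns about.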

96-218: mapping_registry lacks a return type docstring explaining the mapping structure.

This method builds a complex registry with conditional MTP paths and two-to-one HF→Megatron mappings (e.g., both input_layernorm and linear_qkv.layer_norm_weight map to the same HF key). A brief docstring covering the overall strategy would help future maintainers.

tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py (2)

275-275: Use pytest.fail() instead of assert False.

assert False statements are stripped when Python runs with -O (optimized). Use pytest.fail(msg) which is unconditional and idiomatic in pytest.

♻️ Fix both occurrences

Line 275:

-                assert False, f"Failed to load created toy MoE model: {e}"
+                pytest.fail(f"Failed to load created toy MoE model: {e}")

Line 339:

-                assert False, f"Bailing MoE V2 {test_name} conversion failed with return code {result.returncode}"
+                pytest.fail(f"Bailing MoE V2 {test_name} conversion failed with return code {result.returncode}")

Also applies to: 339-339
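The -O point can be checked directly: the snippet below runs `assert False` in a child interpreter with and without optimization and compares exit codes. It uses only the standard library and is independent of the test file.

```python
import subprocess
import sys


def exit_code(*flags: str) -> int:
    """Run `assert False` in a fresh interpreter and return its exit code."""
    result = subprocess.run(
        # -E ignores PYTHON* environment variables such as PYTHONOPTIMIZE.
        [sys.executable, "-E", *flags, "-c", "assert False, 'should fail'"],
        capture_output=True,
    )
    return result.returncode


print(exit_code())      # 1: AssertionError propagates
print(exit_code("-O"))  # 0: the assert statement is compiled away
```

pytest.fail() raises an exception regardless of optimization flags, so it does not have this gap.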


277-285: GPU test lacks hardware requirement documentation.

Per coding guidelines: "Document hardware requirements for GPU tests." The parameterized tests each require nproc_per_node=2, so at minimum 2 GPUs are needed. Add a brief comment or docstring noting this.

Comment thread src/megatron/bridge/models/__init__.py
Comment thread src/megatron/bridge/models/bailing/bailing_moe2_bridge.py
Comment thread src/megatron/bridge/models/bailing/bailing_moe2_provider.py Outdated
Comment thread src/megatron/bridge/models/bailing/bailing_moe2_provider.py Outdated
Comment on lines +108 to +126
except Exception as e:
# If that fails, try loading a minimal model to register the class
try:
# Load a tiny model just to register the class
_ = AutoModelForCausalLM.from_pretrained(
"inclusionAI/Ling-mini-2.0",
trust_remote_code=True,
torch_dtype=torch.bfloat16,
device_map="cpu", # Don't use GPU for this
)
# Now try again
model = AutoModelForCausalLM.from_config(
config, trust_remote_code=True
)
except Exception as e2:
pytest.skip(
f"Could not create Bailing MoE V2 model: {e}. "
f"Fallback also failed: {e2}. Model class may require custom code."
)

Contributor


⚠️ Potential issue | 🟠 Major

Inner fallback downloads the full Ling-mini-2.0 model (~16B params) just to register the class.

If from_config fails, the fallback on lines 112-117 calls from_pretrained("inclusionAI/Ling-mini-2.0"), which downloads the entire multi-billion parameter model to CPU. This makes the test extremely slow and resource-intensive, and can cause OOM on CI machines. Consider using AutoConfig.from_pretrained(..., trust_remote_code=True) (which only downloads config + code files) to register the class, or calling pytest.skip directly instead of this heavy fallback.

🧰 Tools
🪛 Ruff (0.14.14)

[warning] 108-108: Do not catch blind exception: Exception

(BLE001)


[warning] 122-122: Do not catch blind exception: Exception

(BLE001)


Comment on lines +311 to +314
"coverage",
"run",
"--data-file=/opt/Megatron-Bridge/.coverage",
"--source=/opt/Megatron-Bridge/",

Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for hardcoded /opt/Megatron-Bridge paths in test files
rg -n '/opt/Megatron-Bridge' --type=py -g 'tests/**'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 19996


🏁 Script executed:

# Examine the specific test file to understand context
wc -l tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 144


🏁 Script executed:

# Look at lines around 311-332 to see the context, especially the cwd argument mentioned on line 332
sed -n '305,335p' tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1063


Hardcoded /opt/Megatron-Bridge/ paths in coverage configuration are CI-specific and will fail in other environments.

The --data-file and --source arguments assume the repo is deployed at /opt/Megatron-Bridge/. This pattern appears across many test files and should be refactored to derive paths from the repo root (using cwd already computed on line 332 or Path(__file__)).
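One way to derive the paths, sketched under the assumption that the test file sits at tests/functional_tests/models/bailing/, so the repo root is `Path(__file__).resolve().parents[4]` (adjust the index to the real depth, or reuse the existing cwd variable). `coverage_args` is a hypothetical helper:

```python
from pathlib import Path
from typing import List


def coverage_args(repo_root: Path) -> List[str]:
    """Build coverage arguments relative to the repo root instead of /opt/Megatron-Bridge/."""
    return [
        "coverage",
        "run",
        f"--data-file={repo_root / '.coverage'}",
        f"--source={repo_root}",
    ]


# In the test module the root would come from the file location, e.g.
# repo_root = Path(__file__).resolve().parents[4]
example_root = Path("/tmp/Megatron-Bridge")
print(coverage_args(example_root))
```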


@ISEEKYAN

ISEEKYAN commented Feb 6, 2026

Contributor

/ok to test b9d0b2e

Signed-off-by: Changlong <changlyu@amazon.com>
@ccclyu

ccclyu commented Feb 9, 2026

Contributor Author

@ISEEKYAN @yaoyu-33 I fixed the minor issues raised by CodeRabbit. Can you review again and trigger the CI/CD? Thanks so much!

@ISEEKYAN

Contributor

/ok to test d3f9cfc

ccclyu and others added 2 commits February 10, 2026 23:46
Signed-off-by: Changlong <changlyu@amazon.com>
@ISEEKYAN

Contributor

/ok to test c5d5e07

Made-with: Cursor

# Conflicts:
#	src/megatron/bridge/models/__init__.py
@yaoyu-33

Contributor

It's okay, we can finish the last mile. There are some cleanups; I can just run your PR.

@ccclyu

ccclyu commented Mar 25, 2026

Contributor Author

OK, thanks so much! If you run into any issues when running solely on this PR, please let me know.

…ean up bridge

- Remove redundant build_conversion_tasks override; use self.hf_config
  already set by the bridge dispatch system
- Add toy-model conversion test under test_groups/models/bailing/
  matching the MiniMax M2 / DeepSeek style
- Add L0_Launch_models_bailing.sh for CI auto-discovery
- Remove unused provider file and provider test

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
@yaoyu-33

Contributor

I am doing a few more tests locally.

@yaoyu-33

Contributor

/claude review

Comment thread tests/functional_tests/models/bailing/test_bailing_moe2_conversion.py Outdated
…les, remove stale test

- Fix bridge: set moe_router_score_function="sigmoid" (required when
  moe_router_enable_expert_bias=True, was causing ValueError on model init)
- Add examples/models/bailing/ with conversion.sh, inference.sh, README.md
  for Ling-flash-2.0 (verified round-trip and inference on 8-GPU node)
- Remove stale tests/functional_tests/models/bailing/ duplicate (the
  authoritative test is in test_groups/models/bailing/)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
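The constraint behind that ValueError can be modeled as a small validation check. This is a hypothetical sketch mirroring the rule stated in the commit message (expert bias requires the sigmoid router score function); `ToyMoERouterConfig` and its "softmax" default are illustrative, not Megatron Core's actual config class.

```python
from dataclasses import dataclass


@dataclass
class ToyMoERouterConfig:
    moe_router_enable_expert_bias: bool = False
    moe_router_score_function: str = "softmax"  # toy default for illustration

    def __post_init__(self) -> None:
        # Rule from the commit message: expert bias is only valid with sigmoid scoring.
        if self.moe_router_enable_expert_bias and self.moe_router_score_function != "sigmoid":
            raise ValueError(
                "moe_router_enable_expert_bias=True requires moe_router_score_function='sigmoid'"
            )


ok = ToyMoERouterConfig(moe_router_enable_expert_bias=True, moe_router_score_function="sigmoid")
try:
    ToyMoERouterConfig(moe_router_enable_expert_bias=True)
except ValueError as err:
    print(f"rejected: {err}")
```

Setting the score function explicitly in the bridge, as the commit does, avoids relying on whatever default the config class happens to use.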
@yaoyu-33

Contributor

/ok to test 972f875


@yaoyu-33

yaoyu-33 commented Mar 25, 2026

Contributor

@ccclyu Done updating. Feel free to review the new code heuristics for model additions.

@ccclyu

ccclyu commented Mar 30, 2026

Contributor Author

@yaoyu-33 Thanks, the current code structure looks great! For the CI/CD run L0_Launch_models_bailing, the inclusionAI/Ling-mini-2.0 model is not precached in /home/TestData/HF_HOME, so it failed. Could you please trigger the Cache HuggingFace Model workflow for this model?

yaoyu-33
yaoyu-33 previously approved these changes Apr 9, 2026
@yaoyu-33

yaoyu-33 commented Apr 9, 2026

Contributor

/ok to test 5853894

yaoyu-33 and others added 2 commits April 8, 2026 21:21
… custom arch dispatch

- Add BailingMoeV2Bridge, BailingMoeV2Config, BailingMoeV2ForCausalLM for Ling MoE2 models
- Register bailing_moe_v2 with AutoConfig/AutoModelForCausalLM at import time so
  AutoConfig.from_pretrained works without hub access in offline CI environments
- Fix _causal_lm_architecture in AutoBridge to fall back to class-name string when
  a custom arch (e.g. BailingMoeV2ForCausalLM) is not in standard transformers,
  enabling bridge dispatch for models registered via AutoConfig.register
- Add expert_bias to IGNORE_PRECISION_PARAMS in roundtrip script: MoE gate expert
  bias is stored as float32 in Megatron but bfloat16 in HF
- Add functional tests for TP/PP/EP conversion of toy BailingMoeV2 model

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
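The IGNORE_PRECISION_PARAMS change can be pictured as a round-trip check that still compares values but skips the dtype comparison for parameters whose names match an ignore pattern. Everything here is a hypothetical stand-in (plain dicts of dtype name and values instead of tensors, made-up parameter names); only the `expert_bias` pattern and the float32/bfloat16 mismatch come from the commit message.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for tensors: name -> (dtype name, flattened values).
Params = Dict[str, Tuple[str, List[float]]]


def roundtrip_mismatches(
    before: Params,
    after: Params,
    ignore_precision: Sequence[str] = ("expert_bias",),
    atol: float = 1e-2,
) -> List[str]:
    """Report names whose values differ, or whose dtypes differ unless ignored."""
    bad: List[str] = []
    for name, (dtype, values) in before.items():
        other_dtype, other_values = after[name]
        skip_dtype = any(pattern in name for pattern in ignore_precision)
        if not skip_dtype and dtype != other_dtype:
            bad.append(name)
        elif len(values) != len(other_values) or any(
            not math.isclose(a, b, abs_tol=atol) for a, b in zip(values, other_values)
        ):
            bad.append(name)
    return bad


before = {
    "layers.0.mlp.gate.expert_bias": ("bfloat16", [0.125, -0.25]),
    "layers.0.mlp.gate.weight": ("bfloat16", [0.5, 1.0]),
}
after = {
    # Same values, different dtype: tolerated because of the ignore pattern.
    "layers.0.mlp.gate.expert_bias": ("float32", [0.125, -0.25]),
    # Dtype drift on a parameter without an ignore pattern is still reported.
    "layers.0.mlp.gate.weight": ("float32", [0.5, 1.0]),
}
print(roundtrip_mismatches(before, after))  # ['layers.0.mlp.gate.weight']
```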
@yaoyu-33

yaoyu-33 commented Apr 9, 2026

Contributor

/ok to test 0d2c83a

Adapted vendor modeling files don't require docstrings on every class/function.
Add modeling_*.py pattern to per-file-ignores in ruff.toml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

yaoyu-33 commented Apr 9, 2026

Contributor

/ok to test 564ef52

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

yaoyu-33 commented Apr 9, 2026

Contributor

/ok to test 2c2fa1c

…s for custom model fallback

Custom models registered via AutoConfig.register (e.g. BailingMoeV2ForCausalLM) are not
in standard transformers but are valid — _causal_lm_architecture now returns the class name
as a string for bridge dispatch instead of raising ValueError.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
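A minimal sketch of the fallback described above, assuming the resolver looks the architecture name up on the transformers module and, when it is absent, returns the name itself so that a bridge registry keyed by class name can still dispatch. `resolve_causal_lm_architecture`, `BRIDGES`, and the `SimpleNamespace` standing in for the transformers module are hypothetical; this is not AutoBridge's actual code.

```python
from types import SimpleNamespace
from typing import Dict, Union


class LlamaForCausalLM:  # stand-in for a class that ships with transformers
    pass


fake_transformers = SimpleNamespace(LlamaForCausalLM=LlamaForCausalLM)

# Hypothetical dispatch table keyed by architecture class name.
BRIDGES: Dict[str, str] = {
    "LlamaForCausalLM": "LlamaBridge",
    "BailingMoeV2ForCausalLM": "BailingMoeV2Bridge",
}


def resolve_causal_lm_architecture(arch_name: str, module: object = fake_transformers) -> Union[type, str]:
    """Return the class if the module defines it, otherwise fall back to the name string."""
    # Per the commit message, the real function used to raise ValueError in the fallback case.
    return getattr(module, arch_name, arch_name)


def dispatch(arch_name: str) -> str:
    arch = resolve_causal_lm_architecture(arch_name)
    key = arch if isinstance(arch, str) else arch.__name__
    return BRIDGES[key]


print(dispatch("LlamaForCausalLM"))         # LlamaBridge
print(dispatch("BailingMoeV2ForCausalLM"))  # BailingMoeV2Bridge
```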
@yaoyu-33

yaoyu-33 commented Apr 9, 2026

Contributor

/ok to test e0c1b30

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

yaoyu-33 commented Apr 9, 2026

Contributor

/ok to test 500f9f8


Labels

community-request ready-to-merge PR is approved, current, and only waiting for CI to pass before merge


5 participants