[model] feat: add Qwen3.5 text model bridges (dense + MoE) #3769

Merged
cuichenx merged 5 commits into NVIDIA-NeMo:main from HowardZorn:qwen3_5_lm
Jun 1, 2026

@HowardZorn HowardZorn commented May 10, 2026

Contributor

What does this PR do?

Add Megatron Bridge support for Qwen3.5 language models (Qwen3.5-LM), which are the text-only component extracted from Qwen3.5-VL. The model architecture is similar to Qwen3-Next (hybrid GDN + standard attention), but the HF checkpoint organization follows the Qwen3.5-VL convention (model.language_model.* prefix instead of model.layers.*). Additionally, the MTP mcore path is renamed from transformer_layer to mtp_model_layer to align with Megatron-Core's actual module naming.

Changelog

  • Add Qwen3_5Bridge for dense Qwen3.5 language models (qwen3_5_bridge.py)
  • Add Qwen3_5MoEBridge for MoE variant of Qwen3.5 language models (qwen3_5_moe_bridge.py)
  • Register both bridges in models/qwen/__init__.py with model types qwen3_5_text and qwen3_5_moe_text
  • Fix MTP parameter paths in qwen3_bridge.py: mtp.layers.*.transformer_layer.* → mtp.layers.*.mtp_model_layer.*
  • Fix MTP parameter paths in qwen3_next_bridge.py: same rename as above (13 mappings updated)
  • Add comprehensive unit tests for both bridges (test_qwen35_bridge.py)
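
The MTP path rename in the changelog amounts to rewriting one segment of each Megatron-side parameter name. A minimal sketch, assuming parameter names are plain dotted strings: `rename_mtp_path` is a hypothetical helper written for illustration, not a function from this PR, and the suffix after the renamed segment is an illustrative placeholder.

```python
import re

# Capture `mtp.layers.<N>.` so only the segment directly after it is rewritten.
_MTP_OLD_SEGMENT = re.compile(r"^(mtp\.layers\.\d+\.)transformer_layer\.")


def rename_mtp_path(name: str) -> str:
    """Rewrite `mtp.layers.<N>.transformer_layer.*` to `mtp.layers.<N>.mtp_model_layer.*`.

    Hypothetical helper for illustration; names that do not match are returned unchanged.
    """
    return _MTP_OLD_SEGMENT.sub(r"\g<1>mtp_model_layer.", name)


# The part after the renamed segment is a made-up example suffix.
old_name = "mtp.layers.0.transformer_layer.mlp.linear_fc1.weight"
print(rename_mtp_path(old_name))
```

Anchoring the pattern at the start of the name keeps the rewrite from touching unrelated parameters that merely contain the substring `transformer_layer`.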

Second commit changelog

  • Rename Qwen3_5Bridge → Qwen35Bridge and Qwen3_5MoEBridge → Qwen35MoEBridge
    to follow project naming conventions.
  • Merge dense and MoE text bridges into a single qwen35_bridge.py module with
    extracted helper functions and static mapping methods that are consumed by both text and VL bridges, eliminating code duplication.
  • Fix incorrect HF param paths in dense LM mappings (removed spurious language_model nesting).
  • Remove transpose_on_export from Qwen3.5 bridge to fix shape mismatch.
  • Add functional tests for dense and MoE bridges.

Key design decisions

| Aspect | Qwen3.5-LM | Qwen3-Next (reference) |
|---|---|---|
| HF param prefix | model.language_model.* | model.layers.* |
| Decoder expert format | Fused (gate_up_proj) | Individual (gate_proj/up_proj) |
| MTP expert format | Individual per-expert | Individual per-expert |
| GDN linear mapping | GDNLinearMappingSeparate (4-tensor: qkv/z/b/a) | GDNLinearMapping (2-tensor: qkvz/ba) |
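
To make the prefix row concrete: a bridge mapping pairs a Megatron-side name pattern with an HF-side pattern, where `*` stands for an index such as the layer number. The sketch below is a simplified stand-in for that idea, not Megatron Bridge's actual mapping classes. `resolve_hf_name` is hypothetical, the Megatron-side pattern and the name suffixes are illustrative, and only the two HF prefixes are taken from the table as written.

```python
import re
from typing import Optional


def resolve_hf_name(megatron_name: str, megatron_pattern: str, hf_pattern: str) -> Optional[str]:
    """Map a concrete Megatron name to an HF name by carrying over `*` captures.

    Assumes both patterns contain the same number of `*` wildcards.
    """
    # Escape the pattern, then turn each escaped `*` into a numeric capture group.
    regex = "^" + re.escape(megatron_pattern).replace(r"\*", r"(\d+)") + "$"
    match = re.match(regex, megatron_name)
    if match is None:
        return None
    captures = iter(match.groups())
    # Fill the HF pattern's wildcards left to right with the captured indices.
    return re.sub(r"\*", lambda _: next(captures), hf_pattern)


# Illustrative patterns; only the HF prefixes come from the table.
MEGATRON_PATTERN = "decoder.layers.*.mlp.linear_fc2.weight"
QWEN35_LM_HF = "model.language_model.layers.*.mlp.down_proj.weight"
QWEN3_NEXT_HF = "model.layers.*.mlp.down_proj.weight"

name = "decoder.layers.3.mlp.linear_fc2.weight"
print(resolve_hf_name(name, MEGATRON_PATTERN, QWEN35_LM_HF))
print(resolve_hf_name(name, MEGATRON_PATTERN, QWEN3_NEXT_HF))
```

With this shape, the same Megatron-side pattern can be paired with either HF prefix, so supporting a different checkpoint layout only changes the HF-side strings.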

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests? — Unit tests for both dense and MoE bridges added
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? — No, uses existing dependencies only

Additional Information

Acknowledgements

From Engine Architecture Group 5, Engine Infrastructure Department, Xiaohongshu (RedNote).

Note to maintainers: please include the following trailers when squashing:

Co-authored-by: HuayiJin <27394199+HuayiJin@users.noreply.github.com>
Co-authored-by: whitelok <whitelok@163.com>

copy-pr-bot Bot commented May 10, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyu-33 yaoyu-33 added the area:model, feature, and needs-review labels May 11, 2026
Comment thread src/megatron/bridge/models/qwen/qwen3_5_bridge.py Outdated
Comment thread src/megatron/bridge/models/qwen/qwen3_5_bridge.py Outdated
@cuichenx

Contributor

Can we also add a functional test in addition to the unit test?

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the waiting-on-customer Waiting on the original author to respond label May 12, 2026
@yaoyu-33 yaoyu-33 removed the needs-review PR is ready for code review and waiting on a reviewer label May 12, 2026
@yaoyu-33

Contributor

/claude review

Comment thread src/megatron/bridge/models/qwen/qwen3_5_bridge.py Outdated
claude Bot commented May 20, 2026

Contributor

Code Review — Qwen3.5 Text Model Bridges

Overall this is a clean contribution. The new bridges follow established patterns from Qwen3-Next, the MTP path rename (transformer_layer to mtp_model_layer) is applied consistently across all three bridge files, and the test coverage is solid.

Issues

  1. getattr with hardcoded defaults for GDN config fields (inline comment) — qwen3_5_bridge.py:91-95 and qwen3_5_moe_bridge.py:106-110 use getattr(hf_config, ..., default) for linear_conv_kernel_dim, linear_key_head_dim, etc. The Qwen3-Next bridge accesses these directly from hf_config, which is more explicit and fails loudly on misconfigured models. Silent fallback to a wrong head dimension would produce a subtly broken model.

  2. MTP-enabled path is untested — Both bridges have conditional mtp_loss_scaling_factor = 0.1 when mtp_num_layers is set, but all tests set mtp_num_layers to None. Consider adding a test variant where num_nextn_predict_layers (or the relevant config field) is set to 1, and asserting result.mtp_loss_scaling_factor == 0.1. (This gap also exists in the Qwen3 tests, but worth addressing here since the new tests are being written fresh.)
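
Both review points can be illustrated with a small self-contained sketch. It uses `types.SimpleNamespace` as a stand-in for the HF config and is not the bridge code from this PR. The field names and the 0.1 scaling factor come from the review comment; the helper names, the placeholder default of 128, and the `None` result when MTP is disabled are assumptions made for illustration.

```python
from types import SimpleNamespace

# Field names come from the review comment; the fallback value is an arbitrary placeholder.
GDN_FIELDS = ("linear_conv_kernel_dim", "linear_key_head_dim")
PLACEHOLDER_DEFAULT = 128


def read_gdn_fields_lenient(hf_config):
    """Pattern flagged in the review: a missing field silently becomes the default."""
    return {name: getattr(hf_config, name, PLACEHOLDER_DEFAULT) for name in GDN_FIELDS}


def read_gdn_fields_strict(hf_config):
    """Access without a default: a missing field raises AttributeError immediately."""
    return {name: getattr(hf_config, name) for name in GDN_FIELDS}


def mtp_loss_scaling_factor(mtp_num_layers):
    """Conditional described in the review: 0.1 when MTP layers are set (None otherwise is assumed)."""
    return 0.1 if mtp_num_layers else None


# A config missing linear_key_head_dim, standing in for a misconfigured model.
broken_config = SimpleNamespace(linear_conv_kernel_dim=4)

print(read_gdn_fields_lenient(broken_config))  # the gap is papered over
try:
    read_gdn_fields_strict(broken_config)
except AttributeError as exc:
    print(f"strict read failed loudly: {exc}")

# The kind of MTP-enabled assertion the review asks for.
assert mtp_loss_scaling_factor(1) == 0.1
assert mtp_loss_scaling_factor(None) is None
```

The lenient reader returns a plausible-looking dictionary for the broken config, which is the failure mode the review warns about; the strict reader surfaces the missing field at conversion time instead.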

Suggested test cases

No perf tests impacted.

@cuichenx

Contributor

@HowardZorn Can you let us know if you're still working on this PR or would like us to take over? Thanks

@HowardZorn

Contributor Author

> @HowardZorn Can you let us know if you're still working on this PR or would like us to take over? Thanks

Sorry for the wait. We spent last week on testing and a major refactor, and we're now submitting a significantly improved version of the Qwen3.5-LM bridge for review.

@yaoyu-33

Contributor

@HowardZorn Thanks for the effort. Please resolve the conflicts when you have time.

@cuichenx Can you review again?

@HowardZorn HowardZorn force-pushed the qwen3_5_lm branch 2 times, most recently from f0068f3 to 2560bf8 Compare May 22, 2026 07:09
@HowardZorn

Contributor Author

> @HowardZorn Thanks for the effort. Please resolve the conflicts when you have time.

Thanks for the reminder. I just rebased onto the main branch, and I am now running tests on my local machine.

@yaoyu-33

Contributor

/ok to test 2560bf8

HowardZorn and others added 2 commits May 27, 2026 16:58
… MTP mcore path

Add Qwen3_5Bridge and Qwen3_5MoEBridge for HF ↔ Megatron-Core
weight conversion of Qwen3.5 language models with hybrid GDN+Attention
architecture. Also align MTP Megatron-Core parameter paths from
transformer_layer to mtp_model_layer in Qwen3 and Qwen3-Next bridges.

Signed-off-by: He Ruozhou <heruozhou@xiaohongshu.com>
…unctional tests

 - Rename Qwen3_5Bridge → Qwen35Bridge and Qwen3_5MoEBridge → Qwen35MoEBridge
to follow project naming conventions.
 - Merge dense and MoE text bridges into a single qwen35_bridge.py module with
extracted helper functions (_apply_qwen35_common_config, _apply_qwen35_moe_config)
and static mapping methods (_get_dense_lm_mappings, _get_moe_lm_mappings, etc.)
that are consumed by both text and VL bridges, eliminating significant code
duplication.
 - Fix incorrect HF param paths in dense LM mappings (removed spurious
language_model nesting).
 - Remove `transpose_on_export` from Qwen3.5 bridge to fix shape mismatch.
 - Add functional tests covering single-GPU roundtrip, multi-GPU parallelism,
and autoconfig conversion for both dense and MoE variants.
 - Format with ruff linter.

Co-authored-by: HuayiJin <27394199+HuayiJin@users.noreply.github.com>
Co-authored-by: whitelok <whitelok@163.com>
Signed-off-by: He Ruozhou <heruozhou@xiaohongshu.com>
@cuichenx

Contributor

/ok to test 9beec42

@HowardZorn

Contributor Author

It seems bc8ace9 fixed the outdated URL issue in the docs. Should I rebase my branch onto the main branch?

@cuichenx

Contributor

/ok to test c74a2d7

@cuichenx

Contributor

/ok to test 2f9c90a

@cuichenx cuichenx added the needs-more-tests Requires additional L0 and L1 test coverage before merge label May 29, 2026
@cuichenx

Contributor

/ok to test c67e020

@cuichenx cuichenx merged commit ab1d5e4 into NVIDIA-NeMo:main Jun 1, 2026
96 of 103 checks passed
@cuichenx cuichenx linked an issue Jun 3, 2026 that may be closed by this pull request
vasunvidia pushed a commit to vasunvidia/Megatron-Bridge that referenced this pull request Jun 10, 2026
…Mo#3769)

Signed-off-by: He Ruozhou <heruozhou@xiaohongshu.com>
Co-authored-by: HuayiJin <27394199+HuayiJin@users.noreply.github.com>
Co-authored-by: whitelok <whitelok@163.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>

Labels

  • area:model (Model implementations and HF bridge logic)
  • community-request
  • feature (New capabilities, enhancements, or enablement work)
  • needs-more-tests (Requires additional L0 and L1 test coverage before merge)
  • ready-to-merge (PR is approved, current, and only waiting for CI to pass before merge)

Development

Successfully merging this pull request may close these issues.

[support] Training QwenX models without the vision block

4 participants