[model] feat: add Qwen3.5 text model bridges (dense + MoE) by HowardZorn · Pull Request #3769 · NVIDIA-NeMo/Megatron-Bridge

HowardZorn · 2026-05-10T11:40:38Z

What does this PR do?

Add Megatron Bridge support for Qwen3.5 language models (Qwen3.5-LM), which are the text-only component extracted from Qwen3.5-VL. The model architecture is similar to Qwen3-Next (hybrid GDN + standard attention)~~, but the HF checkpoint organization follows the Qwen3.5-VL convention (model.language_model.* prefix instead of model.layers.*)~~. Additionally, the MTP mcore path is renamed from transformer_layer to mtp_model_layer to align with Megatron-Core's actual module naming.

Changelog

Add Qwen3_5Bridge for dense Qwen3.5 language models (qwen3_5_bridge.py)
Add Qwen3_5MoEBridge for MoE variant of Qwen3.5 language models (qwen3_5_moe_bridge.py)
Register both bridges in models/qwen/__init__.py with model types qwen3_5_text and qwen3_5_moe_text
Fix MTP parameter paths in qwen3_bridge.py: mtp.layers.*.transformer_layer.* → mtp.layers.*.mtp_model_layer.*
Fix MTP parameter paths in qwen3_next_bridge.py: same rename as above (13 mappings updated)
Add comprehensive unit tests for both bridges (test_qwen35_bridge.py)
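
The MTP path rename listed above can be sketched as a simple string rewrite. This is only an illustration: `rename_mtp_path` is a hypothetical helper, not part of Megatron-Bridge, and the parameter suffix in the example is made up. The actual bridges declare these paths in their static mapping tables rather than rewriting names at runtime.

```python
import re

# Hypothetical pattern: matches the old MTP path segment described in the changelog.
_OLD_MTP_SEGMENT = re.compile(r"^(mtp\.layers\.\d+\.)transformer_layer\.")


def rename_mtp_path(param_name: str) -> str:
    """Rewrite mtp.layers.N.transformer_layer.* to mtp.layers.N.mtp_model_layer.*.

    Names that do not match the old MTP pattern are returned unchanged.
    """
    return _OLD_MTP_SEGMENT.sub(r"\1mtp_model_layer.", param_name)


# The suffix after the renamed segment is illustrative only.
old_name = "mtp.layers.0.transformer_layer.self_attention.linear_qkv.weight"
print(rename_mtp_path(old_name))
# -> mtp.layers.0.mtp_model_layer.self_attention.linear_qkv.weight
```

Anchoring the pattern at `mtp.layers.` keeps ordinary decoder paths that also contain a `transformer_layer` segment untouched.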

Second commit changelog

Rename Qwen3_5Bridge → Qwen35Bridge and Qwen3_5MoEBridge → Qwen35MoEBridge to follow project naming conventions.
Merge dense and MoE text bridges into a single qwen35_bridge.py module with extracted helper functions and static mapping methods that are consumed by both text and VL bridges, eliminating code duplication.
Fix incorrect HF param paths in dense LM mappings (removed spurious language_model nesting).
Remove transpose_on_export from Qwen3.5 bridge to fix shape mismatch.
Add functional tests for dense and MoE bridges.

Key design decisions

| Aspect | Qwen3.5-LM | Qwen3-Next (reference) |
| --- | --- | --- |
| HF param prefix | `model.language_model.*` | `model.layers.*` |
| Decoder expert format | Fused (`gate_up_proj`) | Individual (`gate_proj`/`up_proj`) |
| MTP expert format | Individual per-expert | Individual per-expert |
| GDN linear mapping | `GDNLinearMappingSeparate` (4-tensor: qkv/z/b/a) | `GDNLinearMapping` (2-tensor: qkvz/ba) |
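
To make the "fused vs. individual" expert format row concrete, here is a minimal sketch of splitting a fused `gate_up_proj` matrix into separate gate and up halves and fusing them back. It rests on assumptions not confirmed by this PR: that the two halves are concatenated along the last dimension with the gate half first, and that one expert's weight is a plain 2-D matrix. The real checkpoint layout (axis, ordering, per-expert stacking) should be checked before relying on this; the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_fused_gate_up(fused: Matrix) -> Tuple[Matrix, Matrix]:
    """Split a [rows, 2 * inter] matrix into gate and up halves.

    ASSUMPTION: gate columns come first, up columns second.
    """
    width = len(fused[0])
    if width % 2 != 0:
        raise ValueError(f"fused width {width} is not divisible by 2")
    half = width // 2
    gate = [row[:half] for row in fused]
    up = [row[half:] for row in fused]
    return gate, up


def fuse_gate_up(gate: Matrix, up: Matrix) -> Matrix:
    """Inverse of split_fused_gate_up under the same layout assumption."""
    if len(gate) != len(up):
        raise ValueError("gate and up must have the same number of rows")
    return [g + u for g, u in zip(gate, up)]


fused = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0]]
gate, up = split_fused_gate_up(fused)
assert fuse_gate_up(gate, up) == fused  # the roundtrip preserves the fused matrix
```

A roundtrip check like the final assertion is a cheap way to catch a wrong axis or ordering assumption in a conversion test.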

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests? — Unit tests for both dense and MoE bridges added
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? — No, uses existing dependencies only

Additional Information

Qwen3.5-LM is the standalone text model extracted from Qwen3.5-VL, sharing the same hybrid GDN+Attention architecture as Qwen3-Next ~~but with VL-style checkpoint organization (This is because Hugging Face transformers do not accept other state dict organizations)~~.
A standalone Qwen3.5-LM bridge is needed by me and by other users. Please refer to "[feature] Add qwen3.5 config + example for LLM only." #2973.
Potential third-party bug that breaks the Qwen3.5-VL unit tests: "Qwen3_5MoeVisionConfig missing deepstack_visual_indexes field, silently dropped by @strict" (huggingface/transformers#45375).

Acknowledgements

From Engine Architecture Group 5, Engine Infrastructure Department, Xiaohongshu (RedNote).

Note to maintainers: please include the following trailers when squashing:
Co-authored-by: HuayiJin <27394199+HuayiJin@users.noreply.github.com>
Co-authored-by: whitelok <whitelok@163.com>

copy-pr-bot · 2026-05-10T11:40:43Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cuichenx · 2026-05-11T22:03:59Z

can we also add a functional test in addition to the unit test?

yaoyu-33 · 2026-05-20T22:30:28Z

/claude review

claude · 2026-05-20T22:36:08Z

Code Review — Qwen3.5 Text Model Bridges

Overall this is a clean contribution. The new bridges follow established patterns from Qwen3-Next, the MTP path rename (transformer_layer to mtp_model_layer) is applied consistently across all three bridge files, and the test coverage is solid.

Issues

getattr with hardcoded defaults for GDN config fields (inline comment) — qwen3_5_bridge.py:91-95 and qwen3_5_moe_bridge.py:106-110 use getattr(hf_config, ..., default) for linear_conv_kernel_dim, linear_key_head_dim, etc. The Qwen3-Next bridge accesses these directly from hf_config, which is more explicit and fails loudly on misconfigured models. Silent fallback to a wrong head dimension would produce a subtly broken model.
MTP-enabled path is untested — Both bridges have conditional mtp_loss_scaling_factor = 0.1 when mtp_num_layers is set, but all tests set mtp_num_layers to None. Consider adding a test variant where num_nextn_predict_layers (or the relevant config field) is set to 1, and asserting result.mtp_loss_scaling_factor == 0.1. (This gap also exists in the Qwen3 tests, but worth addressing here since the new tests are being written fresh.)
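
The first issue can be illustrated with a toy config object. This is a sketch, not the bridge code: the field names come from the review above, while the default value `128`, the `SimpleNamespace` stand-in for the HF config, and both helper functions are hypothetical.

```python
from types import SimpleNamespace

# Toy stand-in for a misconfigured HF config: linear_key_head_dim is missing.
hf_config = SimpleNamespace(linear_conv_kernel_dim=4)


def read_with_default(config) -> int:
    """Pattern flagged in the review: a missing field silently becomes 128."""
    return getattr(config, "linear_key_head_dim", 128)


def read_strict(config) -> int:
    """Direct access: a missing field raises AttributeError immediately."""
    return config.linear_key_head_dim


print(read_with_default(hf_config))  # returns the hardcoded fallback, no warning

try:
    read_strict(hf_config)
except AttributeError as exc:
    print(f"fails loudly: {exc}")
```

With direct access the misconfiguration surfaces at conversion time instead of producing a model with a plausible but wrong head dimension.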

Suggested test cases

No perf tests impacted.
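
A rough shape for the MTP-enabled test variant mentioned under Issues is sketched below. `toy_provider_config` is a hypothetical stand-in that only mirrors the conditional described in the review (scaling factor `0.1` when MTP layers are set); a real test would call the bridge's own config conversion instead, and the value used when MTP is disabled is an assumption.

```python
from types import SimpleNamespace
from typing import Optional


def toy_provider_config(mtp_num_layers: Optional[int]) -> SimpleNamespace:
    """Hypothetical stand-in for the bridge's HF-to-Megatron config conversion."""
    cfg = SimpleNamespace(mtp_num_layers=mtp_num_layers, mtp_loss_scaling_factor=None)
    if mtp_num_layers:
        # Conditional described in the review: set the factor only when MTP is enabled.
        cfg.mtp_loss_scaling_factor = 0.1
    return cfg


def test_mtp_enabled_sets_loss_scaling() -> None:
    result = toy_provider_config(mtp_num_layers=1)
    assert result.mtp_loss_scaling_factor == 0.1


def test_mtp_disabled_leaves_loss_scaling_unset() -> None:
    # ASSUMPTION: the factor stays None when MTP is disabled.
    result = toy_provider_config(mtp_num_layers=None)
    assert result.mtp_loss_scaling_factor is None


test_mtp_enabled_sets_loss_scaling()
test_mtp_disabled_leaves_loss_scaling_unset()
```

Covering both branches keeps the conditional from regressing silently if the config field name changes.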

cuichenx · 2026-05-21T20:29:17Z

@HowardZorn Can you let us know if you're still working on this PR or would like us to take over? Thanks

HowardZorn · 2026-05-22T02:53:00Z

> @HowardZorn Can you let us know if you're still working on this PR or would like us to take over? Thanks

Sorry for the wait. We spent last week conducting testing and a major refactor. We're now submitting a significantly improved version of the Qwen3.5-LM bridge for review.

yaoyu-33 · 2026-05-22T04:38:04Z

@HowardZorn thanks for the effort. please resolve conflicts if you got time.

@cuichenx can you review again

HowardZorn · 2026-05-22T07:28:57Z

> @HowardZorn thanks for the effort. please resolve conflicts if you got time.

Thanks for the reminder, I just rebased onto the main branch. And I am doing tests on my local machine.

yaoyu-33 · 2026-05-27T02:19:15Z

/ok to test 2560bf8

… MTP mcore path

Add Qwen3_5Bridge and Qwen3_5MoEBridge for HF ↔ Megatron-Core weight conversion of Qwen3.5 language models with hybrid GDN+Attention architecture. Also align MTP Megatron-Core parameter paths from transformer_layer to mtp_model_layer in Qwen3 and Qwen3-Next bridges.

Signed-off-by: He Ruozhou <heruozhou@xiaohongshu.com>

…unctional tests

- Rename Qwen3_5Bridge → Qwen35Bridge and Qwen3_5MoEBridge → Qwen35MoEBridge to follow project naming conventions.
- Merge dense and MoE text bridges into a single qwen35_bridge.py module with extracted helper functions (_apply_qwen35_common_config, _apply_qwen35_moe_config) and static mapping methods (_get_dense_lm_mappings, _get_moe_lm_mappings, etc.) that are consumed by both text and VL bridges, eliminating significant code duplication.
- Fix incorrect HF param paths in dense LM mappings (removed spurious language_model nesting).
- Remove `transpose_on_export` from Qwen3.5 bridge to fix shape mismatch.
- Add functional tests covering single-GPU roundtrip, multi-GPU parallelism, and autoconfig conversion for both dense and MoE variants.
- Format with ruff linter.

Co-authored-by: HuayiJin <27394199+HuayiJin@users.noreply.github.com>
Co-authored-by: whitelok <whitelok@163.com>
Signed-off-by: He Ruozhou <heruozhou@xiaohongshu.com>

cuichenx · 2026-05-28T16:26:09Z

/ok to test 9beec42

HowardZorn · 2026-05-28T17:23:14Z

It seems bc8ace9 fixed the outdated URL issue in the docs. Should I rebase my branch onto the main branch?

cuichenx · 2026-05-29T17:01:59Z

/ok to test c74a2d7

cuichenx · 2026-05-29T17:42:26Z

/ok to test 2f9c90a

cuichenx · 2026-05-29T21:07:46Z

/ok to test c67e020

…Mo#3769)

Signed-off-by: He Ruozhou <heruozhou@xiaohongshu.com>
Co-authored-by: HuayiJin <27394199+HuayiJin@users.noreply.github.com>
Co-authored-by: whitelok <whitelok@163.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>

github-actions Bot added the community-request label May 10, 2026

yaoyu-33 added area:model Model implementations and HF bridge logic feature New capabilities, enhancements, or enablement work needs-review PR is ready for code review and waiting on a reviewer labels May 11, 2026

cuichenx reviewed May 11, 2026

Comment thread src/megatron/bridge/models/qwen/qwen3_5_bridge.py Outdated

Comment thread src/megatron/bridge/models/qwen/qwen3_5_bridge.py Outdated

svcnvidia-nemo-ci added the waiting-on-customer Waiting on the original author to respond label May 12, 2026

yaoyu-33 removed the needs-review PR is ready for code review and waiting on a reviewer label May 12, 2026

cuichenx mentioned this pull request May 19, 2026

[support] Training QwenX models without the vision block #3891

Closed

claude Bot reviewed May 20, 2026

Comment thread src/megatron/bridge/models/qwen/qwen3_5_bridge.py Outdated

HowardZorn force-pushed the qwen3_5_lm branch from 31300ed to 2b49805 Compare May 22, 2026 02:41

HowardZorn force-pushed the qwen3_5_lm branch 2 times, most recently from f0068f3 to 2560bf8 Compare May 22, 2026 07:09

copy-pr-bot Bot temporarily deployed to public May 27, 2026 02:19 Inactive

copy-pr-bot Bot temporarily deployed to public May 27, 2026 02:52 Inactive

copy-pr-bot Bot temporarily deployed to public May 27, 2026 03:08 Inactive

HowardZorn and others added 2 commits May 27, 2026 16:58

HowardZorn force-pushed the qwen3_5_lm branch from 2560bf8 to 9beec42 Compare May 27, 2026 09:02

cuichenx mentioned this pull request May 28, 2026

[NeMo FW 26.06 Release] MBridge v0.5.0 Roadmap #3754

Open

cuichenx mentioned this pull request May 28, 2026

[recipe] feat: Add Qwen3.5 LLM-only SFT recipes with skip_megatron_pa… #3037

Closed

Merge branch 'main' into qwen3_5_lm

c74a2d7

copy-pr-bot Bot temporarily deployed to public May 29, 2026 17:02 Inactive

copy-pr-bot Bot temporarily deployed to test May 29, 2026 17:02 Inactive

copy-pr-bot Bot temporarily deployed to public May 29, 2026 17:30 Inactive

Merge branch 'main' into qwen3_5_lm

2f9c90a

copy-pr-bot Bot temporarily deployed to public May 29, 2026 17:42 Inactive

copy-pr-bot Bot temporarily deployed to test May 29, 2026 17:43 Inactive

copy-pr-bot Bot had a problem deploying to public May 29, 2026 18:18 Failure

cuichenx added the needs-more-tests Requires additional L0 and L1 test coverage before merge label May 29, 2026

Merge branch 'main' into qwen3_5_lm

c67e020

copy-pr-bot Bot temporarily deployed to public May 29, 2026 21:08 Inactive

copy-pr-bot Bot temporarily deployed to test May 29, 2026 21:08 Inactive

copy-pr-bot Bot had a problem deploying to public May 29, 2026 21:43 Failure

copy-pr-bot Bot had a problem deploying to public May 30, 2026 22:07 Failure

cuichenx merged commit ab1d5e4 into NVIDIA-NeMo:main Jun 1, 2026
96 of 103 checks passed

copy-pr-bot Bot temporarily deployed to public June 1, 2026 17:25 Inactive

copy-pr-bot Bot temporarily deployed to public June 1, 2026 17:45 Inactive

cuichenx linked an issue Jun 3, 2026 that may be closed by this pull request

[support] Training QwenX models without the vision block #3891

Closed

adityavavreNVDA mentioned this pull request Jun 5, 2026

[feature] Add qwen3.5 config + example for LLM only. #2973

Closed

4 participants
