[model] feat: add Qwen3.5 text model bridges (dense + MoE) #3769

Can we also add a functional test in addition to the unit test?

/claude review

Code Review: Qwen3.5 Text Model Bridges

Overall, this is a clean contribution. The new bridges follow established patterns from Qwen3-Next, the MTP path rename (`transformer_layer` to `mtp_model_layer`) is applied consistently across all three bridge files, and the test coverage is solid.

Issues

Suggested test cases

No perf tests impacted.

@HowardZorn Can you let us know whether you're still working on this PR or would like us to take over? Thanks!

Sorry for the wait. We spent last week testing and doing a major refactor. We're now submitting a significantly improved version of the Qwen3.5-LM bridge for review.

@HowardZorn Thanks for the effort. Please resolve the conflicts when you have time. @cuichenx, can you review again?

The branch was force-pushed from f0068f3 to 2560bf8.

Thanks for the reminder. I just rebased onto the main branch and am running tests on my local machine.

/ok to test 2560bf8

… MTP mcore path

Add `Qwen3_5Bridge` and `Qwen3_5MoEBridge` for HF ↔ Megatron-Core weight conversion of Qwen3.5 language models with a hybrid GDN + attention architecture. Also align the MTP Megatron-Core parameter paths from `transformer_layer` to `mtp_model_layer` in the Qwen3 and Qwen3-Next bridges.

Signed-off-by: He Ruozhou <heruozhou@xiaohongshu.com>
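
The MTP path rename is a purely name-level change: the `transformer_layer` segment under each `mtp.layers.<i>` becomes `mtp_model_layer`. Below is a minimal sketch of that rewrite using a regular expression. `rename_mtp_param` is a hypothetical helper (per the changelog, the bridges instead update their parameter mappings directly), and it also accepts the `*` wildcard that appears in mapping patterns such as `mtp.layers.*.transformer_layer.*`.

```python
import re

# Matches any name containing "mtp.layers.<index or *>.transformer_layer.".
_OLD_MTP_PATH = re.compile(r"^(.*mtp\.layers\.(?:\d+|\*))\.transformer_layer\.")


def rename_mtp_param(name: str) -> str:
    """Rewrite an MTP parameter name (or mapping pattern) from the old
    `transformer_layer` path to the `mtp_model_layer` path.
    Names outside the MTP layers are returned unchanged."""
    return _OLD_MTP_PATH.sub(r"\1.mtp_model_layer.", name, count=1)


print(rename_mtp_param("mtp.layers.*.transformer_layer.*"))
# mtp.layers.*.mtp_model_layer.*
print(rename_mtp_param("mtp.layers.0.transformer_layer.self_attention.linear_qkv.weight"))
# mtp.layers.0.mtp_model_layer.self_attention.linear_qkv.weight
```

Because the function leaves already-renamed and non-MTP names untouched, applying it twice is harmless.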

…unctional tests

- Rename `Qwen3_5Bridge` → `Qwen35Bridge` and `Qwen3_5MoEBridge` → `Qwen35MoEBridge` to follow project naming conventions.
- Merge the dense and MoE text bridges into a single `qwen35_bridge.py` module with extracted helper functions (`_apply_qwen35_common_config`, `_apply_qwen35_moe_config`) and static mapping methods (`_get_dense_lm_mappings`, `_get_moe_lm_mappings`, etc.) that are consumed by both the text and VL bridges, eliminating significant code duplication.
- Fix incorrect HF param paths in the dense LM mappings (removed spurious `language_model` nesting).
- Remove `transpose_on_export` from the Qwen3.5 bridge to fix a shape mismatch.
- Add functional tests covering single-GPU roundtrip, multi-GPU parallelism, and autoconfig conversion for both the dense and MoE variants.
- Format with the ruff linter.

Co-authored-by: HuayiJin <27394199+HuayiJin@users.noreply.github.com>
Co-authored-by: whitelok <whitelok@163.com>
Signed-off-by: He Ruozhou <heruozhou@xiaohongshu.com>
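
The single-GPU roundtrip test listed above boils down to checking that HF → Megatron → HF conversion reproduces every original tensor. Below is a minimal sketch of that comparison step, assuming both state dicts are available as name → NumPy array dicts. `compare_state_dicts` is a hypothetical helper, not the PR's test code, and a real test may need a nonzero tolerance when dtypes change during conversion.

```python
import numpy as np


def compare_state_dicts(original, exported, atol=0.0):
    """List the differences between an original HF state dict and the one
    exported after an HF -> Megatron -> HF roundtrip.
    An empty list means every tensor came back with the same name, shape,
    and values (within atol)."""
    problems = [f"missing: {k}" for k in sorted(original.keys() - exported.keys())]
    problems += [f"unexpected: {k}" for k in sorted(exported.keys() - original.keys())]
    for name in sorted(original.keys() & exported.keys()):
        a, b = original[name], exported[name]
        if a.shape != b.shape:
            problems.append(f"shape mismatch: {name} {a.shape} vs {b.shape}")
        elif not np.allclose(a, b, rtol=0.0, atol=atol):
            problems.append(f"value mismatch: {name}")
    return problems


w = np.arange(6, dtype=np.float32).reshape(2, 3)
print(compare_state_dicts({"w": w}, {"w": w.copy()}))  # []
# A transpose applied on export where none is needed changes the shape.
print(compare_state_dicts({"w": w}, {"w": w.T}))  # ['shape mismatch: w (2, 3) vs (3, 2)']
```

Checking shapes before values keeps the report readable: a spurious transpose of a non-square weight is flagged as a shape mismatch rather than as a wall of differing values.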

/ok to test 9beec42

It seems bc8ace9 fixed these outdated URL issues in the docs. Should I rebase my branch onto the main branch?

/ok to test c74a2d7

/ok to test 2f9c90a

/ok to test c67e020

…Mo#3769)

Signed-off-by: He Ruozhou <heruozhou@xiaohongshu.com>
Co-authored-by: HuayiJin <27394199+HuayiJin@users.noreply.github.com>
Co-authored-by: whitelok <whitelok@163.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>

What does this PR do?
Add Megatron Bridge support for Qwen3.5 language models (Qwen3.5-LM), the text-only component extracted from Qwen3.5-VL. The model architecture is similar to Qwen3-Next (hybrid GDN + standard attention), but the HF checkpoint organization follows the Qwen3.5-VL convention (`model.language_model.*` prefix instead of `model.layers.*`). Additionally, the MTP mcore path is renamed from `transformer_layer` to `mtp_model_layer` to align with Megatron-Core's actual module naming.

Changelog
- `Qwen3_5Bridge` for dense Qwen3.5 language models (`qwen3_5_bridge.py`)
- `Qwen3_5MoEBridge` for the MoE variant of Qwen3.5 language models (`qwen3_5_moe_bridge.py`)
- `models/qwen/__init__.py` with model types `qwen3_5_text` and `qwen3_5_moe_text`
- `qwen3_bridge.py`: `mtp.layers.*.transformer_layer.*` → `mtp.layers.*.mtp_model_layer.*`
- `qwen3_next_bridge.py`: same rename as above (13 mappings updated)
- Unit tests (`test_qwen35_bridge.py`)

Second commit changelog
- Rename `Qwen3_5Bridge` → `Qwen35Bridge` and `Qwen3_5MoEBridge` → `Qwen35MoEBridge` to follow project naming conventions.
- Merge the dense and MoE text bridges into a single `qwen35_bridge.py` module with extracted helper functions and static mapping methods that are consumed by both the text and VL bridges, eliminating code duplication.
- Remove `transpose_on_export` from the Qwen3.5 bridge to fix a shape mismatch.

Key design decisions
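
Qwen3.5 stores the GDN input projection as four separate HF tensors (qkv, z, b, a), while Qwen3-Next stores two (qkvz, ba). Assuming Megatron-Core keeps a single fused input projection, the bridge has to concatenate the separate tensors on import and slice them apart on export. Below is a minimal sketch of that fuse/split roundtrip using NumPy arrays and plain concatenation along the output dimension. `fuse_in_proj`, `split_in_proj`, the part order, and the part sizes are hypothetical; the real mapping classes may additionally reorder rows per head group and shard across tensor-parallel ranks, which this sketch ignores.

```python
import numpy as np

# Order of the four separate projections, assumed for illustration.
PARTS = ("qkv", "z", "b", "a")


def fuse_in_proj(parts):
    """Concatenate the separate projection weights, each shaped
    [out_features_i, hidden], into one [sum(out_features_i), hidden] weight."""
    return np.concatenate([parts[k] for k in PARTS], axis=0)


def split_in_proj(fused, sizes):
    """Inverse of fuse_in_proj: slice the fused weight back into the four
    parts, given each part's number of output rows."""
    # Row offsets where each part after the first begins.
    offsets = np.cumsum([sizes[k] for k in PARTS])[:-1]
    return dict(zip(PARTS, np.split(fused, offsets, axis=0)))


# Toy sizes; real sizes depend on the model's head counts and head dims.
hidden = 8
sizes = {"qkv": 12, "z": 4, "b": 2, "a": 2}
rng = np.random.default_rng(0)
parts = {k: rng.standard_normal((n, hidden)) for k, n in sizes.items()}

fused = fuse_in_proj(parts)
restored = split_in_proj(fused, sizes)
print(fused.shape)  # (20, 8)
print(all(np.array_equal(parts[k], restored[k]) for k in PARTS))  # True
```

The split is driven by explicit per-part sizes rather than equal chunks because the qkv part is larger than the z, b, and a parts.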
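| Aspect | Qwen3.5-LM (this PR) | Qwen3-Next |
|---|---|---|
| HF parameter prefix | `model.language_model.*` | `model.layers.*` |
| MoE expert gate/up projection | `gate_up_proj` | `gate_proj` / `up_proj` |
| GDN input projection mapping | `GDNLinearMappingSeparate` (4-tensor: qkv/z/b/a) | `GDNLinearMapping` (2-tensor: qkvz/ba) |

GitHub Actions CI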
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre-checks:
Additional Information
…but with VL-style checkpoint organization (because Hugging Face Transformers does not accept other state dict organizations).

Acknowledgements
From Engine Architecture Group 5, Engine Infrastructure Department, Xiaohongshu (RedNote).