
Adding Support for Qwen3.5#43830

Merged
vasqu merged 26 commits into huggingface:main from bozheng-hit:qwen3_5
Feb 9, 2026

Conversation

@bozheng-hit
Contributor

This PR adds support for the upcoming Qwen3.5 series models. For information about Qwen, please visit:
👉https://qwen.ai

Special thanks to @JJJYmmm for helping complete the code in this PR. We also appreciate the valuable feedback and thorough review provided by @vasqu and @ArthurZucker ! 🙏

@fpshuang

This comment was marked as off-topic.

@1-bytes

1-bytes commented Feb 9, 2026

Please approve this soon ~ looking forward to it!

@vasqu vasqu enabled auto-merge (squash) February 9, 2026 10:57
@github-actions
Contributor

github-actions bot commented Feb 9, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, qwen3_5, qwen3_5_moe

@vasqu vasqu merged commit fc91372 into huggingface:main Feb 9, 2026
25 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

jiosephlee pushed a commit to jiosephlee/transformers_latest that referenced this pull request Feb 11, 2026
* rebase main

* remove redundant init

* fix

* remove qwen3vlmoe mapping since resolved

* add auto image processor

* fix

* fix

* update qwen3_next_style rmsnorm

* update text config check

* simplify vision model

* simplify vision config

* inherit pretrainedmodel and qwen3next decoder layer forward

* simplify config

* fix apply_rotary_pos_emb import

* move to latest main, update vision output and fix rope validation

* fix text-only model loading

* fix config

* quick fixes

* fix rope ignore keys

* oops

* add test suite

* style

* docs

* ok

* last consistency fix

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: vasqu <antonprogamer@gmail.com>
@MaoJianwei

Been waiting for this for a long time.

@BestJuly

BestJuly commented Mar 3, 2026

Hi @bozheng-hit, thanks for the PR. I have a quick question: we noticed that the dtypes of A_log and out_norm in the HF released checkpoint differ from Qwen3-Next. Should these two parameters also be stored in FP32? Or is FP32 only required for computation, while storage could remain in BF16 as in Qwen3-Next?

The current implementation in MCore uses BF16 for storage, and computation for these parts is done in FP32. But if the storage dtypes are also required to be FP32, some changes would be needed (draft PR). Thank you.
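For context, the pattern under discussion can be sketched as follows. This is a minimal illustration, not the actual transformers or MCore implementation; the layer and parameter names here are hypothetical, modeled on the A_log parameter mentioned above. It shows the BF16-storage / FP32-computation split: the parameter lives in BF16, and is upcast to FP32 only for the numerically sensitive step.

```python
import torch
import torch.nn as nn


class DecaySketch(nn.Module):
    """Hypothetical layer illustrating BF16 storage with FP32 computation."""

    def __init__(self, num_heads: int):
        super().__init__()
        # Stored in BF16 (as in Qwen3-Next). Whether Qwen3.5 additionally
        # requires FP32 *storage* here is exactly the open question above.
        self.A_log = nn.Parameter(torch.zeros(num_heads, dtype=torch.bfloat16))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Upcast to FP32 only for the sensitive exp/decay computation...
        decay = -torch.exp(self.A_log.float())
        # ...then cast back to the activation dtype for the rest of the model.
        return x * decay.to(x.dtype)
```

If FP32 storage turned out to be required, the `dtype=torch.bfloat16` in the parameter definition would change to `torch.float32` and the `.float()` upcast would become a no-op, at the cost of larger checkpoints.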


9 participants