Conversation
Please approve soon~ Looking forward to it!
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, qwen3_5, qwen3_5_moe
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* rebase main
* remove redundant init
* fix
* remove qwen3vlmoe mapping since resolved
* add auto image processor
* fix
* fix
* update qwen3_next_style rmsnorm
* update text config check
* simplify vision model
* simplify vision config
* inherit pretrainedmodel and qwen3next decoder layer forward
* simplify config
* fix apply_rotary_pos_emb import
* move to latest main, update vision output and fix rope validation
* fix text-only model loading
* fix config
* quick fixes
* fix rope ignore keys
* oops
* add test suite
* style
* docs
* ok
* last consistency fix

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: vasqu <antonprogamer@gmail.com>
Been waiting for this for a long time.
Hi @bozheng-hit, thanks for the PR. I have a quick question: we noticed that the current implementation in MCore uses BF16 for storage, while computation for these parts is in FP32. If the storage dtypes are also required to be in FP32, some changes would be needed (draft PR). Thank you.
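For readers unfamiliar with the distinction raised above: "BF16 storage, FP32 computation" typically means tensors are kept in a low-precision dtype but upcast before numerically sensitive reductions (e.g. the variance in an RMSNorm), then cast back. A minimal NumPy sketch of the pattern, with float16 standing in for bfloat16 (which NumPy lacks); the function is illustrative, not the MCore or transformers implementation:

```python
import numpy as np

def rmsnorm(x, weight, eps=1e-6):
    """RMSNorm with low-precision storage and float32 computation."""
    # Upcast activations to float32 before the mean-square reduction,
    # regardless of the dtype they are stored in.
    x32 = x.astype(np.float32)
    variance = np.mean(x32 * x32, axis=-1, keepdims=True)
    normed = x32 / np.sqrt(variance + eps)
    # The weight may also live in a low-precision dtype; upcast for the
    # multiply, then cast the result back to the storage dtype.
    out = normed * weight.astype(np.float32)
    return out.astype(x.dtype)

# Storage stays half-precision; only the intermediate math is float32.
x = np.ones((2, 4), dtype=np.float16)
w = np.ones(4, dtype=np.float16)
y = rmsnorm(x, w)
```

The question in the comment is whether the parameters themselves must be *stored* in FP32 (not merely upcast at compute time), which would change the checkpoint layout rather than just the forward pass.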


This PR adds support for the upcoming Qwen3.5 series models. For information about Qwen, please visit:
👉https://qwen.ai
Special thanks to @JJJYmmm for helping complete the code in this PR. We also appreciate the valuable feedback and thorough review provided by @vasqu and @ArthurZucker! 🙏