Adding Qwen3 and Qwen3MoE #36878
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the
Will Qwen3 be implemented in SGLang?
@Swipe4057 we are working on
ArthurZucker left a comment
HUGE 🚀 🚀 🚀 🚀 🚀 🚀 🚀
Super small comments:
- for the MoE, inheriting from Mixtral or QwenMoe to get the forward will be "simpler"
- the attention paradigm can inherit from Olmo2!
- just a question on whether the max sliding window should be enforced or not!
Missing:
- qwen3.md
- qwen3_moe.md
That's it!
ArthurZucker left a comment
A last nit: either rename the keys to make the difference explicit, or we just use the class that already exists! Happy to merge if you want 🤗
Merging from main should help with the CI!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ValueError: The following configurations don't contain any valid checkpoint:
Qwen3MoeConfig
The requirement is to include a link pointing to one of the models of this architecture in the docstring of the config classes listed above. The link should have be a markdown format like [myorg/mymodel](https://huggingface.co/myorg/mymodel).
This can be ignored if you want (the repo consistency check).
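For readers unfamiliar with this consistency check, here is a minimal sketch of the kind of test the error message describes: scanning a config docstring for a markdown link to a checkpoint. It is an illustration only, not the repository's actual check. The helper name `find_checkpoint`, the regular expression, and the sample docstring are all assumptions, and `myorg/mymodel` is the placeholder taken from the error message itself.

```python
import re
from typing import Optional

# Matches a markdown link to a Hugging Face checkpoint, e.g.
# [myorg/mymodel](https://huggingface.co/myorg/mymodel)
CHECKPOINT_LINK = re.compile(r"\[(.+?)\]\((https://huggingface\.co/.+?)\)")


def find_checkpoint(docstring: str) -> Optional[str]:
    """Return the first checkpoint name whose link target matches its label.

    Hypothetical helper written for this example.
    """
    for name, url in CHECKPOINT_LINK.findall(docstring):
        # Only accept links where the label and the URL agree.
        if url.rstrip("/") == f"https://huggingface.co/{name}":
            return name
    return None


# Hypothetical config docstring using the placeholder from the error message.
doc = """
Configuration class for a hypothetical model, e.g.
[myorg/mymodel](https://huggingface.co/myorg/mymodel).
"""
print(find_checkpoint(doc))
```

A config class whose docstring yields `None` here would be the kind of class the error lists.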
After merging from the main branch, I noticed that some tests are still not passing as expected. Could you help take a look at the reasons? Or are these tests non-critical for our PR?
@bozheng-hit should be good, the failing tests are unrelated. I simplified the modular file a little bit! Could you review and maybe update the readme? Otherwise I can merge as is if it helps your release cycle!! 🤗
(Just waiting for your input to merge!)
Hi, I will revert your changes to Qwen3MoE since the model cannot be loaded correctly after incorporating your modifications.
Mmm Okay let me do another pass to fix the tests / make sure my changes don't prevent loading!
BTW looping over the experts is not super optimal; in the long run we'll see what we can do to standardize this and support fast MoE kernels.
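To make the "loop over the experts" pattern concrete, here is a small pure-Python sketch of top-k routing followed by a per-expert loop. It is not the Qwen3MoE code from this PR. The function `moe_forward`, the toy experts, and the choice to renormalize the selected routing weights are all assumptions made for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    tokens: List[Vector],
    router_logits: List[Vector],
    experts: List[Callable[[Vector], Vector]],
    top_k: int,
) -> List[Vector]:
    """Hypothetical MoE forward pass: route each token, then loop over experts."""
    # Pick the top_k experts per token and renormalize their weights
    # (renormalization is an assumption of this sketch).
    routes = []
    for logits in router_logits:
        probs = softmax(logits)
        chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        norm = sum(probs[i] for i in chosen)
        routes.append([(i, probs[i] / norm) for i in chosen])

    outputs = [[0.0] * len(tok) for tok in tokens]
    # The expert loop: each iteration only processes the tokens routed to that
    # expert, so the work is serialized across experts.
    for expert_idx, expert in enumerate(experts):
        for tok_idx, route in enumerate(routes):
            for idx, weight in route:
                if idx == expert_idx:
                    out = expert(tokens[tok_idx])
                    for d, value in enumerate(out):
                        outputs[tok_idx][d] += weight * value
    return outputs


# Toy experts standing in for the per-expert MLPs.
toy_experts = [
    lambda v: [2.0 * x for x in v],
    lambda v: [x + 1.0 for x in v],
    lambda v: [-x for x in v],
]
result = moe_forward([[1.0, 2.0]], [[0.0, 0.0, -100.0]], toy_experts, top_k=2)
print(result)
```

The outer loop visits every expert even when few tokens are routed to it, which is the overhead that fused or grouped MoE kernels aim to avoid.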
@bozheng-hit merged! Once you have an article or something we can also update the |
Thanks! We'll update the
* Initial commit for Qwen3
* fix and add tests for qwen3 & qwen3_moe
* rename models for tests.
* fix
* fix
* fix and add docs.
* fix model name in docs.
* simplify modular and fix configuration issues
* Fix the red CI: ruff was updated
* revert ruff, version was wrong
* fix qwen3moe.
* fix
* make sure MOE can load
* fix copies
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
@bozheng-hit I want to say thank you for adding export support for the new Qwen3 model, making it ExecuTorch compatible on Day 1!
* Enable Qwen3 and Qwen3-MOE for openvino huggingface/transformers#36878
* Update optimum/exporters/openvino/model_configs.py Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
* Update optimum/exporters/openvino/model_configs.py Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
* Update optimum/exporters/openvino/model_configs.py Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
* add qwen3 test case
* add simplified chat template for qwen3
* Update optimum/exporters/openvino/model_configs.py Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
* update chat template
* fix style
* Update tests/openvino/test_modeling.py
* update spda number
---------
Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
Co-authored-by: Ella Charlaix <ella@huggingface.co>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Hi @bozheng-hit I am not able to find
which is used in this PR code. Is it
Adding Qwen3
This PR adds code support for the upcoming Qwen3 models. For information about Qwen, please visit https://github.com/QwenLM/Qwen2.5. @ArthurZucker