Qwen3.6 integration #5642
Merged
AmineDiro (Member) approved these changes on Apr 25, 2026 and left a comment:
Super cool to support it this quick!
qgallouedec added a commit that referenced this pull request on Apr 27, 2026
Qwen3.6 (`Qwen/Qwen3.6-27B`, `Qwen/Qwen3.6-35B-A3B`) reuses the `Qwen3_5Moe*` architecture but ships a slightly different chat template (adds a `preserve_thinking` flag, tweaks tool-arg stringification). Exact-string template matching in `chat_template_utils.py` therefore fails for Qwen3.6 tokenizers.

Changes
- `qwen3_6.jinja` (verbatim from upstream) and `qwen3_6_training.jinja` (prefix-preserving + `{% generation %}` markers).
- `chat_template_utils.py`: register both templates; route Qwen3.6 to the existing `qwen3_5_schema` in `add_response_schema` (output format is unchanged); route to `qwen3_6_training_chat_template` in `get_training_chat_template`.
- `scripts/generate_tiny_models.py`: add `Qwen/Qwen3.6-35B-A3B` to the VLM loop (pushed as `tiny-Qwen3_5MoeForConditionalGeneration-3.6` to leave room for future Qwen3.5-MoE variants); reuse the Qwen3.5 dense gotchas (force one full-attention layer, fp32-restore for linear-attn weights) and add MoE-specific shrinks.
- Tests: `test_chat_template_utils`, `test_data_utils.TestApplyChatTemplate`, and the SFT/DPO/GRPO/RLOO `test_(train|training)_vlm` cases.

Note: Medium Risk
Medium risk because it extends chat-template matching and training-template swapping for a new model family, which can affect tool-calling formatting and assistant-only loss masking across Qwen variants if the template detection/mapping is wrong.
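To illustrate why exact-string matching misses a slightly changed template, here is a minimal sketch. It is not the code in `chat_template_utils.py`: the registry dict, the `lookup_training_template` helper, and the toy template strings are all hypothetical stand-ins chosen for illustration.

```python
from typing import Optional

# Toy stand-ins for the real Jinja files (assumption): only the fact that
# the two strings differ matters for this sketch.
QWEN3_5_TEMPLATE = "{% for m in messages %}{{ m.content }}{% endfor %}"
QWEN3_6_TEMPLATE = "{% for m in messages %}{{ m.content }}{% endfor %}{# preserve_thinking #}"

QWEN3_5_TRAINING = "<contents of a Qwen3.5 training template>"
QWEN3_6_TRAINING = "<contents of qwen3_6_training.jinja>"


def lookup_training_template(chat_template: str, registry: dict[str, str]) -> Optional[str]:
    """Return the training template registered for this exact string, if any."""
    return registry.get(chat_template)


# Hypothetical registry that only knows the Qwen3.5 template.
registry = {QWEN3_5_TEMPLATE: QWEN3_5_TRAINING}

# The Qwen3.6 string differs by a few characters, so the exact match fails.
before = lookup_training_template(QWEN3_6_TEMPLATE, registry)

# Registering the Qwen3.6 string gives it its own route.
registry[QWEN3_6_TEMPLATE] = QWEN3_6_TRAINING
after = lookup_training_template(QWEN3_6_TEMPLATE, registry)
```

With exact matching, any upstream edit to a template, however small, needs its own registry entry, which is presumably why both Qwen3.6 templates are bundled alongside the Qwen3.5 ones.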
Overview
Adds Qwen3.6 support by bundling upstream `qwen3_6.jinja` plus a new training-patched `qwen3_6_training.jinja` (prefix-preserving tool-call rendering and `{% generation %}` markers for assistant-only loss).

Updates `chat_template_utils.py` to recognize Qwen3.6 templates for `add_response_schema` (reusing the existing Qwen3.5 response schema) and to return the new Qwen3.6 training template from `get_training_chat_template`.

Extends the tiny-model generator and test matrix to include a Qwen3.6 VLM tiny model (`tiny-Qwen3_5MoeForConditionalGeneration-3.6`), with MoE-specific config downsizing and docs updated to list Qwen3.6 as supported/tested.

Reviewed by Cursor Bugbot for commit 9a66674.
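As a rough illustration of what "assistant-only loss" means downstream of the `{% generation %}` markers: the markers let the tokenizer report which tokens belong to assistant turns, and the trainer ignores the rest when computing the loss. The sketch below assumes such a 0/1 mask is already available; `mask_non_assistant` is a hypothetical helper, not a function from this PR.

```python
# -100 is the default ignore_index of PyTorch's cross-entropy loss.
IGNORE_INDEX = -100


def mask_non_assistant(input_ids: list[int], assistant_mask: list[int]) -> list[int]:
    """Build labels that keep assistant tokens and ignore everything else."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, assistant_mask)]


# Toy token ids: the first two tokens are the prompt, the last two the assistant reply.
labels = mask_non_assistant([11, 12, 13, 14], [0, 0, 1, 1])
```

If the template detection picks the wrong training template, the mask covers the wrong spans, which is the masking risk the note above describes.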