Skip to content

Qwen3.6 integration#5642

Merged
qgallouedec merged 3 commits into
mainfrom
qwen3.6
Apr 26, 2026
Merged

Qwen3.6 integration#5642
qgallouedec merged 3 commits into
mainfrom
qwen3.6

Conversation

@qgallouedec

@qgallouedec qgallouedec commented Apr 25, 2026

Copy link
Copy Markdown
Member

Qwen3.6 (Qwen/Qwen3.6-27B, Qwen/Qwen3.6-35B-A3B) reuses the Qwen3_5Moe* architecture but ships a slightly different chat template (adds preserve_thinking flag, tweaks tool-arg stringification). Exact-string template matching in chat_template_utils.py therefore fails for Qwen3.6 tokenizers.

Changes

  • Chat templates: add qwen3_6.jinja (verbatim from upstream) and qwen3_6_training.jinja (prefix-preserving + {% generation %} markers).
  • chat_template_utils.py: register both templates; route Qwen3.6 to the existing qwen3_5_schema in add_response_schema (output format is unchanged); route to qwen3_6_training_chat_template in get_training_chat_template.
  • scripts/generate_tiny_models.py: add Qwen/Qwen3.6-35B-A3B to the VLM loop (pushed as tiny-Qwen3_5MoeForConditionalGeneration-3.6 to leave room for future Qwen3.5-MoE variants); reuse the Qwen3.5 dense gotchas (force one full-attention layer, fp32-restore for linear-attn weights) and add MoE-specific shrinks.
  • Tests: parametrize the new tiny model in test_chat_template_utils, test_data_utils.TestApplyChatTemplate, and the SFT/DPO/GRPO/RLOO test_(train|training)_vlm cases.

Note

Medium Risk
Medium risk because it extends chat-template matching and training template swapping for a new model family, which can affect tool-calling formatting and assistant-only loss masking across Qwen variants if the template detection/mapping is wrong.

Overview
Adds Qwen3.6 support by bundling upstream qwen3_6.jinja plus a new training-patched qwen3_6_training.jinja (prefix-preserving tool-call rendering and {% generation %} markers for assistant-only loss).

Updates chat_template_utils.py to recognize Qwen3.6 templates for add_response_schema (reusing the existing Qwen3.5 response schema) and to return the new Qwen3.6 training template from get_training_chat_template.

Extends the tiny-model generator and test matrix to include a Qwen3.6 VLM tiny model (tiny-Qwen3_5MoeForConditionalGeneration-3.6), with MoE-specific config downsizing and docs updated to list Qwen3.6 as supported/tested.

Reviewed by Cursor Bugbot for commit 9a66674. Bugbot is set up for automated code reviews on this repo. Configure here.

@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@AmineDiro AmineDiro left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super cool to suport it this quick !

@qgallouedec qgallouedec merged commit 2f10689 into main Apr 26, 2026
11 of 13 checks passed
@qgallouedec qgallouedec deleted the qwen3.6 branch April 26, 2026 15:16
qgallouedec added a commit that referenced this pull request Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants