Add Qwen3.5 Think/NoThink training chat templates with generation markers#5824
Merged
qgallouedec merged 1 commit intoMay 25, 2026
Merged
Conversation
Member
|
Nice, thanks, let's see if the CI is green |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Member
|
@codex review |
|
Codex Review: Didn't find any major issues. More of your lovely PRs please. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
qgallouedec
approved these changes
May 23, 2026
8977b30 to
c50cda2
Compare
24 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Adds training chat templates for Qwen3.5 (Think and NoThink), wiring both into
get_training_chat_template.The two new templates mirror the three modifications already shipped in
qwen3_6_training.jinja: require both<think>and</think>before parsing (instead of only</think>), drop theloop.index0 > ns.last_query_indexconditional so the thinking block is always emitted (prefix-preservation), and wrap assistant output with{% generation %}/{% endgeneration %}markers.ThinkandNoThinkvariants differ only in the default value of theenable_thinkingflag inherited from their respective base templates.Note: source templates renamed
The source templates are renamed from size-based naming to behavior-based naming, matching the
-Think/-NoThinkfixture suffixes introduced in #5819 and making the size-default mapping explicit in the README. No behavior change.qwen3_5_4b_and_above.jinja->qwen3_5_think.jinjaqwen3_5_2b_and_below.jinja->qwen3_5_nothink.jinjaChanges
trl/chat_templates/— two new training templates, each a 3-line diff against its renamed source:qwen3_5_think_training.jinjaqwen3_5_nothink_training.jinjatrl/chat_template_utils.py:qwen3_5_chat_template_{2b_and_below,4b_and_above}->qwen3_5_{nothink,think}_chat_template; same renames carried through theadd_response_schemaelif (response schema is unchanged — both variants still map toqwen3_5_schema).get_training_chat_template()mapping each source template to its training variant.tests/test_chat_template_utils.py— two new entries inTestGetTrainingChatTemplate's parametrize block (qwen35-nothink+qwen35-think), each carrying the sametransformers>=5.0.0skipifmark.Docs — Qwen3.5 added to the supported-families list, combined training-template section added in:
docs/source/chat_templates.mdtrl/chat_templates/README.mdPart of #5471
cc: @qgallouedec
Note
Low Risk
Changes are limited to chat-template assets, template selection in
chat_template_utils, and tests/docs; no training-loop or model-weight logic.Overview
Adds Qwen3.5 Think/NoThink training chat templates and wires them into automatic template swapping for SFT (
assistant_only_loss) and GRPO (tools).Reference templates are renamed from size-based to behavior-based names (
qwen3_5_think.jinja/qwen3_5_nothink.jinja);chat_template_utilsloads and matches those names foradd_response_schemaandget_training_chat_template(unchangedqwen3_5_schema).New training patches (
qwen3_5_think_training.jinja,qwen3_5_nothink_training.jinja) mirror Qwen3.6: require bothredacted_thinkingopen/close tags before splitting content, always emit the thinking block (prefix-preserving when a tool message follows), and wrap assistant turns ingenerationmarkers. Think vs NoThink only differs in the defaultenable_thinkingon the generation prompt.Tests/docs:
TestGetTrainingChatTemplategains tiny Qwen3.5 Think/NoThink fixtures (transformers ≥ 5.0); supported-family lists updated indocs/source/chat_templates.mdandtrl/chat_templates/README.md.Reviewed by Cursor Bugbot for commit c50cda2. Bugbot is set up for automated code reviews on this repo. Configure here.