Add Qwen3-VL training chat template with generation markers #5764
Codex Review: Didn't find any major issues. 🚀
Hi @qgallouedec! The failing across SFT / DPO / GRPO / RLOO What's your call?
5e54120 to 5b66bc2
Update: rebased on
What does this PR do?
Adds a `{% generation %}`-marked training variant of the Qwen3-VL chat template so that `return_assistant_tokens_mask=True` produces correct masks for SFT assistant-only loss.
Diff vs `qwen3_vl.jinja`: wrap the assistant turn body (text content + `tool_calls` + closing `<|im_end|>\n`) with `{% generation %}` / `{% endgeneration %}`. The `<|im_start|>assistant\n` prompt cue stays outside the block; the `tool` and `user` branches are untouched, since tool responses are model input, not output.
Note: this code path is currently unreachable
`SFTTrainer` raises at `trl/trainer/sft_trainer.py:1006` for any VLM with `assistant_only_loss=True`. So the new template cannot be exercised by a real SFT training run today. This PR is forward-looking prep, submitted under the explicit sanction in #5471: "VLMs currently don't support `assistant_only_loss` in SFT (blocked by a separate check). These should still be tracked so templates are ready when support lands." When the guard lifts in a separate effort, Qwen3-VL training will work without an additional change here.
In the meantime, the validation surface is the existing `TestGetTrainingChatTemplate` suite: prefix-preservation, behavior-unchanged, and mask-correctness checks against the rendered template string. No real SFT training was attempted.
Changes:
- `trl/chat_templates/qwen3_vl_training.jinja`: new training template, identical to `qwen3_vl.jinja` except for the `{% generation %}` / `{% endgeneration %}` markers wrapping the assistant body.
- `trl/chat_template_utils.py`: load the new template and add a dispatch branch in `get_training_chat_template()` for `qwen3_vl_chat_template`. Docstring updated to mention Qwen3-VL.
- `tests/test_chat_template_utils.py`: extend the `TestGetTrainingChatTemplate` parametrize with the `trl-internal-testing/tiny-Qwen3VLForConditionalGeneration` fixture, which runs prefix-preservation, behavior-unchanged, and mask-correctness checks against the new template.
- `trl/chat_templates/README.md` and `docs/source/chat_templates.md`: short section describing the training template, mirroring the structure of the existing `qwen3_training.jinja` entries.
Part of #5471.
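
To make the intended mask boundaries concrete, here is a minimal, dependency-free sketch. It is not the TRL or Transformers implementation: the real mask is computed per token from the `{% generation %}` spans, whereas this illustration works per character on an already-rendered ChatML string. The function name `assistant_char_mask` and the sample conversation are hypothetical.

```python
# Illustration only: a character-level analogue of the assistant mask.
# Assumption: turns are rendered as "<|im_start|>{role}\n{body}<|im_end|>\n".
ASSISTANT_CUE = "<|im_start|>assistant\n"  # prompt cue, stays outside the mask
TURN_END = "<|im_end|>\n"                  # closing marker, inside the mask


def assistant_char_mask(rendered: str) -> list[int]:
    """Return one 0/1 flag per character; 1 marks assistant output."""
    mask = [0] * len(rendered)
    pos = 0
    while True:
        start = rendered.find(ASSISTANT_CUE, pos)
        if start == -1:
            break
        body_start = start + len(ASSISTANT_CUE)
        end = rendered.find(TURN_END, body_start)
        # Include the closing marker; fall back to end of string if absent.
        body_end = len(rendered) if end == -1 else end + len(TURN_END)
        for i in range(body_start, body_end):
            mask[i] = 1
        pos = body_end
    return mask


rendered = (
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\nHello!<|im_end|>\n"
)
mask = assistant_char_mask(rendered)
supervised = "".join(ch for ch, m in zip(rendered, mask) if m)
print(repr(supervised))  # 'Hello!<|im_end|>\n'
```

The user turn and the `<|im_start|>assistant\n` cue are left at 0, while the reply and its closing `<|im_end|>\n` are set to 1, matching the placement of the markers described above.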
Before submitting
AI writing disclosure
We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.
Who can review?
@qgallouedec
Note
Low Risk
Low risk: adds a new opt-in training chat template and selection branch for Qwen3-VL plus associated tests/docs, with minimal impact outside that model family.
Overview
Adds a new `qwen3_vl_training.jinja` template that wraps Qwen3-VL assistant output (text, `tool_calls`, and the closing `<|im_end|>`) in `{% generation %}` / `{% endgeneration %}` to enable correct assistant-token masking for SFT.
Updates `get_training_chat_template()` to recognize the Qwen3-VL base template and return this new training variant, extends the existing template utility test matrix to cover a Qwen3-VL processor, and documents the new training template in both the library and Sphinx docs.
Reviewed by Cursor Bugbot for commit 5b66bc2.
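
The dispatch idea in this overview (recognize a known base template, return its training variant) can be sketched as follows. This is a simplified stand-in, not the code in `trl/chat_template_utils.py`: the two template strings are heavily shortened placeholders, the names `pick_training_template` and `strip_generation_markers` are hypothetical, and raising `ValueError` for an unknown template is an assumption of this sketch.

```python
# Shortened placeholder templates; the real .jinja files are much longer.
BASE_TEMPLATE = r"{{ '<|im_start|>assistant\n' }}{{ content }}{{ '<|im_end|>\n' }}"
TRAINING_TEMPLATE = (
    r"{{ '<|im_start|>assistant\n' }}"
    r"{% generation %}{{ content }}{{ '<|im_end|>\n' }}{% endgeneration %}"
)

# Map each recognized base template to its training variant.
TRAINING_VARIANTS = {BASE_TEMPLATE: TRAINING_TEMPLATE}


def pick_training_template(chat_template: str) -> str:
    """Return the training variant for a recognized base template."""
    try:
        return TRAINING_VARIANTS[chat_template]
    except KeyError:
        # Assumption: unknown templates are rejected in this sketch.
        raise ValueError("no training variant for this chat template") from None


def strip_generation_markers(template: str) -> str:
    """Remove the markers so a training variant can be compared to its base."""
    return template.replace("{% generation %}", "").replace("{% endgeneration %}", "")


# The training variant should differ from the base only by the markers.
assert strip_generation_markers(pick_training_template(BASE_TEMPLATE)) == BASE_TEMPLATE
```

The final assertion mirrors, in miniature, the "identical except for the markers" property that the PR description claims for the new template.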