Add Qwen3-VL training chat template with generation markers #5764

Merged
qgallouedec merged 1 commit into huggingface:main from aazizyan:qwen3-vl-generation-markers
May 20, 2026
@aazizyan aazizyan commented May 13, 2026

Contributor

What does this PR do?

Adds a {% generation %}-marked training variant of the Qwen3-VL chat template so that return_assistant_tokens_mask=True produces correct masks for SFT assistant-only loss.
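For context, here is a minimal sketch of how an assistant-token mask is typically turned into loss labels. The helper name `mask_labels` and the token ids are hypothetical, and this is not TRL's actual implementation; the only external fact assumed is that `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
# Sketch only: turning an assistant-token mask into loss labels.
IGNORE_INDEX = -100  # assumption: matches PyTorch cross-entropy's default ignore_index


def mask_labels(input_ids, assistant_mask):
    """Keep the token id where the mask is 1; ignore every other position."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, assistant_mask)]


# Hypothetical token ids: 3 prompt tokens followed by 2 assistant tokens.
input_ids = [101, 102, 103, 201, 202]
assistant_mask = [0, 0, 0, 1, 1]
labels = mask_labels(input_ids, assistant_mask)
print(labels)  # [-100, -100, -100, 201, 202]
```

If the mask is wrong (for example all zeros because the template has no generation markers), every label is ignored and no assistant token contributes to the loss, which is why the markers matter.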

Diff vs qwen3_vl.jinja: wrap the assistant turn body — text content + tool_calls + closing <|im_end|>\n — with {% generation %} / {% endgeneration %}. The <|im_start|>assistant\n prompt cue stays outside the block; the tool and user branches are untouched, since tool responses are model input, not output.
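The placement of the markers can be illustrated with a toy renderer. This is a rough sketch in plain Python, not the Jinja template from this PR: `render_with_spans` is a hypothetical helper that handles text-only messages and ignores images, tool calls, and system prompts. It only shows which characters would fall inside the generation block.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def render_with_spans(messages):
    """Render text-only messages in a ChatML-like layout and record generation spans.

    Returns (text, spans); each span is a (start, end) character range that would
    sit between {% generation %} and {% endgeneration %}.
    """
    text, spans = "", []
    for message in messages:
        header = f"{IM_START}{message['role']}\n"
        body = f"{message['content']}{IM_END}\n"
        text += header  # the role cue stays outside the generation block
        if message["role"] == "assistant":
            # the span covers the content plus the closing <|im_end|>\n
            spans.append((len(text), len(text) + len(body)))
        text += body
    return text, spans


messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
text, spans = render_with_spans(messages)
start, end = spans[0]
print(repr(text[start:end]))  # 'Hello!<|im_end|>\n'
```

Including the closing `<|im_end|>` in the span means the model is also trained to emit the end-of-turn token, while the `<|im_start|>assistant\n` cue is treated as prompt.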

Note: this code path is currently unreachable

SFTTrainer raises at trl/trainer/sft_trainer.py:1006 for any VLM with assistant_only_loss=True:

if self._is_vlm and args.assistant_only_loss:
    raise ValueError(
        "Assistant-only loss is not yet supported for vision-language models. ..."
    )

So the new template cannot be exercised by a real SFT training run today. This PR is forward-looking prep, submitted as explicitly sanctioned in #5471: "VLMs currently don't support assistant_only_loss in SFT (blocked by a separate check). These should still be tracked so templates are ready when support lands." Once the guard is lifted in a separate effort, Qwen3-VL training should work without further changes here.

In the meantime, validation relies on the existing TestGetTrainingChatTemplate suite, which runs prefix-preservation, behavior-unchanged, and mask-correctness checks against the rendered template string. No real SFT training was attempted.
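As an illustration of what a prefix-preservation check can look like, here is a small sketch under one possible reading of the term: rendering the first k messages must yield a prefix of the full rendering. The `render` and `is_prefix_preserving` helpers below are hypothetical toys, not the functions used by the test suite.

```python
def render(messages, add_generation_prompt=False):
    """Toy ChatML-like renderer for text-only messages (hypothetical helper)."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"
    return text


def is_prefix_preserving(messages):
    """Check that each partial conversation renders as a prefix of the full one."""
    full = render(messages)
    for k in range(1, len(messages)):
        # add the assistant cue only when the next turn is an assistant turn
        next_is_assistant = messages[k]["role"] == "assistant"
        partial = render(messages[:k], add_generation_prompt=next_is_assistant)
        if not full.startswith(partial):
            return False
    return True


messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Bye"},
]
print(is_prefix_preserving(messages))  # True
```

A template that rewrites earlier turns when later turns are appended would fail this kind of check, because the partial rendering would no longer match the start of the full one.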

Changes:

  • trl/chat_templates/qwen3_vl_training.jinja: new training template, identical to qwen3_vl.jinja except for the {% generation %} / {% endgeneration %} markers wrapping the assistant body.
  • trl/chat_template_utils.py: load the new template and add a dispatch branch in get_training_chat_template() for qwen3_vl_chat_template. Docstring updated to mention Qwen3-VL.
  • tests/test_chat_template_utils.py: extend the TestGetTrainingChatTemplate parametrization with the trl-internal-testing/tiny-Qwen3VLForConditionalGeneration fixture, so the prefix-preservation, behavior-unchanged, and mask-correctness checks also run against the new template.
  • trl/chat_templates/README.md and docs/source/chat_templates.md: short section describing the training template, mirroring the structure of the existing qwen3_training.jinja entries.
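The dispatch idea can be sketched as follows, assuming selection is done by exact string match against a known base template. All names here (`QWEN3_VL_BASE`, `TRAINING_VARIANTS`, `pick_training_template`) and the placeholder strings are hypothetical; the real templates are full Jinja files and the actual branch lives in get_training_chat_template().

```python
# Hypothetical placeholder strings standing in for the real Jinja files.
QWEN3_VL_BASE = "{# qwen3_vl.jinja placeholder #}"
QWEN3_VL_TRAINING = "{# qwen3_vl_training.jinja placeholder #}"

# Assumption: a known base template maps to exactly one training variant.
TRAINING_VARIANTS = {QWEN3_VL_BASE: QWEN3_VL_TRAINING}


def pick_training_template(chat_template):
    """Return the training variant when the template exactly matches a known base."""
    return TRAINING_VARIANTS.get(chat_template)


print(pick_training_template(QWEN3_VL_BASE) == QWEN3_VL_TRAINING)  # True
print(pick_training_template("some other template"))  # None
```

Exact matching keeps the change opt-in per model family: a processor whose template differs from the known base, even slightly, is left untouched.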

Part of #5471.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

  • No AI usage: the PR was written entirely by a human.
  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
  • AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

@qgallouedec


Note

Low Risk
Low risk: adds a new opt-in training chat template and selection branch for Qwen3-VL plus associated tests/docs, with minimal impact outside that model family.

Overview
Adds a new qwen3_vl_training.jinja template that wraps Qwen3-VL assistant output (text, tool_calls, and the closing <|im_end|>) in {% generation %} / {% endgeneration %} to enable correct assistant-token masking for SFT.

Updates get_training_chat_template() to recognize the Qwen3-VL base template and return this new training variant, extends the existing template utility test matrix to cover a Qwen3-VL processor, and documents the new training template in both the library and Sphinx docs.

Reviewed by Cursor Bugbot for commit 5b66bc2. Bugbot is set up for automated code reviews on this repo.

@qgallouedec qgallouedec left a comment

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🚀

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@aazizyan

Contributor Author

Hi @qgallouedec! The failing Tests (3.10–3.14) / dev-deps / min-versions jobs all fail on the same assertion:

AssertionError: Param model.visual.blocks.0.norm1.weight is not updated

across SFT / DPO / GRPO / RLOO test_train_vlm* against tiny-Qwen2_5_VL and tiny-Qwen3VL — code paths this PR doesn't touch.

What's your call?

@aazizyan aazizyan force-pushed the qwen3-vl-generation-markers branch from 5e54120 to 5b66bc2 on May 17, 2026 16:41
@aazizyan

Contributor Author

Update: rebased on main; the torch<2.12.0 pin from #5769 was missing from my branch. CI should be green now.

@qgallouedec qgallouedec merged commit f6e5c11 into huggingface:main May 20, 2026
12 checks passed