Add Qwen3-VL training chat template with generation markers by aazizyan · Pull Request #5764 · huggingface/trl

aazizyan · 2026-05-13T13:56:25Z

What does this PR do?

Adds a {% generation %}-marked training variant of the Qwen3-VL chat template so that return_assistant_tokens_mask=True produces correct masks for SFT assistant-only loss.

Diff vs qwen3_vl.jinja: the assistant turn body (text content, tool_calls, and the closing <|im_end|>\n) is wrapped in {% generation %} / {% endgeneration %}. The <|im_start|>assistant\n prompt cue stays outside the block. The tool and user branches are untouched, since tool responses are model input, not output.
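To illustrate which part of a turn the markers select, here is a minimal, stdlib-only sketch. It is not how transformers implements return_assistant_tokens_mask (that mask is per token and is computed while the template renders). The helper render_with_mask is hypothetical and works per character on a string that still contains the literal marker text.

```python
import re

# Toy illustration only: treats the markers as literal text in an
# already-assembled string and builds a per-character mask.
MARKER = re.compile(r"\{% (generation|endgeneration) %\}")


def render_with_mask(marked: str) -> tuple[str, list[int]]:
    """Strip generation markers; return (text, mask) where mask[i] == 1
    iff text[i] was inside a generation block."""
    parts: list[str] = []
    mask: list[int] = []
    inside = False
    pos = 0
    for m in MARKER.finditer(marked):
        chunk = marked[pos:m.start()]
        parts.append(chunk)
        mask.extend([1 if inside else 0] * len(chunk))
        inside = m.group(1) == "generation"  # open on start tag, close on end tag
        pos = m.end()
    tail = marked[pos:]
    parts.append(tail)
    mask.extend([1 if inside else 0] * len(tail))
    return "".join(parts), mask


marked = (
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\n"  # prompt cue stays outside the block
    "{% generation %}Hello!<|im_end|>\n{% endgeneration %}"
)
text, mask = render_with_mask(marked)
supervised = "".join(c for c, keep in zip(text, mask) if keep)
print(repr(supervised))
```

In this toy, only the assistant body and its closing <|im_end|>\n end up in supervised, which mirrors the placement described above.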

Note: this code path is currently unreachable

SFTTrainer raises at trl/trainer/sft_trainer.py:1006 for any VLM with assistant_only_loss=True:

if self._is_vlm and args.assistant_only_loss:
    raise ValueError(
        "Assistant-only loss is not yet supported for vision-language models. ..."
    )

So the new template cannot be exercised by a real SFT training run today. This PR is forward-looking prep, submitted under the explicit sanction in #5471: "VLMs currently don't support assistant_only_loss in SFT (blocked by a separate check). These should still be tracked so templates are ready when support lands." When the guard lifts in a separate effort, Qwen3-VL training will work without an additional change here.

In the meantime, validation relies on the existing TestGetTrainingChatTemplate suite: prefix-preservation, behavior-unchanged, and mask-correctness checks against the rendered template string. No real SFT training run was attempted.
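As a rough illustration of what a prefix-preservation check can look like, here is a self-contained sketch. It assumes "prefix-preservation" means that rendering the first k messages yields a prefix of rendering the full conversation; the actual assertions in TestGetTrainingChatTemplate are not shown here. The renderers and the helper is_prefix_preserving are hypothetical stand-ins, not TRL code.

```python
def render(messages):
    """Toy ChatML-style renderer (stand-in for a real chat template)."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )


def render_lossy(messages):
    """Toy renderer that blanks assistant content in all but the last
    message, so earlier turns render differently once more turns follow."""
    out = []
    last = len(messages) - 1
    for i, m in enumerate(messages):
        content = m["content"]
        if m["role"] == "assistant" and i != last:
            content = ""
        out.append(f"<|im_start|>{m['role']}\n{content}<|im_end|>\n")
    return "".join(out)


def is_prefix_preserving(render_fn, messages):
    """True if every truncated conversation renders to a prefix of the full one."""
    full = render_fn(messages)
    return all(
        full.startswith(render_fn(messages[:k])) for k in range(1, len(messages))
    )


conversation = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Thanks"},
]
print(is_prefix_preserving(render, conversation))
print(is_prefix_preserving(render_lossy, conversation))
```

The second renderer fails the check because the assistant turn changes once a later message is appended.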

Changes:

trl/chat_templates/qwen3_vl_training.jinja: new training template, identical to qwen3_vl.jinja except for the {% generation %} / {% endgeneration %} markers wrapping the assistant body.
trl/chat_template_utils.py: load the new template and add a dispatch branch in get_training_chat_template() for qwen3_vl_chat_template. Docstring updated to mention Qwen3-VL.
tests/test_chat_template_utils.py: extend the TestGetTrainingChatTemplate parametrize with the trl-internal-testing/tiny-Qwen3VLForConditionalGeneration fixture — runs prefix-preservation, behavior-unchanged, and mask-correctness checks against the new template.
trl/chat_templates/README.md and docs/source/chat_templates.md: short section describing the training template, mirroring the structure of the existing qwen3_training.jinja entries.

Part of #5471.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

No AI usage: the PR was written entirely by a human.
AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

@qgallouedec

Note

Low Risk
Low risk: adds a new opt-in training chat template and selection branch for Qwen3-VL plus associated tests/docs, with minimal impact outside that model family.

Overview
Adds a new qwen3_vl_training.jinja template that wraps Qwen3-VL assistant output (text, tool_calls, and the closing <|im_end|>) in {% generation %} / {% endgeneration %} to enable correct assistant-token masking for SFT.

Updates get_training_chat_template() to recognize the Qwen3-VL base template and return this new training variant, extends the existing template utility test matrix to cover a Qwen3-VL processor, and documents the new training template in both the library and Sphinx docs.

Reviewed by Cursor Bugbot for commit 5b66bc2. Bugbot is set up for automated code reviews on this repo.

qgallouedec

@codex review

chatgpt-codex-connector · 2026-05-17T14:54:57Z

Codex Review: Didn't find any major issues. 🚀

HuggingFaceDocBuilderDev · 2026-05-17T14:55:33Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

aazizyan · 2026-05-17T16:17:16Z

Hi @qgallouedec! The failing Tests (3.10–3.14) / dev-deps / min-versions jobs all fail on the same assertion:

AssertionError: Param model.visual.blocks.0.norm1.weight is not updated

across SFT / DPO / GRPO / RLOO test_train_vlm* against tiny-Qwen2_5_VL and tiny-Qwen3VL — code paths this PR doesn't touch.

What's your call?

aazizyan · 2026-05-17T16:44:10Z

Update: rebased on main; the torch<2.12.0 pin from #5769 was missing from my branch. CI should be green now.

qgallouedec approved these changes May 17, 2026

Commit 5b66bc2: Add Qwen3-VL training chat template with generation markers

aazizyan force-pushed the qwen3-vl-generation-markers branch from 5e54120 to 5b66bc2 on May 17, 2026 16:41

qgallouedec merged commit f6e5c11 into huggingface:main May 20, 2026
12 checks passed

qgallouedec mentioned this pull request May 25, 2026 in "Tracking: Add {% generation %} chat templates for common model families" #5471 (Open, 24 tasks)