Prevent corruption of DPO VLM training if "keep_end" truncation_mode by albertvillanova · Pull Request #5286 · huggingface/trl

albertvillanova · 2026-03-13T09:00:45Z

Prevent corruption of DPO VLM training if "keep_end" truncation_mode:

Raise ValueError when truncation_mode="keep_end" is used for VLM training in DPO.

This PR addresses a regression related to vision-language models (VLMs) and sequence truncation. It ensures that using the keep_end truncation mode with VLMs raises a clear error at initialization, preventing silent corruption of training data. The update includes both a code fix and a regression test.

Changes

Validation improvements for vision-language models:

Added a check in the DPOTrainer.__init__ method to raise a ValueError if a vision-language dataset is used with truncation_mode='keep_end', explaining that image tokens would be dropped and recommending alternatives.
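A minimal sketch of what such an init-time guard could look like. The function name `check_truncation_config` and the `is_vision_dataset` flag are assumptions made for illustration and are not taken from the TRL source. The condition (vision dataset, `max_length` set, `truncation_mode='keep_end'`) and the message wording follow this PR's description and discussion.

```python
from typing import Optional


def check_truncation_config(
    is_vision_dataset: bool,
    max_length: Optional[int],
    truncation_mode: str,
) -> None:
    """Hypothetical stand-in for the check added to DPOTrainer.__init__.

    Fails fast only for the combination described in the PR: a vision dataset,
    a max_length that is set, and truncation_mode='keep_end'.
    """
    if is_vision_dataset and max_length is not None and truncation_mode == "keep_end":
        raise ValueError(
            "truncation_mode='keep_end' is not supported for vision-language models. "
            "Image tokens reside inside the prompt portion of the sequence; depending "
            "on the example, keep_end may silently drop them. Use "
            "truncation_mode='keep_start' (the default) or set max_length=None."
        )


# Configurations that pass the guard return None without raising.
check_truncation_config(is_vision_dataset=True, max_length=None, truncation_mode="keep_end")
check_truncation_config(is_vision_dataset=True, max_length=512, truncation_mode="keep_start")
check_truncation_config(is_vision_dataset=False, max_length=512, truncation_mode="keep_end")
```

Running the check at construction time means the error surfaces before any batch is collated, rather than showing up later as degraded training.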

Testing enhancements:

Introduced a regression test (test_train_vlm_keep_end_raises) to verify that initializing training with truncation_mode='keep_end' for a vision-language model raises the expected error, preventing silent data corruption.
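The shape of such a regression test can be sketched without loading a real model. `StubTrainer` below is a hypothetical stand-in that models only the init-time guard. The actual test in the PR constructs a real DPOTrainer, and its body is not shown in this thread. Only the test name `test_train_vlm_keep_end_raises` comes from the PR.

```python
import unittest


class StubTrainer:
    """Hypothetical stand-in for DPOTrainer; only the init-time guard is modelled."""

    def __init__(self, is_vision_dataset, max_length, truncation_mode):
        if is_vision_dataset and max_length is not None and truncation_mode == "keep_end":
            raise ValueError(
                "truncation_mode='keep_end' is not supported for vision-language models."
            )
        self.truncation_mode = truncation_mode


class TestKeepEndGuard(unittest.TestCase):
    def test_train_vlm_keep_end_raises(self):
        # Construction itself must fail, before any training step runs.
        with self.assertRaisesRegex(ValueError, "keep_end"):
            StubTrainer(is_vision_dataset=True, max_length=32, truncation_mode="keep_end")

    def test_keep_start_still_constructs(self):
        # The default mode must remain usable for vision datasets.
        trainer = StubTrainer(is_vision_dataset=True, max_length=32, truncation_mode="keep_start")
        self.assertEqual(trainer.truncation_mode, "keep_start")


def run_guard_tests():
    """Run the two tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestKeepEndGuard)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Asserting on construction rather than on a training run keeps the test fast and ties it to the fail-fast behaviour.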

Note

Low Risk
Low risk: adds a defensive init-time validation and a regression test; only affects the VLM + max_length + truncation_mode='keep_end' configuration by failing fast instead of proceeding.

Overview
Prevents silent corruption in DPO vision-language training by failing fast when a vision dataset is used with max_length set and truncation_mode='keep_end', raising a clear ValueError during DPOTrainer initialization.

Adds a regression test to ensure VLM trainer construction with keep_end truncation reliably errors (fix for #5285).

Written by Cursor Bugbot for commit f36b3c3. This will update automatically on new commits.

HuggingFaceDocBuilderDev · 2026-03-13T09:03:48Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2026-03-13T15:11:12Z

I'm not sure I understand this one. The image tokens are indeed near the beginning, but they are usually not the first tokens. E.g., if the data looks like

<user><img>What is it?<assistant>A flower<end>
<user><img>What is it?<assistant>A bus<end>

What prevents us from truncating the first token (<user>)?

albertvillanova · 2026-03-16T07:33:24Z

@qgallouedec, even in your very edge case, how could you be sure that you are just removing the first token (and not any image token) with "keep_end" in every example when they have different lengths? max_length is a single scalar applied uniformly to the whole dataset.

What this PR is trying to solve is slightly narrower: with keep_end, truncation removes a variable-length prefix of each sequence, so in a vision example it can drop the whole prompt prefix, including the image placeholder/tokens. In your example, that could indeed mean dropping <user> first, then <img>, then "What is it?", depending on how much needs to be truncated.

The reason I called out image tokens explicitly is that, for VLM inputs, losing them is especially problematic: once the visual tokens are truncated away, the example is no longer a valid multimodal sample and can become semantically inconsistent with the remaining text. By contrast, truncating text-only prefixes is still undesirable, but it is the usual trade-off of sequence truncation and not something specific to vision inputs.
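To make the varying-length point concrete, here is a toy sketch, not TRL's collator code. It treats each word as one token and uses a single `<img>` placeholder, whereas real processors typically expand an image into many tokens. The `truncate` helper and the second, longer example are assumptions made for illustration.

```python
IMG = "<img>"


def truncate(tokens, max_length, truncation_mode):
    """Toy truncation: keep the first or the last max_length tokens."""
    if max_length is None or len(tokens) <= max_length:
        return list(tokens)
    if truncation_mode == "keep_start":
        return list(tokens[:max_length])
    if truncation_mode == "keep_end":
        return list(tokens[-max_length:])
    raise ValueError(f"Unknown truncation_mode: {truncation_mode!r}")


examples = [
    # 9 tokens: one over the limit, so keep_end drops only <user>.
    ["<user>", IMG, "What", "is", "it?", "<assistant>", "A", "flower", "<end>"],
    # 11 tokens: three over the limit, so keep_end drops <user>, <img> and "What".
    ["<user>", IMG, "What", "is", "it?", "<assistant>", "A", "red", "double-decker", "bus", "<end>"],
]

max_length = 8  # a single scalar applied to every example
for tokens in examples:
    kept = truncate(tokens, max_length, "keep_end")
    print(f"len={len(tokens):2d}  image kept={IMG in kept}  first kept token={kept[0]!r}")
```

With the same `max_length`, the shorter example keeps its image placeholder and the longer one loses it. Whether the image survives therefore depends on each example's length rather than on the configuration alone.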

I could improve the wording of the error to make it more precise: the core argument is not that image tokens are necessarily the very first tokens, but that they live in the prefix region that keep_end is designed to discard.

- "truncation_mode='keep_end' is not supported for vision-language models. Image tokens reside in "
- "the prompt at the beginning of the sequence; keeping the end would drop them. Use "
- "truncation_mode='keep_start' (the default) or set max_length=None."
+ "truncation_mode='keep_end' is not supported for vision-language models. Image tokens reside "
+ "inside the prompt portion of the sequence; depending on the example, keep_end may silently "
+ "drop them, causing pixel_values to be forwarded to the model with no corresponding visual "
+ "tokens in input_ids. Use truncation_mode='keep_start' (the default) or set max_length=None."

keep_start does not have this problem: as long as max_length >= prompt_len, image tokens are always safe.
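The keep_start claim can be checked on a toy sequence. This is a sketch under stated assumptions: `IMAGE_TOKEN_ID` and all token ids are made up, and `keep_start` is a plain slice rather than TRL's implementation.

```python
IMAGE_TOKEN_ID = 32000  # hypothetical id of the image placeholder token


def keep_start(input_ids, max_length):
    """Toy keep_start truncation: keep the first max_length ids."""
    return input_ids[:max_length]


def count_image_tokens(input_ids):
    return sum(1 for token_id in input_ids if token_id == IMAGE_TOKEN_ID)


# Prompt: BOS, three image tokens, then three text tokens (prompt_len == 7).
prompt_ids = [1, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 15, 16, 17]
completion_ids = [21, 22, 23, 24, 2]
input_ids = prompt_ids + completion_ids
prompt_len = len(prompt_ids)

# For every max_length >= prompt_len, all image tokens survive truncation.
all_safe = all(
    count_image_tokens(keep_start(input_ids, max_length)) == count_image_tokens(prompt_ids)
    for max_length in range(prompt_len, len(input_ids) + 1)
)
print("image tokens intact for every max_length >= prompt_len:", all_safe)

# Below prompt_len the guarantee no longer holds: max_length=3 cuts into the image tokens.
print("image tokens left with max_length=3:", count_image_tokens(keep_start(input_ids, 3)))
```

The condition `max_length >= prompt_len` is sufficient rather than necessary: strictly, the image tokens survive as long as `max_length` reaches past the last image token in the prompt.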

qgallouedec · 2026-03-16T16:30:16Z

I see. My assumption was that if you modify truncation, you're expected to understand the implications. In other words, if your truncation settings end up affecting images, that’s your responsibility.

In practice, we strongly recommend disabling truncation for VLMs. If you still choose to enable it, it’s not surprising if things break.

In any case, the truncation_mode="keep_end" option is generally undesirable; therefore, disallowing it for VLMs, where it is even more undesirable, is also a reasonable solution.

Should we align SFT as well?

albertvillanova · 2026-03-17T08:05:03Z

Thanks for the clarification, @qgallouedec.

My motivation for raising a ValueError here is less about shielding users from all misconfigurations, and more about catching a systematically unsafe configuration. In this case, keep_end is not just "suboptimal", but it almost deterministically breaks the multimodal structure by removing the prefix where image tokens live. So it felt closer to an invalid setting than a risky one.

Regarding your last point, yes, I think it would be good to align SFT as well for consistency. If this approach is approved, I will implement the same policy across trainers.

qgallouedec

👌



albertvillanova added 2 commits March 13, 2026 09:55

Add regression test

d636963

Raise ValueError

60dcb8f

albertvillanova changed the title ~~Prvent corruption of DPO VLM training if "keep_end" truncation_mode~~ Prevent corruption of DPO VLM training if "keep_end" truncation_mode Mar 13, 2026

albertvillanova mentioned this pull request Mar 16, 2026

Support max_length in DPO VLM training #5284

Merged

albertvillanova added 4 commits March 16, 2026 08:39

Rephrase error message

1530157

Merge remote-tracking branch 'upstream/main' into fix-5285

221421e

Merge remote-tracking branch 'upstream/main' into fix-5285

ceccbe3

Merge remote-tracking branch 'upstream/main' into fix-5285

f36b3c3

qgallouedec approved these changes Mar 17, 2026

View reviewed changes

albertvillanova merged commit 7b42fc4 into huggingface:main Mar 17, 2026
11 of 12 checks passed

albertvillanova mentioned this pull request Mar 18, 2026

Support truncation_mode in SFT #5306

Merged

songhappy pushed a commit to songhappy/trl that referenced this pull request Apr 20, 2026

Prevent corruption of DPO VLM training if "keep_end" truncation_mode (h…

d8a4e1b

…uggingface#5286)

albertvillanova merged 6 commits into huggingface:main from albertvillanova:fix-5285
