Prevent corruption of DPO VLM training if "keep_end" truncation_mode#5286
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
I'm not sure to understand this one. The image tokens are indeed in the beginning, but they are usually not the first tokens. Eg if the data looks like What prevents from truncating the first token ( |
|
@qgallouedec, even in your very edge case, how could you be sure that you are just removing the first token (and not any image token) with "keep_end" in every example when they have different lengths? What this PR is trying to solve is slightly narrower: with The reason I called out image tokens explicitly is that, for VLM inputs, losing them is especially problematic: once the visual tokens are truncated away, the example is no longer a valid multimodal sample and can become semantically inconsistent with the remaining text. By contrast, truncating text-only prefixes is still undesirable, but it is the usual trade-off of sequence truncation and not something specific to vision inputs. I could improve the wording of the error to make it more precise: the core argument is not that image tokens are necessarily the very first tokens, but that they live in the prefix region that - "truncation_mode='keep_end' is not supported for vision-language models. Image tokens reside in "
- "the prompt at the beginning of the sequence; keeping the end would drop them. Use "
- "truncation_mode='keep_start' (the default) or set max_length=None."
+ "truncation_mode='keep_end' is not supported for vision-language models. Image tokens reside "
+ "inside the prompt portion of the sequence; depending on the example, keep_end may silently "
+ "drop them, causing pixel_values to be forwarded to the model with no corresponding visual "
+ "tokens in input_ids. Use truncation_mode='keep_start' (the default) or set max_length=None."
|
|
I see. My assumption was that if you modify truncation, you're expected to understand the implications. In other words, if your truncation settings end up affecting images, that’s your responsibility. In practice, we strongly recommend disabling truncation for VLMs. If you still choose to enable it, it’s not surprising if things break. In any case, the Should we align SFT as well? |
|
Thanks for the clarification, @qgallouedec. My motivation for raising a ValueError here is less about shielding users from all misconfigurations, and more about catching a systematically unsafe configuration. In this case, Regarding your last point, yes, I think it would be good to align SFT as well for consistency. If this approach is approved, I will implement the same policy across trainers. |
commit 52cd0cc Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:31:26 2026 +0100 Fix UNEXPECTED lm_head.weight warning when loading a CausalLM as a reward model (#5295) commit 7b42fc4 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:29:11 2026 +0100 Prevent corruption of DPO VLM training if "keep_end" truncation_mode (#5286) commit 3acb8e8 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:27:10 2026 +0100 Support max_length in DPO VLM training (#5284) commit ee339a0 Author: Carlos Miguel Patiño <carlos.patino@huggingface.co> Date: Tue Mar 17 14:01:44 2026 +0100 [GKD] Buffer Implementation for Distillation Trainer (#5137) Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> commit d46131f Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 15:27:19 2026 +0100 Remove custom get_train/eval_dataloader from OnlineDPO (#5291) commit 85cf8f4 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 15:24:24 2026 +0100 Remove TrainingArguments import from experimental trainers (#5290) commit 91e3da0 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Mon Mar 16 07:19:51 2026 -0600 Fix `accuracy_reward` crash when called from non-main thread (#5281) commit 4996631 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 07:44:28 2026 +0100 Fix support for model_init_kwargs in MiniLLM when passed as CLI JSON string (#5274) commit 5fceaa7 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 07:43:34 2026 +0100 Simplify structured outputs logic across vLLM versions in scripts/vllm_serve (#5273) commit 406d406 Author: casinca <47400729+casinca@users.noreply.github.com> Date: Sat Mar 14 04:12:49 2026 +0100 feat(`grpo_trainer.py`): Variational Sequence-Level Soft Policy Optimization (VESPO) (#5199) commit d0ac7ef Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com> Date: Sat Mar 14 02:53:33 2026 +0100 Allow nullable logprobs in vLLM serve responses (#5203) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit c0eabc4 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Fri Mar 13 18:19:15 2026 -0600 Change default `vllm_mode` to `"colocate"` and add v0→v1 migration guide (#5255) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> commit 6c0fccd Author: Mario Šaško <mariosasko777@gmail.com> Date: Sat Mar 14 00:19:38 2026 +0100 35% faster packing + rename `bfd-requeue` to `bfd_split` (#5189) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 3972d66 Author: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com> Date: Wed Mar 18 22:26:44 2026 +0100 Suggest the `Json()` type for tool calling dataset format (#5307) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit 5c6e915 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Mar 18 14:55:19 2026 -0600 Update `RewardFunc` type annotation to allow `None`values in reward list (#5297) commit ee96845 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Wed Mar 18 17:03:54 2026 +0100 Fix DPOTrainer collators to truncate sequences before padding (#5305) commit 435c2ae Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Mar 18 08:09:42 2026 -0600 Add guidance to avoid `hasattr` and `getattr` with defaults in `AGENTS.md` (#5294) commit 26ce6a3 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Mar 18 00:44:12 2026 -0600 Apply docstyle (#5296) commit 52cd0cc Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:31:26 2026 +0100 Fix UNEXPECTED lm_head.weight warning when loading a CausalLM as a reward model (#5295) commit 7b42fc4 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:29:11 2026 +0100 Prevent corruption of DPO VLM training if "keep_end" truncation_mode (#5286) commit 3acb8e8 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:27:10 2026 +0100 Support max_length in DPO VLM training (#5284) commit ee339a0 Author: Carlos Miguel Patiño <carlos.patino@huggingface.co> Date: Tue Mar 17 14:01:44 2026 +0100 [GKD] Buffer Implementation for Distillation Trainer (#5137) Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> commit d46131f Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 15:27:19 2026 +0100 Remove custom get_train/eval_dataloader from OnlineDPO (#5291) commit 85cf8f4 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 15:24:24 2026 +0100 Remove TrainingArguments import from experimental trainers (#5290) commit 91e3da0 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Mon Mar 16 07:19:51 2026 -0600 Fix `accuracy_reward` crash when called from non-main thread (#5281) commit 4996631 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 07:44:28 2026 +0100 Fix support for model_init_kwargs in MiniLLM when passed as CLI JSON string (#5274) commit 5fceaa7 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 07:43:34 2026 +0100 Simplify structured outputs logic across vLLM versions in scripts/vllm_serve (#5273) commit 406d406 Author: casinca <47400729+casinca@users.noreply.github.com> Date: Sat Mar 14 04:12:49 2026 +0100 feat(`grpo_trainer.py`): Variational Sequence-Level Soft Policy Optimization (VESPO) (#5199) commit d0ac7ef Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com> Date: Sat Mar 14 02:53:33 2026 +0100 Allow nullable logprobs in vLLM serve responses (#5203) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit c0eabc4 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Fri Mar 13 18:19:15 2026 -0600 Change default `vllm_mode` to `"colocate"` and add v0→v1 migration guide (#5255) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> commit 6c0fccd Author: Mario Šaško <mariosasko777@gmail.com> Date: Sat Mar 14 00:19:38 2026 +0100 35% faster packing + rename `bfd-requeue` to `bfd_split` (#5189) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Prvent corruption of DPO VLM training if "keep_end" truncation_mode:
Fix #5285.
This PR addresses a regression related to vision-language models (VLMs) and sequence truncation. It ensures that using the
keep_endtruncation mode with VLMs raises a clear error at initialization, preventing silent corruption of training data. The update includes both a code fix and a regression test.Changes
Validation improvements for vision-language models:
DPOTrainer.__init__method to raise aValueErrorif a vision-language dataset is used withtruncation_mode='keep_end', explaining that image tokens would be dropped and recommending alternatives.Testing enhancements:
test_train_vlm_keep_end_raises) to verify that initializing training withtruncation_mode='keep_end'for a vision-language model raises the expected error, preventing silent data corruption.Note
Low Risk
Low risk: adds a defensive init-time validation and a regression test; only affects the VLM +
max_length+truncation_mode='keep_end'configuration by failing fast instead of proceeding.Overview
Prevents silent corruption in DPO vision-language training by failing fast when a vision dataset is used with
max_lengthset andtruncation_mode='keep_end', raising a clearValueErrorduringDPOTrainerinitialization.Adds a regression test to ensure VLM trainer construction with
keep_endtruncation reliably errors (fix for #5285).Written by Cursor Bugbot for commit f36b3c3. This will update automatically on new commits. Configure here.