DPOTrainer silently corrupts VLM training when truncation_mode="keep_end" is set with a non-None max_length.
Description
When max_length is set and truncation_mode="keep_end" (the non-default mode) is used with a vision-language model, DPOTrainer drops the image tokens from any sequence that exceeds max_length, without raising a warning or error.
Root cause
keep_end works by flushing padding to the right, taking the last max_length tokens, then flushing padding back to the left. In DPO, the sequence layout is [prompt (with image tokens) | completion], so the prompt, and therefore every image token, sits at the beginning of the sequence. keep_end truncates from the beginning, so the image tokens are the first to be removed. The model is then trained without image information in its token sequence while still receiving pixel_values for a visual forward pass that no longer corresponds to any tokens in the sequence.
No error is raised. Training runs to completion and loss decreases, making this failure invisible.
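The effect can be illustrated with a minimal, dependency-free sketch. This is not the DPOTrainer implementation: it ignores padding and batching, and IMAGE_TOKEN_ID, truncate, and check_image_tokens_preserved are hypothetical names introduced here only for illustration. It shows that keeping the last max_length tokens of a [prompt | completion] sequence removes the image placeholders, and how a simple count comparison could surface the problem.

```python
# Hypothetical placeholder id for image tokens; real models define their own.
IMAGE_TOKEN_ID = 9999


def truncate(ids, max_length, mode):
    """Truncate a single unpadded token sequence (illustrative only)."""
    if len(ids) <= max_length:
        return list(ids)
    if mode == "keep_end":
        return list(ids[-max_length:])  # drops tokens from the beginning
    if mode == "keep_start":
        return list(ids[:max_length])  # drops tokens from the end
    raise ValueError(f"unknown truncation mode: {mode!r}")


def count_image_tokens(ids):
    return sum(1 for t in ids if t == IMAGE_TOKEN_ID)


def check_image_tokens_preserved(before, after):
    """Hypothetical guard: fail loudly if truncation removed image tokens."""
    n_before, n_after = count_image_tokens(before), count_image_tokens(after)
    if n_before != n_after:
        raise ValueError(
            f"truncation removed {n_before - n_after} of {n_before} image tokens"
        )


# Layout: [prompt (with image tokens) | completion]
prompt = [IMAGE_TOKEN_ID] * 8 + [101, 102, 103, 104]
completion = [201, 202, 203, 204, 205, 206]
sequence = prompt + completion  # 18 tokens in total

kept_end = truncate(sequence, max_length=8, mode="keep_end")
kept_start = truncate(sequence, max_length=8, mode="keep_start")

print(count_image_tokens(kept_end))    # 0: every image token is gone
print(count_image_tokens(kept_start))  # 8: image tokens survive

try:
    check_image_tokens_preserved(sequence, kept_end)
except ValueError as err:
    print(f"guard triggered: {err}")
```

A check along these lines, run after truncation, would turn the silent failure into an explicit error. Whether it belongs in the trainer, the collator, or config validation is a design decision this sketch does not address.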