DPOTrainer silently corrupts VLM training with "keep_end" truncation_mode #5285

@albertvillanova

DPOTrainer silently corrupts VLM training when truncation_mode="keep_end" is set with a non-None max_length.

Description

When max_length is set and truncation_mode="keep_end" (the non-default mode) is used with a vision-language model, DPOTrainer silently drops all image tokens from the sequence without any warning or error.

Root cause

keep_end works by flushing padding to the right, taking the last max_length tokens, then flushing padding back to the left. In DPO, the sequence layout is [prompt (with image tokens) | completion], so the prompt, and therefore every image token, sits at the beginning of the sequence. Because keep_end truncates from the beginning, the image tokens are the first to be removed. The model is then trained without image information in the token sequence, yet it still receives pixel_values for a visual forward pass that no longer corresponds to any tokens.
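The effect can be reproduced on a toy sequence. The following is a minimal sketch, not the trainer's actual code: `IMAGE_TOKEN_ID`, the token values, and the `truncate` helper are all made up for illustration, and plain Python lists stand in for the padded tensors the trainer works on.

```python
# Toy illustration only: IMAGE_TOKEN_ID and all token values are hypothetical,
# and truncate() is a stand-in helper, not a TRL function.
IMAGE_TOKEN_ID = 9999


def truncate(input_ids, max_length, truncation_mode):
    """Truncate one unpadded sequence according to the named mode."""
    if truncation_mode == "keep_start":
        return input_ids[:max_length]
    if truncation_mode == "keep_end":
        return input_ids[-max_length:]
    raise ValueError(f"unknown truncation_mode: {truncation_mode!r}")


# Layout described above: [prompt (with image tokens) | completion]
prompt = [IMAGE_TOKEN_ID] * 4 + [11, 12, 13]
completion = [21, 22, 23, 24, 25]
sequence = prompt + completion

kept_end = truncate(sequence, max_length=6, truncation_mode="keep_end")
kept_start = truncate(sequence, max_length=6, truncation_mode="keep_start")

# Count how many image tokens survive each mode.
print("keep_end:", kept_end.count(IMAGE_TOKEN_ID))
print("keep_start:", kept_start.count(IMAGE_TOKEN_ID))
```

In this toy case keep_end retains only the tail of the prompt plus the completion, so no image tokens remain, while keep_start retains all four.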

No error is raised. Training runs to completion and loss decreases, making this failure invisible.
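One way to make the failure visible would be to compare image-token counts before and after truncation and fail loudly on a mismatch. The sketch below is only a suggestion under assumptions: `check_image_tokens_preserved` is a hypothetical helper, not an existing TRL function, and it assumes the image placeholder token id is known and that sequences are available as plain lists of ids.

```python
def check_image_tokens_preserved(original_ids, truncated_ids, image_token_id):
    """Raise if truncation removed any image tokens (hypothetical helper)."""
    before = original_ids.count(image_token_id)
    after = truncated_ids.count(image_token_id)
    if after != before:
        raise ValueError(
            f"Truncation removed {before - after} of {before} image tokens; "
            "pixel_values would no longer match the token sequence."
        )


# Hypothetical ids: 9999 stands in for the image placeholder token.
original = [9999, 9999, 11, 12, 21, 22, 23]
truncated = original[-4:]  # keep_end-style truncation to 4 tokens

try:
    check_image_tokens_preserved(original, truncated, image_token_id=9999)
    outcome = "ok"
except ValueError as err:
    outcome = f"error: {err}"

print(outcome)
```

Whether such a check should raise, warn, or instead switch truncation behaviour for VLM inputs is a design decision left to the maintainers.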
