DPOTrainer silently corrupts VLM training with "keep_end" truncation_mode #5285

`DPOTrainer` silently corrupts VLM training when `truncation_mode="keep_end"` is set with a non-`None` `max_length`.

### Description

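When `max_length` is set and `truncation_mode="keep_end"` (the non-default mode) is used with a vision-language model, `DPOTrainer` silently drops image tokens from every sequence that exceeds `max_length`, without any warning or error. Once the overflow is at least as long as the image-token span, all image tokens are gone.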

### Root cause

`keep_end` works by aligning the real tokens to the right (padding moves to the left), keeping the last `max_length` positions, then aligning the tokens back to the left. In DPO, the sequence layout is `[prompt (with image tokens) | completion]`, so the prompt, and therefore every image token, sits at the beginning of the sequence. `keep_end` discards tokens from the beginning, so the image tokens are the first to go. The model is then trained without image information in the token sequence while still receiving `pixel_values` for a visual forward pass that no longer corresponds to any tokens.
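The effect can be reproduced without any model. The following is a minimal sketch on plain Python lists, not the library's implementation: `IMAGE_TOKEN_ID`, `PAD_TOKEN_ID`, and the helper functions are hypothetical names chosen for illustration, and the sequence is a toy example.

```python
# Hypothetical token ids, chosen only for this illustration.
IMAGE_TOKEN_ID = 32000
PAD_TOKEN_ID = 0


def strip_padding(input_ids):
    """Return only the real (non-padding) tokens of one sequence."""
    return [t for t in input_ids if t != PAD_TOKEN_ID]


def keep_end(input_ids, max_length):
    """Keep the last `max_length` real tokens (tokens are dropped from the start)."""
    real = strip_padding(input_ids)
    return real[-max_length:] if max_length > 0 else []


def keep_start(input_ids, max_length):
    """Keep the first `max_length` real tokens (tokens are dropped from the end)."""
    return strip_padding(input_ids)[:max_length]


def count_image_tokens(input_ids):
    return sum(1 for t in input_ids if t == IMAGE_TOKEN_ID)


# Layout: [prompt (with image tokens) | completion | padding]
prompt = [IMAGE_TOKEN_ID] * 4 + [11, 12, 13]
completion = [21, 22, 23, 24, 25]
sequence = prompt + completion + [PAD_TOKEN_ID] * 2

truncated_end = keep_end(sequence, max_length=6)
truncated_start = keep_start(sequence, max_length=6)

print(count_image_tokens(sequence))         # image tokens before truncation
print(count_image_tokens(truncated_end))    # image tokens after keep_end
print(count_image_tokens(truncated_start))  # image tokens after keep_start
```

In this toy case the overflow (6 tokens) is longer than the image-token span (4 tokens), so `keep_end` leaves no image tokens at all, while keeping the start of the sequence preserves them at the cost of cutting the completion.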

No error is raised. Training runs to completion and loss decreases, making this failure invisible.
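One way to make the failure visible is to compare the number of image tokens before and after truncation and fail loudly on a mismatch. This is only a sketch of such a check, not an existing library feature: `check_image_tokens_preserved` and `IMAGE_TOKEN_ID` are hypothetical, and a real check would need the model's actual image token id.

```python
# Hypothetical image token id, chosen only for this illustration.
IMAGE_TOKEN_ID = 32000


def check_image_tokens_preserved(before, after, image_token_id=IMAGE_TOKEN_ID):
    """Raise if truncation changed the number of image tokens in a sequence."""
    n_before = before.count(image_token_id)
    n_after = after.count(image_token_id)
    if n_after != n_before:
        raise ValueError(
            f"Truncation removed {n_before - n_after} of {n_before} image tokens; "
            "the token sequence no longer matches pixel_values."
        )


original = [IMAGE_TOKEN_ID] * 4 + [11, 12, 13, 21, 22, 23, 24, 25]
truncated = original[-6:]  # what keeping only the end of the sequence leaves

try:
    check_image_tokens_preserved(original, truncated)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```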
