[Feature Request] Add support for padding-free reward modeling training in TRL

### Feature request

I would like to request support for padding-free reward modeling training in the HuggingFace TRL library, specifically in the RewardTrainer.

Currently, TRL's SFT trainer already supports padding-free training via position_ids, allowing for efficient fine-tuning on variable-length sequences. However, RewardTrainer still relies on padded inputs.

It would be very helpful if `RewardTrainer` could support a similar mechanism, accepting user-supplied `position_ids` and `attention_mask`, so that reward models can be trained without padding overhead.
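To make the request concrete, here is a minimal sketch of what a padding-free batch could look like: every sequence is concatenated into a single row, and `position_ids` restarts at 0 wherever a new sequence begins. The helper name `pack_sequences` and the token ids are hypothetical, chosen for illustration only; this is not TRL's collator.

```python
def pack_sequences(sequences):
    """Concatenate token-id lists into one flat row (hypothetical helper).

    position_ids restarts at 0 for each sequence, which is the signal a
    padding-free attention kernel can use to find sequence boundaries.
    """
    input_ids, position_ids = [], []
    for seq in sequences:
        input_ids.extend(seq)
        position_ids.extend(range(len(seq)))
    return {"input_ids": input_ids, "position_ids": position_ids}


# Three made-up sequences of lengths 3, 2 and 4: no pad tokens are needed.
batch = pack_sequences([[11, 12, 13], [21, 22], [31, 32, 33, 34]])
print(batch["position_ids"])
```

A real collator would return tensors with a leading batch dimension of 1; plain lists are used here to keep the sketch dependency-free.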

### Motivation

Reward modeling datasets typically involve prompt + response pairs of varying lengths. Padding these to a uniform length introduces significant memory and compute inefficiencies, especially for long sequences and large batch sizes.
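As a rough illustration of the overhead, the sketch below computes the fraction of a padded batch that consists of pad tokens. The function name `padding_waste` and the sequence lengths are made up for the example and are not measurements from a real dataset.

```python
def padding_waste(lengths):
    """Fraction of tokens that are padding when every sequence is padded
    to the longest one in the batch (hypothetical helper)."""
    padded_tokens = max(lengths) * len(lengths)
    real_tokens = sum(lengths)
    return 1 - real_tokens / padded_tokens


# Made-up lengths for four prompt + response sequences in one batch.
lengths = [128, 512, 2048, 300]
waste = padding_waste(lengths)
print(f"{waste:.1%} of the padded batch is padding")
```

With one long sequence in the batch, most of the padded tensor ends up being pad tokens, which is the cost padding-free training avoids.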

TRL’s SFT trainer has addressed this via padding-free training using custom position_ids. Bringing this same capability to RewardTrainer would streamline fine-tuning pipelines and reduce resource waste. This is particularly important in large-scale setups or when using FlashAttention-based architectures.

### Your contribution

Yes, I’d be happy to help by contributing a PR.

However, I believe this feature may require changes not only to TRL but also to HuggingFace Transformers, specifically in how `forward()` is implemented for `SequenceClassification` models. Many of these currently return only `pooled_logits`, a single score per batch row, which makes it difficult to apply the position-aware token-level masking or per-sequence selection that padding-free reward modeling needs when several sequences are packed into one row.
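The per-sequence selection I have in mind could look roughly like the sketch below. It assumes the model exposes one score per token for the packed row, which is an assumption and not what the current classification heads return, and it uses the restarts in `position_ids` to pick the score at the last token of each sequence. The helper names are hypothetical.

```python
def last_token_indices(position_ids):
    """Index of the final token of each packed sequence (hypothetical helper).

    A new sequence starts wherever position_ids is 0, so each sequence ends
    one step before the next start, and the last one ends at the row's end.
    """
    if not position_ids:
        return []
    starts = [i for i, p in enumerate(position_ids) if p == 0]
    return [s - 1 for s in starts[1:]] + [len(position_ids) - 1]


def pool_rewards(token_scores, position_ids):
    """Pick one reward per sequence from per-token scores (assumed available)."""
    return [token_scores[i] for i in last_token_indices(position_ids)]


# Packed row holding three sequences of lengths 3, 2 and 4, with made-up scores.
position_ids = [0, 1, 2, 0, 1, 0, 1, 2, 3]
token_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
rewards = pool_rewards(token_scores, position_ids)
print(rewards)
```

In a tensor-based implementation the same indices could be used to gather from the token-level logits before computing the pairwise loss.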

I’m happy to coordinate with the Transformers team or align with ongoing architectural changes to ensure compatibility.

[Feature Request] Add support for padding-free reward modeling training in TRL #3780