[Feature Request] Add support for padding-free reward modeling training in TRL #3780

@zjjMaiMai

Feature request

I would like to request support for padding-free reward modeling training in the Hugging Face TRL library, specifically in RewardTrainer.

Currently, TRL's SFT trainer already supports padding-free training via position_ids, allowing for efficient fine-tuning on variable-length sequences. However, RewardTrainer still relies on padded inputs.

It would be very helpful if RewardTrainer supported a similar mechanism, accepting user-supplied position_ids and attention_mask, so that training avoids the padding overhead.
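To make the request concrete, here is a minimal sketch of what a padding-free batch looks like. It is plain Python for illustration only, not TRL's actual collator: `pack_without_padding` and `padded_token_count` are hypothetical helper names, and a real implementation would build tensors rather than lists. The sequences are concatenated into one row, and position_ids restart at 0 at each sequence boundary so the boundaries stay recoverable.

```python
from itertools import accumulate


def pack_without_padding(sequences):
    """Flatten variable-length token lists into one row (hypothetical helper).

    position_ids restart at 0 for every sequence, which is how the
    boundaries between packed sequences are encoded.
    """
    input_ids = [tok for seq in sequences for tok in seq]
    position_ids = [pos for seq in sequences for pos in range(len(seq))]
    # Cumulative sequence lengths: start/end offsets of each sequence in the flat row.
    cu_seqlens = [0, *accumulate(len(seq) for seq in sequences)]
    return input_ids, position_ids, cu_seqlens


def padded_token_count(sequences):
    """Number of token slots used when every sequence is padded to the longest one."""
    return len(sequences) * max(len(seq) for seq in sequences)


# Three made-up token sequences of lengths 5, 2 and 3.
batch = [[11, 12, 13, 14, 15], [21, 22], [31, 32, 33]]
input_ids, position_ids, cu_seqlens = pack_without_padding(batch)

print(position_ids)  # [0, 1, 2, 3, 4, 0, 1, 0, 1, 2]
print(cu_seqlens)    # [0, 5, 7, 10]
print(len(input_ids), padded_token_count(batch))  # 10 15
```

In this toy batch the packed row holds 10 tokens, while padding to the longest sequence would allocate 15 slots, so a third of the padded batch is wasted.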

Motivation

Reward modeling datasets typically involve prompt + response pairs of varying lengths. Padding these to a uniform length introduces significant memory and compute inefficiencies, especially for long sequences and large batch sizes.

TRL’s SFT trainer has addressed this via padding-free training using custom position_ids. Bringing this same capability to RewardTrainer would streamline fine-tuning pipelines and reduce resource waste. This is particularly important in large-scale setups or when using FlashAttention-based architectures.

Your contribution

Yes, I’d be happy to help by contributing a PR.

However, I believe this feature may require changes not only to TRL but also to Hugging Face Transformers, specifically in how forward() is implemented for SequenceClassification models. Many of these currently return only pooled_logits, i.e. one score per batch row. When several sequences are packed into a single row, that makes it difficult to apply the position-aware, per-sequence selection of token-level scores that padding-free reward modeling needs.
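As a rough illustration of the selection step I have in mind, the sketch below picks one reward per packed sequence from token-level scores, using only position_ids to find where each sequence ends. This is an assumption about how it could work, not existing Transformers or TRL behavior: `last_token_indices` and `pool_rewards` are hypothetical names, the scores are made up, and real code would index a tensor of per-token logits instead of a list.

```python
def last_token_indices(position_ids):
    """Indices of the final token of each packed sequence (hypothetical helper).

    A sequence ends where the next position_id resets to 0,
    or at the end of the flattened row.
    """
    n = len(position_ids)
    return [i for i in range(n) if i + 1 == n or position_ids[i + 1] == 0]


def pool_rewards(token_scores, position_ids):
    """Keep one score per packed sequence: the score at its last token."""
    return [token_scores[i] for i in last_token_indices(position_ids)]


# One flattened row holding three sequences of lengths 5, 2 and 3.
position_ids = [0, 1, 2, 3, 4, 0, 1, 0, 1, 2]
# Made-up per-token scores, standing in for un-pooled classification logits.
token_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 1.5, -0.2, -0.1, 0.7]

print(last_token_indices(position_ids))          # [4, 6, 9]
print(pool_rewards(token_scores, position_ids))  # [0.5, 1.5, 0.7]
```

This only works if the model exposes token-level scores, which is why a forward() that returns nothing but pooled_logits is the blocker.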

I’m happy to coordinate with the Transformers team or align with ongoing architectural changes to ensure compatibility.
