It seems that `RewardTrainer` does not use `num_items_in_batch` in its loss calculation. According to the `transformers` `Trainer` documentation, shouldn't `self.model_accepts_loss_kwargs = False` be set so that the loss is computed correctly when performing gradient accumulation?
Relevant code: `trl/trl/trainer/reward_trainer.py`, line 250 in commit `7e20753`.
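To illustrate the concern, here is a toy sketch in plain Python. It assumes that `Trainer` skips its division by `gradient_accumulation_steps` whenever `model_accepts_loss_kwargs` is true, on the expectation that the loss has already been normalized using `num_items_in_batch`. The function `accumulate` and the loss values are hypothetical and are not taken from TRL or `transformers`.

```python
def accumulate(micro_batch_losses, model_accepts_loss_kwargs):
    """Toy model of loss accumulation over one optimizer step.

    Each entry of micro_batch_losses is the mean loss of one micro-batch,
    computed WITHOUT using num_items_in_batch (as the issue suggests
    RewardTrainer does).
    """
    steps = len(micro_batch_losses)
    total = 0.0
    for loss in micro_batch_losses:
        # Assumption: the trainer only rescales by the number of
        # accumulation steps when the flag is False.
        if not model_accepts_loss_kwargs:
            loss = loss / steps
        total += loss
    return total


losses = [0.5, 0.75, 0.25, 0.5]  # hypothetical per-micro-batch mean losses

scaled = accumulate(losses, model_accepts_loss_kwargs=False)
unscaled = accumulate(losses, model_accepts_loss_kwargs=True)

print(scaled)    # average over the accumulated micro-batches
print(unscaled)  # sum of micro-batch means, larger by a factor of len(losses)
```

If that assumption holds, leaving the flag at true while ignoring `num_items_in_batch` would inflate the accumulated loss (and its gradients) by the number of accumulation steps, which is why setting the flag to `False` looks necessary here.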