Is the `RewardTrainer` loss calculation inaccurate when performing gradient accumulation?

https://github.com/huggingface/trl/blob/7e2075347e6c5dc648888fe5da6aee5e2f7f443f/trl/trainer/reward_trainer.py#L250

It seems that `RewardTrainer` does not use `num_items_in_batch` in its loss calculation. According to the `transformers` `Trainer` documentation, a trainer that does not use `num_items_in_batch` should set `self.model_accepts_loss_kwargs = False`; otherwise, the loss may be slightly inaccurate when performing gradient accumulation. Should `RewardTrainer` set this attribute?
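To make the concern concrete, here is a minimal numeric sketch in plain Python. It is not the actual `transformers` or `trl` code, and the helper names are hypothetical. It compares three ways of combining per-item losses across accumulation steps. Dividing the summed per-item loss by the total item count (the role `num_items_in_batch` is meant to play) reproduces the full-batch mean. Averaging each micro-batch separately and then dividing by the number of steps drifts when micro-batch sizes differ, for example with a short final batch. Accumulating per-micro-batch means with no further scaling inflates the loss by the number of steps.

```python
# Minimal numeric illustration of loss normalization under gradient accumulation.
# NOT the transformers/trl implementation; all helper names are hypothetical.
from typing import List


def accumulate_mean_losses(micro_batches: List[List[float]], divide_by_steps: bool) -> float:
    """Sum per-micro-batch *mean* losses, optionally dividing each by the step count."""
    steps = len(micro_batches)
    total = 0.0
    for per_item_losses in micro_batches:
        mean_loss = sum(per_item_losses) / len(per_item_losses)
        total += mean_loss / steps if divide_by_steps else mean_loss
    return total


def accumulate_sum_losses(micro_batches: List[List[float]]) -> float:
    """Normalize summed per-item losses by the total item count across all steps."""
    num_items_in_batch = sum(len(b) for b in micro_batches)
    return sum(sum(b) / num_items_in_batch for b in micro_batches)


# Two accumulation steps with unequal micro-batch sizes (e.g. a short final batch).
micro_batches = [[1.0, 1.0, 1.0], [4.0]]

# Mean over every item, as if the whole accumulated batch were processed at once.
reference = sum(sum(b) for b in micro_batches) / sum(len(b) for b in micro_batches)

print(reference)                                                  # 1.75
print(accumulate_sum_losses(micro_batches))                       # 1.75
print(accumulate_mean_losses(micro_batches, divide_by_steps=True))   # 2.5
print(accumulate_mean_losses(micro_batches, divide_by_steps=False))  # 5.0
```

When every micro-batch has the same size, the first two schemes agree, so the discrepancy mainly shows up with uneven batches or when the per-step scaling is skipped.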