Hi! I've been attempting to run examples/scripts/ppo/ppo_tldr.py and encountered an "TypeError: 'float' object cannot be interpreted as an integer" on the line of for update in range(1, args.num_updates + 1). After reading ppov2_trainer.py, I think the underlying issue originates from:
#########
# calculate various batch sizes
#########
if args.total_episodes is None: # allow the users to define episodes in terms of epochs.
args.total_episodes = args.num_train_epochs * self.train_dataset_len
num_train_epochs is an float argument in TraininingArguments, which PPOv2Config inherits. I suppose we should use args.num_ppo_epochs here.
Hi! I've been attempting to run
examples/scripts/ppo/ppo_tldr.pyand encountered an "TypeError: 'float' object cannot be interpreted as an integer" on the line offor update in range(1, args.num_updates + 1). After readingppov2_trainer.py, I think the underlying issue originates from:num_train_epochs is an float argument in
TraininingArguments, which PPOv2Config inherits. I suppose we should use args.num_ppo_epochs here.