I've been trying to get a PPO trainer to work with fully sharded training, using either DeepSpeed ZeRO stage 3 or FSDP. However, no matter which configuration options I try, I cannot get even the example in the documentation to work. The problem seems to be the call to trainer.generate() when sampling a rollout. With FSDP, it usually crashes, and the exact error message depends on the accelerate config (e.g. pytorch/pytorch#82461). With DeepSpeed, the script just hangs and eventually times out, without an error message.
Is this known behavior, and is there a working example or documentation of PPO + DeepSpeed/FSDP anywhere?
To reproduce, run from inside the examples directory:
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/ppo.py
or even:
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml helloworld.py