PPO example not working with DeepSpeed Stage 3 or FSDP

I've been trying to get a PPO trainer to work with fully sharded training, using either DeepSpeed stage 3 or FSDP. However, no matter which configuration options I try, I cannot get even the example from the documentation to work. The problems seem to come from calling `trainer.generate()` when sampling a rollout. With FSDP, it usually crashes, and the exact error message depends on the accelerate config (e.g. https://github.com/pytorch/pytorch/issues/82461). With DeepSpeed, the script seems to hang and then time out, without an error message.

Is this known behavior, and is there a working example or documentation of PPO + DeepSpeed/FSDP anywhere?

To reproduce, inside `examples`:
`accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/ppo.py`
or even `accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml helloworld.py`
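
One possible explanation for the DeepSpeed hang (an assumption, not something confirmed in this report): under ZeRO stage 3 every forward pass gathers sharded parameters from all ranks, so if one rank finishes generating its rollout earlier than the others and stops calling forward, the remaining ranks wait until they time out. The toy sketch below only simulates that pattern with threads and a barrier. It does not use DeepSpeed, and `run_ranks` is a hypothetical helper written for illustration.

```python
import threading

def run_ranks(lengths, synced, timeout=0.5):
    """Simulate one generation call per rank and return each rank's outcome.

    lengths[i] is the number of tokens rank i wants to generate.
    The barrier stands in for the per-step parameter all-gather that a
    sharded forward pass needs (an assumption about the failure mode).
    """
    world_size = len(lengths)
    barrier = threading.Barrier(world_size)
    # In a real setup the ranks would agree on this via a collective op;
    # here it is simply computed up front.
    max_len = max(lengths)
    results = [None] * world_size

    def rank_fn(rank):
        # Synced mode: keep running dummy forward steps until every rank is done.
        steps = max_len if synced else lengths[rank]
        try:
            for _ in range(steps):
                barrier.wait(timeout=timeout)  # one "forward" needs all ranks
            results[rank] = "done"
        except threading.BrokenBarrierError:
            results[rank] = "timeout"  # the other ranks never showed up

    threads = [threading.Thread(target=rank_fn, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    print(run_ranks([3, 5], synced=False))  # rank 1 is left waiting alone
    print(run_ranks([3, 5], synced=True))   # both ranks finish
```

If this is the cause, the `synced_gpus=True` argument of the transformers `generate()` method is meant for this situation. Whether `trainer.generate()` forwards that argument to the underlying model is not verified here.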

PPO example not working with DeepSpeed Stage 3 or FSDP #1051