PPO example not working with DeepSpeed Stage 3 or FSDP #1051

@mgerstgrasser

Description

I've been trying to get a PPO trainer to work with fully sharded training, using either DeepSpeed stage 3 or FSDP. However, no matter which configuration options I try, I cannot get even the example from the documentation to work. The problem seems to be the call to trainer.generate() when sampling a rollout. With FSDP, it usually crashes, and the error message depends on the exact accelerate config (e.g. pytorch/pytorch#82461). With DeepSpeed, the script seems to just hang and then time out, without an error message.

Is this known behavior, and is there a working example or documentation of PPO + DeepSpeed/FSDP anywhere?

To reproduce, inside examples:
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/ppo.py
or even:
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml helloworld.py
