Describe the bug
For cosine learning rate schedules, the learning rate should approach zero by the end of training. However, due to the accelerator's prepare logic, the learning rate scheduler is stepped N times for each optimization step when training on N GPUs. Thus, we need to multiply the step counts by num_processes when constructing schedulers.
Sample learning rate curve from running the DreamBooth example on 4 GPUs:
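A minimal sketch of the fix described above, assuming the example scripts build their scheduler with diffusers.optimization.get_scheduler and then pass it through accelerator.prepare; the optimizer and the lr_warmup_steps / max_train_steps values here are hypothetical stand-ins for the scripts' real arguments:

```python
import torch
from accelerate import Accelerator
from diffusers.optimization import get_scheduler

# Hypothetical stand-ins for the example scripts' model parameters and CLI
# arguments (lr_warmup_steps / max_train_steps mirror the scripts' flags).
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=1e-4)
lr_warmup_steps = 0
max_train_steps = 800

accelerator = Accelerator()

lr_scheduler = get_scheduler(
    "cosine",
    optimizer=optimizer,
    # Scale by num_processes: the prepared scheduler advances once per process
    # for every optimization step, so without this factor the cosine decay
    # would reach its end num_processes times too early.
    num_warmup_steps=lr_warmup_steps * accelerator.num_processes,
    num_training_steps=max_train_steps * accelerator.num_processes,
)

optimizer, lr_scheduler = accelerator.prepare(optimizer, lr_scheduler)
```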
Reproduction
Run the example scripts on multiple GPUs, passing a cosine learning rate schedule.
Logs
No response
System Info
- diffusers version: 0.17.1
- Platform: Linux-5.4.0-150-generic-x86_64-with-debian-buster-sid
- Python version: 3.7.13
- PyTorch version (GPU?): 1.12.1 (True)
- Huggingface_hub version: 0.15.1
- Transformers version: 4.30.2
- Accelerate version: 0.20.3
- xFormers version: 0.0.20+1dc3d7a.d20230628
- Using GPU in script?: Y
- Using distributed or parallel set-up in script?: DDP
Who can help?