Conversation
The documentation is not available anymore as the PR was closed or merged.
BenjaminBossan left a comment
This looks like a great change, love to see so many lines deleted.
I don't have experience with FSDP, so a few questions:
- Does this still work as expected when using PyTorch < 2.1?
- The `use_orig_params` default was changed to `True`. Is there any disadvantage to that, e.g. more memory usage?
Hello Benjamin,
accelerate/src/accelerate/accelerator.py Lines 306 to 307 in 0e51680
It is expected to become the default, as per the above dev blogpost:
Ah okay, I missed the version bump, thanks for pointing me to it.
Thanks for providing more context. My question arose because …
muellerzr left a comment
Nicely done @pacman100! Excellent refactor and loving that diff. Keeping the simplistic API all around is a phenomenal win!
BenjaminBossan left a comment
Great work, thanks Sourab.
What does this PR do?
FSDP refactoring based on:
With `use_orig_params=True`, we no longer require preparing the model before creating the optimizer object. Earlier, we needed to prepare the model, i.e., wrap it with FSDP, before creating the optimizer object because of a warning in the PyTorch official docs.

Now, with `use_orig_params=True`, that is no longer the case. This makes the Accelerate training API consistent: users on single GPU, DDP, FSDP, and DeepSpeed all follow the same logic. Earlier, FSDP had its own recommended practice; otherwise we used to recreate the optimizer after preparing the model, which didn't preserve optimizer groups. Now all of that is resolved, and optimizer groups are also supported.
As such, `use_orig_params=True` is now the default.

Regarding checkpointing: the llama recipes and the FSDP documentation cover saving and loading for `FULL_STATE_DICT` and `SHARDED_STATE_DICT`. We support both of these and already have tests for them. They don't show how to save and load for the `LOCAL_STATE_DICT` state dict type, and the `LOCAL_STATE_DICT` checkpointing feature of FSDP is now failing. Couldn't find anything about it in the llama recipes, the FSDP documentation, the torch FSDP codebase https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp, or on the internet. Will raise an issue with the PyTorch team regarding it.
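For context on where the state dict type is chosen, Accelerate reads it from the FSDP section of the `accelerate config` YAML. The excerpt below is a hedged sketch: the values are examples and the exact key set depends on the Accelerate version, but `fsdp_state_dict_type` and `fsdp_use_orig_params` are the fields relevant to this PR:

```yaml
# Illustrative excerpt of an `accelerate config` YAML for FSDP
# (values are examples; LOCAL_STATE_DICT is the variant reported failing).
distributed_type: FSDP
fsdp_config:
  fsdp_state_dict_type: SHARDED_STATE_DICT  # or FULL_STATE_DICT / LOCAL_STATE_DICT
  fsdp_use_orig_params: true                # now the default per this PR
```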