Generalization of FSDP common for non-cuda execution #133209
ankurneog wants to merge 1 commit into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133209
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (2 unrelated failures.) As of commit 55cdd62 with merge base 76b044d (FLAKY: the following jobs failed but were likely due to flakiness present on trunk).
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 4b7e235 to c12afd5
@wconstab, who would have some bandwidth to do these FSDP updates to make it work with non-cuda devices?
@wconstab: Can you please help with the review and approval? Main changes are here: https://github.com/pytorch/pytorch/pull/133209/files#diff-7b5c66f99161fa6a3d9042e80f8c8cc140a64e43445feede46f55e53154f6c3d (torch/testing/_internal/common_fsdp.py); the others are mostly adaptations to the change made in that file.
just curious, why the difference?
+1, please help us unify code paths to reduce maintenance load.
wconstab left a comment:
Mostly LGTM. A couple of small questions and then I would stamp.
Also, FWIW, you may want to look into FSDP2 support as well, if you haven't.
TODO: is this actually unchanged in the new formula?
As long as TEST_CUDA := torch.cuda.is_available(), then DEVICE_COUNT := torch.cuda.device_count(), so this appears equivalent.
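To make the equivalence concrete, here is a minimal sketch of how generalized device-detection flags could be defined. This is an illustrative guess, not the actual common_fsdp.py code: the `hasattr`-based HPU probe and the `DEVICE_TYPE` name are assumptions.

```python
import torch

# Hypothetical sketch of generalized device detection (not the actual
# common_fsdp.py implementation). TEST_CUDA keeps its old meaning, so on
# CUDA machines DEVICE_COUNT reduces to torch.cuda.device_count(),
# matching the pre-change behavior discussed above.
TEST_CUDA = torch.cuda.is_available()
# HPU support comes from the habana_frameworks plugin; on stock PyTorch
# builds torch.hpu does not exist, so guard the access with hasattr.
TEST_HPU = hasattr(torch, "hpu") and torch.hpu.is_available()

if TEST_CUDA:
    DEVICE_TYPE = "cuda"
    DEVICE_COUNT = torch.cuda.device_count()
elif TEST_HPU:
    DEVICE_TYPE = "hpu"
    DEVICE_COUNT = torch.hpu.device_count()
else:
    # CPU fallback so the flags are always defined
    DEVICE_TYPE = "cpu"
    DEVICE_COUNT = 1
```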
Is this a behavior change? Were we relying on the 'set_device' for cuda to cover this before, and now we make it explicit?
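For context, a hedged sketch of what making device binding explicit could look like. The helper name and its fallback behavior are hypothetical illustrations, not the PR's actual code.

```python
import torch

def set_device_for_rank(device_type: str, rank: int) -> torch.device:
    """Hypothetical helper: bind each rank to its device explicitly,
    rather than relying on torch.cuda.set_device() implicitly covering
    only the CUDA case."""
    if device_type == "cuda":
        # Round-robin ranks over the visible GPUs
        torch.cuda.set_device(rank % torch.cuda.device_count())
        return torch.device("cuda", torch.cuda.current_device())
    # Other accelerators (e.g. hpu) would call their backend's
    # set_device equivalent here; fall back to a plain device object.
    return torch.device(device_type)
```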
Force-pushed from c12afd5 to a8d9bee
@wconstab: Thank you for your review comments. I have addressed them and also replied to your queries. Could you please have a look and help with the approval? Thank you.
@wconstab: Gentle reminder, could you kindly take a look? Thanks.
Nice effort.
Force-pushed from a8d9bee to d94ed2e
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased
Force-pushed from d94ed2e to ab54f83
Force-pushed from d05c785 to ee5db28
Merge failed. Reason: 3 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
@pytorchbot rebase
Successfully rebased
Force-pushed from 96817f5 to 7ec516d
@pytorchbot rebase
Successfully rebased
Force-pushed from 7ec516d to 55cdd62
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
if TEST_HPU:
    time.sleep(self.delay_before_reduction_ms / 1000)
elif TEST_CUDA:
    torch.cuda._sleep(int(self.delay_after_loss_ms * get_cycles_per_ms()))
Why is HPU sleeping based on delay_before_reduction_ms instead of delay_after_loss_ms?
This one is new; I just added it.
I am kind of confused why some of the comments left by previous reviewers were left unaddressed or without response.
Could you please let me know what needs to be addressed?
Motivation: Generalize unit tests so that they can be executed for CUDA and non-CUDA devices. Dependency: #133209, now merged. There was a #135242 for these changes, closed due to incorrect commits. I have incorporated the changes as suggested in the comments. @kwen2501 @zeshengzong Please review the changes. Pull Request resolved: #139184 Approved by: https://github.com/kwen2501 Co-authored-by: Yu, Guangye <guangye.yu@intel.com>
Motivation
The FSDP common code used for FSDP UT execution is mostly written with the CUDA device in mind. However, other devices such as Intel Gaudi support most of the functionality. We are generalizing the base content so that the UT content can be used for non-CUDA device execution.
cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o