port distributed pipeline test files for Intel GPU #159033
wincent8 wants to merge 3 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159033
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 28da538 with merge base 67fc16c. This comment was automatically generated by Dr. CI and updates every 15 minutes.
|
@pytorchbot label "module: xpu"
|
@pytorchbot label "triaged"
|
To add the ciflow label, please first approve the workflows that are awaiting approval. This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
```diff
     devices = ["cpu", "cuda", "hpu", "xpu"]
-    instantiate_device_type_tests(UnflattenTests, globals(), only_for=devices)
+    instantiate_device_type_tests(
+        UnflattenTests, globals(), only_for=devices, allow_xpu=True
+    )
```
Why the need for adding `allow_xpu=True`?

Without `allow_xpu=True`, these test cases will not actually be instantiated; refer to
pytorch/torch/testing/_internal/common_device_type.py
lines 789 to 796 at 74280d0.
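The opt-in behavior being discussed can be sketched in plain Python. This is a hypothetical simplification, not the actual `instantiate_device_type_tests` code (which lives in `common_device_type.py`); the function name and the enabled set are assumptions for illustration:

```python
# Hypothetical sketch (not the real PyTorch code): instantiate_device_type_tests
# treats "xpu" as opt-in, so it is filtered out of only_for unless the caller
# passes allow_xpu=True, meaning no XPU variants of the tests get generated.
def select_device_types(only_for, allow_xpu=False):
    enabled = {"cpu", "cuda", "hpu"}
    if allow_xpu:
        enabled.add("xpu")  # opt-in: keep xpu only when explicitly allowed
    return [d for d in only_for if d in enabled]

devices = ["cpu", "cuda", "hpu", "xpu"]
print(select_device_types(devices))                   # xpu filtered out
print(select_device_types(devices, allow_xpu=True))   # xpu kept
```

This mirrors why the PR's diff adds `allow_xpu=True`: without it, listing `"xpu"` in `only_for` has no effect.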
guangyey
left a comment
Overall LGTM. I recommend changing TEST_MULTIGPU to TEST_MULTIACCELERATOR.
|
@d4l3k May I know if the internal CI is green?
|
@wincent8, please fix conflicts.
|
done
|
@pytorchbot rebase
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
|
Rebase failed due to a command error. Raised by https://github.com/pytorch/pytorch/actions/runs/17144110931
Force-pushed from 9c4b7e0 to b13f259 (Compare)
kwen2501
left a comment
LGTM. Left some comments on whether we can make the backend strings go away.
```diff
-    @requires_nccl()
+    @requires_accelerator_dist_backend(["nccl", "xccl"])
```
As far as I know, `requires_accelerator_dist_backend` accepts None as an argument, in which case it searches the same list of backend strings. I would prefer to leave it empty for a lower maintenance load.

This is done explicitly to avoid running these tests on MTIA, which has issues with them.
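To illustrate the trade-off being discussed, here is a hypothetical sketch of the gating: an explicit list restricts which backends qualify, while None falls back to a default search list. The function name, the default list, and the backend names below are assumptions for illustration, not the actual PyTorch implementation:

```python
# Hypothetical sketch of requires_accelerator_dist_backend-style gating;
# the default search list below is an assumption for illustration.
def backend_allowed(available_backends, allowed=None):
    default_search = ["nccl", "xccl"]  # assumed default search list
    wanted = allowed if allowed is not None else default_search
    # The test runs only if at least one wanted backend is available.
    return any(b in available_backends for b in wanted)

# With allowed=None, "xccl" qualifies via the default list:
print(backend_allowed(["xccl"]))
# An explicit list excludes any other backend (e.g. a hypothetical MTIA one):
print(backend_allowed(["mtia_backend"], allowed=["nccl", "xccl"]))
```

Passing the list explicitly, as the PR does, trades a little duplication for precise control over which backends the decorator accepts.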
|
I kicked off a land -- this should be merged soon, so just sit tight.
|
@pytorchbot merge (initiating merge automatically since the Phabricator diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed; the first few are: Meta Internal-Only Changes Check. Details for the Dev Infra team: raised by a workflow job.
|
@pytorchbot merge -f 'merged internally'
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
In this PR we port all distributed pipeline test files. We enable them for Intel GPU with the following methods, keeping the original code style as much as possible:
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @gujinghui @fengyuan14 @guangyey