
Adapt dtensor tests to be device agnostic#154840

Closed
amathewc wants to merge 1 commit into pytorch:main from amathewc:dtensor5

Conversation

@amathewc
Contributor

@amathewc amathewc commented Jun 2, 2025

## MOTIVATION
This PR includes minor changes to skip some unsupported tests on Intel Gaudi devices as well as to make some of the tests more device agnostic.
Please refer to this RFC as well: pytorch/rfcs#66

## CHANGES

  • test_dtensor_compile.py: make some of the tests device agnostic (replace "cuda" hard-codings with self.device_type).
  • test_dtensor.py and test_comm_mode_features.py: skip some tests that are unsupported on Intel Gaudi devices.
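As a minimal sketch of the substitution described above (the helper function and variable names are illustrative, not code from the PR):

```python
import torch

# Illustrative sketch of the device-agnostic pattern: instead of
# hard-coding device="cuda", tests read self.device_type, which
# DTensorTestBase sets per backend ("cuda", "hpu", "xpu", or "cpu").
def make_ones(device_type: str) -> torch.Tensor:
    # Before: torch.ones(4, device="cuda")  # fails on non-CUDA backends
    return torch.ones(4, device=device_type)

# Stand-in for what the test base class would provide.
device_type = "cuda" if torch.cuda.is_available() else "cpu"
x = make_ones(device_type)
```

The same test body then runs unchanged on any backend that sets `device_type` appropriately.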

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k, @ankurneog, @EikanWang, @guangyey

@pytorch-bot

pytorch-bot bot commented Jun 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154840

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 2b5c4e2 with merge base 2908c10:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the oncall: distributed and topic: not user facing labels Jun 2, 2025
@amathewc
Contributor Author

amathewc commented Jun 2, 2025

@pytorchbot label "topic: not user facing"

@Skylion007
Collaborator

Should these be skipIfXPU, or should it be xfailIfXPU? We prefer the latter so we can enable them later when the functionality is fixed.
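The distinction raised here can be sketched with stdlib unittest (skipIfXPU/xfailIfXPU are PyTorch-internal decorators; `ON_XPU` is a stand-in for a real backend check):

```python
import unittest

ON_XPU = False  # stand-in for a real "running on XPU" check

class BackendMarkerExample(unittest.TestCase):
    @unittest.skipIf(ON_XPU, "unsupported on this backend")
    def test_skip_style(self):
        # A skipped test is silently dropped on that backend, so nobody
        # notices when the backend later gains support.
        self.assertEqual(1 + 1, 2)

    def test_xfail_style(self):
        # An expectedFailure flips to an "unexpected success" (a visible
        # signal) once the backend starts passing, prompting removal of
        # the marker -- which is why xfail is preferred over skip.
        self.assertEqual(1 + 1, 2)

if ON_XPU:
    # Mirrors a hypothetical conditional xfailIfXPU decorator.
    BackendMarkerExample.test_xfail_style = unittest.expectedFailure(
        BackendMarkerExample.test_xfail_style
    )
```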

@soulitzer soulitzer requested a review from bdhirsh June 2, 2025 15:58
@soulitzer soulitzer added the triaged label Jun 2, 2025
@EikanWang
Collaborator

This PR should be dedicated to Intel Gaudi. For Intel GPU (XPU), the feature is already supported. @zhangxiaoli73

@amathewc
Contributor Author

amathewc commented Jun 3, 2025

> This PR should be dedicated to Intel Gaudi. For Intel GPU (XPU), the feature is already supported.

Yes - this is specific for Intel Gaudi (HPU) devices.

@amathewc
Contributor Author

amathewc commented Jun 3, 2025

@albanD , @atalman : Could you help with merging this ?

@amathewc
Contributor Author

amathewc commented Jun 6, 2025

@albanD , @atalman : Could you help with merging this ? The failures seem to be unrelated to this PR.

@amathewc
Contributor Author

amathewc commented Jun 9, 2025

@pytorchmergebot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

Signed-off-by: Aby Mathew C <aby.mathew.c@intel.com>
@pytorchmergebot
Collaborator

Successfully rebased dtensor5 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout dtensor5 && git pull --rebase)

Collaborator

@albanD albanD left a comment


Sure

@amathewc
Contributor Author

> Sure

@albanD : Could you initiate the merging as well ?

@guangyey
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 10, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

pytorchmergebot pushed a commit that referenced this pull request Jun 21, 2025
## MOTIVATION
This PR is a continuation of #154840 and we are trying to make the tests more device agnostic by removing hard coded references to any particular device.
Please refer to this RFC as well: pytorch/rfcs#66

## CHANGES
1. test_convolution_ops.py:
    - Replace "cuda" with self.device_type
2. test_random_ops.py:
    - Remove the TYPE_DEVICE variable, since device_type is already set per device (environment) in the DTensorTestBase class.
    - Replace "cuda" with self.device_type

Pull Request resolved: #155687
Approved by: https://github.com/EikanWang, https://github.com/d4l3k
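A hedged sketch of the TYPE_DEVICE cleanup described in the commit message above (class names other than the `device_type` attribute are illustrative):

```python
import torch

class FakeDTensorTestBase:
    # Stand-in for DTensorTestBase, which derives device_type from the
    # environment so tests need no module-level TYPE_DEVICE variable.
    @property
    def device_type(self) -> str:
        return "cuda" if torch.cuda.is_available() else "cpu"

class RandomOpsExample(FakeDTensorTestBase):
    def run_random_op(self) -> torch.Tensor:
        # Before: TYPE_DEVICE = "cuda"; torch.rand(8, device=TYPE_DEVICE)
        return torch.rand(8, device=self.device_type)

t = RandomOpsExample().run_random_op()
```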

Labels

ciflow/trunk · Merged · oncall: distributed · open source · topic: not user facing · triaged


8 participants