Open up PT UTs to cover additional devices#145589

Closed
ankurneog wants to merge 7 commits into pytorch:main from ankurneog:expand_ops_execution

Conversation

@ankurneog

@ankurneog ankurneog commented Jan 24, 2025

This is a follow-up of #128584, covering additional files for execution.

Based on the discussion we had with the reviewers, it was decided to remove the onlyNativeDeviceTypes decorator to open these tests up for all devices.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot

pytorch-bot bot commented Jan 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145589

Note: Links to docs will display an error until the docs builds have been completed.

❌ 12 New Failures

As of commit 0427369 with merge base 9c9b05b (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ankurneog ankurneog force-pushed the expand_ops_execution branch from c736712 to a818d12 Compare January 24, 2025 06:14
@zou3519 zou3519 requested a review from jbschlosser January 24, 2025 16:25
@zou3519 zou3519 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jan 24, 2025
@ankurneog ankurneog force-pushed the expand_ops_execution branch from a818d12 to df646d7 Compare January 31, 2025 04:15
@ankurneog
Copy link
Author

@albanD, @kwen2501: can you please help with the review and approval? Thank you. (This is along the lines of the changes introduced in #128584.)

Collaborator

@albanD albanD left a comment


It's still quite weird to have this in core given that we don't run this in CI...
Sounds fine to me since it's only tests, but I wouldn't expect this to remain working for any period of time, tbh.

cc @malfet in case you have an opinion on this

@ankurneog
Copy link
Author

It's still quite weird to have this in core given that we don't run this in CI... Sounds fine to me since it's only tests, but I wouldn't expect this to remain working for any period of time, tbh.

cc @malfet in case you have an opinion on this

@albanD: Thanks for your comment. These TCs are disabled by default for other devices (by the existing decorator), so when we want to check them for devices such as Intel Gaudi, we need to refactor or remove the decorators. The change removes this restriction for devices that support the tests and want to run them.

@malfet
Copy link
Contributor

malfet commented Feb 10, 2025

@ankurneog regarding the review: current PR conflicts with trunk and causes 21+ workflow failures....

@malfet
Copy link
Contributor

malfet commented Feb 10, 2025

cc @malfet in case you have an opinion on this

It feels like onlyNativeDeviceTypes (introduced by #65201; the comment still says it's only CUDA + CPU + Meta, while the definition has evolved) served a very specific purpose at the time, which seems lost now. IMO all uses of this decorator should be replaced (where possible) with skipIfXLA (and probably a few skipIfMPS)

@albanD
Copy link
Collaborator

albanD commented Feb 10, 2025

@ankurneog thanks for the details!
I guess the usual way we solve this kind of issue is by providing a generic extension point so that each vendor can customize the logic for their needs without requiring their special logic to be in core.
I wonder if it would be possible here as well to keep these HPU-specific skip lists in your own codebase, with PyTorch providing only the extension point to make that convenient (if any is needed).
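The extension point being proposed could look something like this (a hypothetical sketch; register_device_skips and should_skip are invented names, not an actual PyTorch API):

```python
# Core would expose only the registry; each vendor keeps its
# device-specific skip list in its own package.
_DEVICE_SKIP_LISTS = {}

def register_device_skips(device_type, test_names):
    """Called by a vendor package (e.g. at import time) to opt tests out
    for its device type."""
    _DEVICE_SKIP_LISTS.setdefault(device_type, set()).update(test_names)

def should_skip(test_name, device_type):
    """Checked by the test framework before running each generated test."""
    return test_name in _DEVICE_SKIP_LISTS.get(device_type, set())
```

With this shape, an out-of-tree Intel Gaudi package could call register_device_skips("hpu", {...}) on import, and the HPU-specific skip list never needs to live in PyTorch core.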

@ankurneog
Copy link
Author

@ankurneog thanks for the details! I guess the usual way we solve this kind of issue is by providing a generic extension point so that each vendor can customize the logic for their needs without requiring their special logic to be in core. I wonder if it would be possible here as well to keep these HPU-specific skip lists in your own codebase, with PyTorch providing only the extension point to make that convenient (if any is needed).

@albanD: Sure, that's doable. In that case, can I remove the existing onlyNativeDeviceTypes decorator from the tests?

@ankurneog
Copy link
Author

cc @malfet in case you have an opinion on this

It feels like onlyNativeDeviceTypes (introduced by #65201; the comment still says it's only CUDA + CPU + Meta, while the definition has evolved) served a very specific purpose at the time, which seems lost now. IMO all uses of this decorator should be replaced (where possible) with skipIfXLA (and probably a few skipIfMPS)

@malfet: thanks for your comment. Then maybe we should remove the onlyNativeDeviceTypes decorator from these tests?

@malfet
Copy link
Contributor

malfet commented Feb 11, 2025

@ankurneog thanks for your comment, then maybe we should remove the onlyNativeDeviceTypes decorator from these tests?

If they pass CI, then yes, that feels like the right approach

@ankurneog ankurneog force-pushed the expand_ops_execution branch 3 times, most recently from f0d4375 to 07c0e39 Compare February 17, 2025 02:26
@ankurneog
Copy link
Author

@malfet, @albanD: can you help with the review and approval? Thanks

@ankurneog ankurneog changed the title Replace decorators in UTs to cover additional devices Open up PT UTs to cover additional devices Feb 17, 2025
@ankurneog ankurneog requested a review from albanD February 19, 2025 03:49
@ankurneog ankurneog force-pushed the expand_ops_execution branch from 07c0e39 to 2666c06 Compare February 21, 2025 03:16
Copy link
Collaborator

@albanD albanD left a comment


Removal sounds good!
CI failures look real though.

@ankurneog ankurneog force-pushed the expand_ops_execution branch from 2666c06 to a1005b1 Compare March 6, 2025 04:59
@linux-foundation-easycla

linux-foundation-easycla bot commented Mar 6, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@ankurneog
Copy link
Author

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased expand_ops_execution onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout expand_ops_execution && git pull --rebase)

@github-actions
Copy link
Contributor

github-actions bot commented May 6, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label May 6, 2025
@github-actions github-actions bot closed this Jun 5, 2025

Labels

module: inductor, open source, Stale, topic: not user facing, triaged

6 participants