Open up PT UTs to cover additional devices#145589

Closed
ankurneog wants to merge 7 commits into pytorch:main from ankurneog:expand_ops_execution

Conversation

@ankurneog

@ankurneog ankurneog commented Jan 24, 2025

This is a follow-up of #128584, covering additional files for execution.

Based on the discussion we had with the reviewers, it was decided to remove the onlyNativeDeviceTypes decorator to open these tests up for all devices.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot

pytorch-bot bot commented Jan 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145589

Note: Links to docs will display an error until the docs builds have been completed.

❌ 12 New Failures

As of commit 0427369 with merge base 9c9b05b (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ankurneog ankurneog force-pushed the expand_ops_execution branch from c736712 to a818d12 Compare January 24, 2025 06:14
@zou3519 zou3519 requested a review from jbschlosser January 24, 2025 16:25
@zou3519 zou3519 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jan 24, 2025
@ankurneog ankurneog force-pushed the expand_ops_execution branch from a818d12 to df646d7 Compare January 31, 2025 04:15
@ankurneog
Copy link
Author

@albanD, @kwen2501: can you please help with the review and approval? Thank you. (This is along the lines of the changes introduced in #128584.)

Collaborator

@albanD albanD left a comment


It's still quite weird to have this in core given that we don't run this in CI...
Sounds fine to me since it's only tests, but I wouldn't expect this to remain working for any period of time, tbh.

cc @malfet in case you have an opinion on this

@ankurneog
Copy link
Author

It's still quite weird to have this in core given that we don't run this in CI... Sounds fine to me since it's only tests, but I wouldn't expect this to remain working for any period of time, tbh.

cc @malfet in case you have an opinion on this

@albanD: Thanks for your comment. These TCs are disabled by default for other devices (by the existing decorator), so when we want to check them for devices such as Intel Gaudi, we need to refactor or remove the decorators. The change removes this restriction for devices that support the tests and want to run them.

@malfet
Copy link
Contributor

malfet commented Feb 10, 2025

@ankurneog regarding the review: current PR conflicts with trunk and causes 21+ workflow failures....

@malfet
Copy link
Contributor

malfet commented Feb 10, 2025

cc @malfet in case you have an opinion on this

It feels like onlyNativeDeviceTypes (introduced by #65201; the comment still says it's only CUDA + CPU + Meta, while the definition has evolved) served a very specific purpose at the time, which seems lost now. IMO all uses of this decorator should be replaced (where possible) with skipIfXLA (and probably a few skipIfMPS)

@albanD
Copy link
Collaborator

albanD commented Feb 10, 2025

@ankurneog thanks for the details!
I guess the usual way we solve this kind of issue is by providing a generic extension point so that each vendor can customize the logic for their needs without requiring their special logic to be in core.
I wonder if it would be possible here as well to keep these HPU-specific skip lists in your own codebase, with PyTorch providing only the extension point to make that convenient (if any is needed).
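The extension point being proposed could look something like this (a hypothetical sketch; register_device_skips and should_skip are invented names, not an actual PyTorch API):

```python
# Core would expose only the registry; each vendor keeps its
# device-specific skip list in its own package.
_DEVICE_SKIP_LISTS = {}

def register_device_skips(device_type, test_names):
    """Called by a vendor package (e.g. at import time) to opt tests out
    for its device type."""
    _DEVICE_SKIP_LISTS.setdefault(device_type, set()).update(test_names)

def should_skip(test_name, device_type):
    """Checked by the test framework before running each generated test."""
    return test_name in _DEVICE_SKIP_LISTS.get(device_type, set())
```

With this shape, an out-of-tree Intel Gaudi package could call register_device_skips("hpu", {...}) on import, and the HPU-specific skip list never needs to live in PyTorch core.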

@ankurneog
Copy link
Author

@ankurneog thanks for the details! I guess the usual way we solve this kind of issue is by providing a generic extension point so that each vendor can customize the logic for their needs without requiring their special logic to be in core. I wonder if it would be possible here as well to keep these HPU-specific skip lists in your own codebase, with PyTorch providing only the extension point to make that convenient (if any is needed).

@albanD: Sure, that's doable. In that case, can I remove the existing onlyNativeDeviceTypes decorator from the tests?

@ankurneog
Copy link
Author

cc @malfet in case you have an opinion on this

It feels like onlyNativeDeviceTypes (introduced by #65201; the comment still says it's only CUDA + CPU + Meta, while the definition has evolved) served a very specific purpose at the time, which seems lost now. IMO all uses of this decorator should be replaced (where possible) with skipIfXLA (and probably a few skipIfMPS)

@malfet: thanks for your comment. Then maybe we should remove the onlyNativeDeviceTypes decorator from these tests?

@malfet
Copy link
Contributor

malfet commented Feb 11, 2025

@ankurneog thanks for your comment, then maybe we should remove the onlyNativeDeviceTypes decorator from these tests?

If they pass CI, then yes, that feels like the right approach

@ankurneog ankurneog force-pushed the expand_ops_execution branch 3 times, most recently from f0d4375 to 07c0e39 Compare February 17, 2025 02:26
@ankurneog
Copy link
Author

@malfet, @albanD: can you help with the review and approval? Thanks

@ankurneog ankurneog changed the title Replace decorators in UTs to cover additional devices Open up PT UTs to cover additional devices Feb 17, 2025
@ankurneog ankurneog requested a review from albanD February 19, 2025 03:49
@ankurneog ankurneog force-pushed the expand_ops_execution branch from 07c0e39 to 2666c06 Compare February 21, 2025 03:16
Copy link
Collaborator

@albanD albanD left a comment


Removal sounds good!
CI failures look real though.

@ankurneog ankurneog force-pushed the expand_ops_execution branch from 2666c06 to a1005b1 Compare March 6, 2025 04:59
@linux-foundation-easycla

linux-foundation-easycla bot commented Mar 6, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@ankurneog
Copy link
Author

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased expand_ops_execution onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout expand_ops_execution && git pull --rebase)

@github-actions
Copy link
Contributor

github-actions bot commented May 6, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label May 6, 2025
@github-actions github-actions bot closed this Jun 5, 2025

Labels

module: inductor, open source, Stale, topic: not user facing, triaged

6 participants