Replacing explicit backend search with api call #144944
Replacing explicit backend search with api call #144944AnantGulati wants to merge 4 commits intopytorch:mainfrom
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144944
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
❌ 🤖 pytorchbot command failed: |
|
@pytorchbot label "topic: not user facing" |
|
@kwen2501 could you please review and help land this PR. |
kwen2501
left a comment
There was a problem hiding this comment.
LGTM. We are very close to land once we clean up a couple points in the comments
There was a problem hiding this comment.
new line
Also, how would this setter interact with the world_size function above?
There was a problem hiding this comment.
It should interact with the above function such that it allows us to change world size for only one test case in the test file as shown here: https://github.com/pytorch/pytorch/pull/144944/files/25e3ab6c2c314457ef62b8e610f572e45e7724c7#r1919782578
There was a problem hiding this comment.
What is this new test for?
There was a problem hiding this comment.
It's not a new test, I have simply shifted it to the class TestCollectivesWithDistributedBackend, from it's original class as the only difference between the original class and TestCollectivesWithDistributedBackend is the changed world size.
This class was causing error while refactoring the code so I removed it and added the world size setter which will allow us to modify world size in common distributed.
There was a problem hiding this comment.
We have added this line to change the world size only in this test case, this removes the need of creating a new class just to change one property without impacting the other classes
There was a problem hiding this comment.
Do you mind reverting the setting of world_size here and restore the original class, OR
use world_size = 4 instead of self.world_size ?
Today the change may be fine, but in case class-based Test fixture is used, any setting to self in one test method would impact the rest tests, because class-based Test fixture will not recreate a new test object per test.
|
@pytorchbot merge |
|
Pull workflow has not been scheduled for the PR yet. It could be because author doesn't have permissions to run those or skip-checks keywords were added to PR/commits, aborting merge. Please get/give approval for the workflows and/or remove skip ci decorators before next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra. |
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
4feeb3f to
a4e58d2
Compare
kwen2501
left a comment
There was a problem hiding this comment.
I am about to land this PR, but it seems one small change needs revert. Can you please help? Thank you!
There was a problem hiding this comment.
Do you mind reverting the setting of world_size here and restore the original class, OR
use world_size = 4 instead of self.world_size ?
Today the change may be fine, but in case class-based Test fixture is used, any setting to self in one test method would impact the rest tests, because class-based Test fixture will not recreate a new test object per test.
64d64cc to
2003796
Compare
Today the change may be fine, but in case class-based Test fixture is used, any setting to self in one test method would impact the rest tests, because class-based Test fixture will not recreate a new test object per test. I have maintained the initial class structure to avoid this problem. I have also changed the structure so that This is cleaner as the set up steps which were initially present in @kwen2501 I'd love to hear if you think any further adjustments are required. Thanks |
|
@kwen2501 Could you please help with this PR |
|
@kwen2501 all suggested changes have been added. Can we proceed with this PR? |
|
@kwen2501 Could you please take a look at this PR. It has been pending for some time now |
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
2003796 to
ee5e207
Compare
|
@kwen2501 Could you please help with this review. All changes you suggested have been implemented |
|
Successfully rebased |
c08b452 to
89072b2
Compare
|
@pytorchmergebot merge -i |
Merge startedYour change will be merged while ignoring the following 0 checks: Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud |
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud |
|
@AnantGulati please resolve the conflicts. |
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Rebase failed due to Command Raised by https://github.com/pytorch/pytorch/actions/runs/14922140404 |
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Rebase failed due to Command Raised by https://github.com/pytorch/pytorch/actions/runs/15550541252 |
|
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
Following up on PR #138216
In this PR we are modifying the internal structure of common distributed class DistributedTestBase to use an inbuilt function get_default_backend_for_device (#140536).
We have also refactored test file test_functional_api to remove all explicit calls to test_hpu. to add support for a new device to this file a user must only add the device to list of devices. This supports both in-tree and out of tree devices
This PR also adds a setter for world size inside multiprocess test case. This allows for test cases to redefine world size for different test cases in the same class and removes the need to add a new class simply to change properties like world size as shown in test_functional_api
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @kwen2501 @c-p-i-o