[DTensor] Test pointwise partial propagation #174000
wconstab wants to merge 3 commits into gh/wconstab/519/base
Conversation
Add a bunch of missing test cases. These were all 'missing' in my single-dim pointwise PR (discovered via the sharding validator). The pointwise rule on main (non-single-dim) is not missing these, but we lack the test coverage. Landing this first helps ensure we don't regress when refactoring. [ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/174000
Note: Links to docs will display an error until the docs builds have been completed.
❌ 4 New Failures, 1 Unrelated Failure. As of commit feb081b with merge base 4b0f7fb.
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
    if dist.is_initialized():
        dist.destroy_process_group()

    def test_add_partial_max_with_replicate(self):
Couldn't almost all of these tests be parameterized?
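For illustration, one way the suggestion could look: a single table-driven test over (op, input reduction, expected output reduction) cases. Everything below is a hypothetical sketch checked against plain Python floats rather than real DTensors; the names and case list are made up for illustration.

```python
import unittest

# Hypothetical sketch of the reviewer's suggestion: most of these tests share
# one shape ("combine a Partial(reduce_op) operand with a Replicate operand,
# check the resulting placement"), so they could be table-driven. Shards are
# modeled as plain floats; `op` combines one shard with the replicated value
# r, and the identity under test is:
#   op(reduce_in(shards), r) == reduce_out(op(a, r) for each shard a)
CASES = [
    # (name, op, reduce_in, reduce_out)
    ("add_partial_max", lambda a, r: a + r, max, max),
    ("add_partial_min", lambda a, r: a + r, min, min),
    ("sub_replicate_partial_min", lambda a, r: r - a, min, max),
]

class PointwisePartialsParamTest(unittest.TestCase):
    def test_partial_propagation_identities(self):
        shards, r = [1.0, 5.0, -3.0, 2.0], 2.0  # per-rank partial values
        for name, op, reduce_in, reduce_out in CASES:
            with self.subTest(name=name):
                # "reduce, then apply op" must equal
                # "apply op per shard, then reduce with the output reduction"
                lhs = op(reduce_in(shards), r)
                rhs = reduce_out(op(a, r) for a in shards)
                self.assertEqual(lhs, rhs)
```

In the real suite the body would build DTensors and assert on placements, but the table-driven shape would be the same.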
    Mathematically: max(a0, a1) + r = max(a0 + r, a1 + r)
    Adding a replicated constant to P(max) preserves the max structure.
    """
    with LocalTensorMode(frozenset(range(self.world_size))):
Feels like this could be a function decorator or be part of setup/teardown
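As a sketch of this suggestion: the context manager could be entered once per test in `setUp` and closed via `addCleanup`, so individual test bodies don't repeat the `with` block. `LocalTensorMode` is stubbed out below so the sketch runs standalone; the real class would be dropped in unchanged.

```python
import unittest

# Stand-in for LocalTensorMode so this sketch is self-contained.
class FakeLocalTensorMode:
    def __init__(self, ranks):
        self.ranks = ranks
        self.active = False
    def __enter__(self):
        self.active = True
        return self
    def __exit__(self, *exc):
        self.active = False
        return False

class PointwisePartialsTest(unittest.TestCase):
    world_size = 4

    def setUp(self):
        # Enter the mode once per test instead of in every test body.
        self.mode = FakeLocalTensorMode(frozenset(range(self.world_size)))
        self.mode.__enter__()
        # addCleanup guarantees __exit__ runs even if the test fails.
        self.addCleanup(self.mode.__exit__, None, None, None)

    def test_runs_inside_mode(self):
        self.assertTrue(self.mode.active)
```

On Python 3.11+, `self.enterContext(LocalTensorMode(...))` in `setUp` would express the same thing more directly.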
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Failures on CUDA and ROCm: test/distributed/tensor/test_pointwise_ops.py::PointwisePartialsTest::test_sub_replicate_partial_min_gives_partial_max (GH job link, HUD commit link)
@pytorchbot revert -c nosignal -m "test/distributed/tensor/test_pointwise_ops.py::PointwisePartialsTest::test_sub_replicate_partial_min_gives_partial_max GH job link HUD commit link"
@pytorchbot successfully started a revert job. Check the current status here.
@wconstab your PR has been successfully reverted.
This reverts commit e2d81fb. Reverted #174000 on behalf of https://github.com/jeffdaily due to test/distributed/tensor/test_pointwise_ops.py::PointwisePartialsTest::test_sub_replicate_partial_min_gives_partial_max [GH job link](https://github.com/pytorch/pytorch/actions/runs/21617187245/job/62324663867) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/e2d81fb2c5d8119b55e8e812da9229320928c16d) ([comment](#174000 (comment)))
HUD link for failures
Thanks for the revert @jeffdaily!
@malfet, yes, this is a case where a new broken test is added (
Closing as @anshul-si will take over.
Add a bunch of missing test cases. These were all 'missing' in my single-dim pointwise PR (discovered via the sharding validator). The pointwise rule on main (non-single-dim) is not missing these, but we lack the test coverage. Landing this first helps ensure we don't regress when refactoring. Pull Request resolved: pytorch#174000 Approved by: https://github.com/Skylion007
Add a bunch of missing test cases. These were all 'missing' in my single-dim pointwise PR (discovered via the sharding validator). The pointwise rule on main (non-single-dim) is not missing these, but we lack the test coverage. Landing this first helps ensure we don't regress when refactoring. ghstack-source-id: 51dc560 Pull Request resolved: pytorch/pytorch#174000
Stack from ghstack (oldest at bottom):
Add a bunch of missing test cases.
These were all 'missing' in my single-dim pointwise PR (discovered via the sharding validator). The pointwise rule on main (non-single-dim) is not missing these, but we lack the test coverage.
Landing this first helps ensure we don't regress when refactoring.
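The propagation rules these tests pin down reduce to simple algebraic identities. As a sanity check, here is a pure-Python spot check of two of them, with no DTensor involved; the shard values are made up, and integers are used so the equalities are exact.

```python
import random

# Spot check of two identities behind the new Partial-propagation tests:
#   add: max(a0..an) + r == max(a0 + r, ..., an + r)
#        so Partial(max) + Replicate can stay Partial(max)
#   sub: r - min(a0..an) == max(r - a0, ..., r - an)
#        so Replicate - Partial(min) becomes Partial(max)
#        (matching test_sub_replicate_partial_min_gives_partial_max)
random.seed(0)
for _ in range(1000):
    shards = [random.randint(-10, 10) for _ in range(4)]  # per-rank partial values
    r = random.randint(-10, 10)                           # replicated value
    assert max(shards) + r == max(a + r for a in shards)
    assert r - min(shards) == max(r - a for a in shards)
```

These identities are why the sharding validator flags the cases as propagatable without an eager reduction.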