[DTensor] Test pointwise partial propagation#174000

Closed
wconstab wants to merge 3 commits intogh/wconstab/519/basefrom
gh/wconstab/519/head

@wconstab
Contributor

@wconstab wconstab commented Jan 31, 2026

Stack from ghstack (oldest at bottom):

Add a bunch of missing test cases.

These were all 'missing' in my single-dim pointwise PR (discovered
missing using the sharding validator). The pointwise rule on main
(non-single-dim) is not missing these, but we lack the test coverage.
Landing this first helps ensure we don't regress when refactoring.

[ghstack-poisoned]
@pytorch-bot

pytorch-bot Bot commented Jan 31, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@pytorch-bot

pytorch-bot Bot commented Jan 31, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/174000

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Unrelated Failure

As of commit feb081b with merge base 4b0f7fb (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

        if dist.is_initialized():
            dist.destroy_process_group()

    def test_add_partial_max_with_replicate(self):
Collaborator

Couldn't almost all of these tests be parameterized?
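
As a rough illustration of the parameterization suggested above, the sketch below drives several cases from one table using the standard library's `unittest.subTest`. It is not the PR's code: the case names, the `CASES` table, and the `PointwisePartialsSketch` class are hypothetical, and plain Python floats stand in for per-rank local values of a Partial tensor, so no DTensor API is exercised.

```python
import unittest

# Hypothetical case table: (case name, per-rank local values, replicated value, reduction).
# Each "rank" holds one local value of a Partial tensor; the reduction recovers the full value.
CASES = [
    ("add_partial_max_with_replicate", [1.0, 5.0, 3.0], 2.0, max),
    ("add_partial_min_with_replicate", [1.0, 5.0, 3.0], 2.0, min),
    ("add_partial_max_negative_replicate", [-4.0, 0.5], -1.5, max),
]


class PointwisePartialsSketch(unittest.TestCase):
    def test_add_replicate_preserves_reduction(self):
        for name, local_values, replicated, reduce_fn in CASES:
            with self.subTest(case=name):
                # Reduce first, then add the replicated value ...
                expected = reduce_fn(local_values) + replicated
                # ... versus adding on every rank and reducing afterwards.
                actual = reduce_fn(v + replicated for v in local_values)
                self.assertEqual(actual, expected)


def run_sketch():
    """Run the sketch test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PointwisePartialsSketch)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Adding a new scenario then means adding one row to the table rather than one more test method. A real version would presumably use whatever parameterization helper the PyTorch test suite already provides.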

        Mathematically: max(a0, a1) + r = max(a0 + r, a1 + r)
        Adding a replicated constant to P(max) preserves the max structure.
        """
        with LocalTensorMode(frozenset(range(self.world_size))):
Collaborator

Feels like this could be a function decorator or be part of setup/teardown
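
One way the decorator idea above could look is sketched here. To keep the example self-contained it does not import the real `LocalTensorMode`; `fake_local_tensor_mode`, `with_local_tensor_mode`, `ACTIVE_RANKS`, and `ExampleTest` are all hypothetical stand-ins, and the only thing taken from the quoted snippet is the `frozenset(range(self.world_size))` argument.

```python
import contextlib
import functools

ACTIVE_RANKS = []  # records which ranks the stand-in mode is currently simulating


@contextlib.contextmanager
def fake_local_tensor_mode(ranks):
    """Stand-in for LocalTensorMode; NOT the real PyTorch API."""
    ACTIVE_RANKS.append(ranks)
    try:
        yield
    finally:
        ACTIVE_RANKS.pop()


def with_local_tensor_mode(test_fn):
    """Run a test method inside the mode so its body needs no `with` block."""

    @functools.wraps(test_fn)
    def wrapper(self, *args, **kwargs):
        with fake_local_tensor_mode(frozenset(range(self.world_size))):
            return test_fn(self, *args, **kwargs)

    return wrapper


class ExampleTest:
    world_size = 4

    @with_local_tensor_mode
    def test_something(self):
        # The mode is already active by the time the test body runs.
        return ACTIVE_RANKS[-1]
```

The setup/teardown alternative would enter the context manager in `setUp` and exit it in `tearDown`. The decorator keeps the choice per test, which may matter if only some tests need the mode.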

@wconstab
Contributor Author

wconstab commented Feb 2, 2026

@pytorchbot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 2, 2026
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@wconstab
Contributor Author

wconstab commented Feb 3, 2026

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@jeffdaily
Collaborator

failures on cuda and rocm

test/distributed/tensor/test_pointwise_ops.py::PointwisePartialsTest::test_sub_replicate_partial_min_gives_partial_max GH job link HUD commit link
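
The failing test's name suggests it checks that a replicated value minus a Partial(min) operand yields a Partial(max) result, i.e. r - min(a0, a1) = max(r - a0, r - a1). The sketch below checks that arithmetic with plain Python floats. It is an assumption about what the test covers, says nothing about why the test failed on CUDA and ROCm, and `sub_replicate_partial_min` is a hypothetical helper rather than code from the PR.

```python
def sub_replicate_partial_min(replicated, local_values):
    """Compare reduce-then-subtract with subtract-then-reduce for R - P(min)."""
    # Full value: reduce the Partial(min) operand first, then subtract it from R.
    reduce_then_sub = replicated - min(local_values)
    # Subtracting on each rank flips the ordering, so the pending reduction becomes max.
    sub_then_reduce = max(replicated - v for v in local_values)
    return reduce_then_sub, sub_then_reduce
```

With `replicated = 10.0` and local values `[1.0, 5.0, 3.0]`, both results are `9.0`. Reducing the per-rank differences with `min` instead would give `5.0`, which is why the reduction type has to flip.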

@jeffdaily
Collaborator

@pytorchbot revert -c nosignal -m "test/distributed/tensor/test_pointwise_ops.py::PointwisePartialsTest::test_sub_replicate_partial_min_gives_partial_max GH job link HUD commit link"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@wconstab your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Feb 3, 2026
This reverts commit e2d81fb.

Reverted #174000 on behalf of https://github.com/jeffdaily due to test/distributed/tensor/test_pointwise_ops.py::PointwisePartialsTest::test_sub_replicate_partial_min_gives_partial_max [GH job link](https://github.com/pytorch/pytorch/actions/runs/21617187245/job/62324663867) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/e2d81fb2c5d8119b55e8e812da9229320928c16d) ([comment](#174000 (comment)))
@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Feb 3, 2026
@izaitsevfb
Contributor

hud link for failures

@malfet
Contributor

malfet commented Feb 3, 2026

Thanks for the revert, @jeffdaily!
@izaitsevfb do you know why the auto-revert bot has not picked it up?

@izaitsevfb
Contributor

Thanks for the revert, @jeffdaily! @izaitsevfb do you know why the auto-revert bot has not picked it up?

@malfet, yes. This is a case where a newly added test is broken (test_pointwise_ops in this case), and autorevert currently doesn't support that situation because it has no signal for this specific test from the base revision. We plan to cover it with job-level signals, but haven't had a chance to do so yet.
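
The limitation described above can be pictured with a small sketch. This is not the autorevert tool's actual logic: `can_autorevert` and its parameters are hypothetical, and the rule it encodes (revert only on a clear pass-to-fail flip for the same test) is an assumption made purely to show why a missing base signal blocks the decision.

```python
from typing import Optional


def can_autorevert(base_passed: Optional[bool], head_passed: Optional[bool]) -> bool:
    """Hypothetical rule: revert only on a clear pass -> fail flip for the same test."""
    if base_passed is None:
        # Newly added test: no signal on the base revision, so nothing to compare against.
        return False
    return base_passed and head_passed is False
```

Under this assumed rule, `can_autorevert(True, False)` is `True`, while `can_autorevert(None, False)` is `False`. The second call corresponds to a test that first appears, already failing, in the PR itself.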

@wconstab
Contributor Author

wconstab commented Feb 6, 2026

closing as @anshul-si will take over

@wconstab wconstab closed this Feb 6, 2026
radeksm pushed a commit to radeksm/pytorch that referenced this pull request Feb 20, 2026
Add a bunch of missing test cases.

These were all 'missing' in my single-dim pointwise PR (discovered
missing using the sharding validator).  The pointwise rule on main
(non-single-dim) is not missing these, but we lack the test coverage.
Landing this first helps ensure we don't regress when refactoring.
Pull Request resolved: pytorch#174000
Approved by: https://github.com/Skylion007
radeksm pushed a commit to radeksm/pytorch that referenced this pull request Feb 20, 2026
This reverts commit e2d81fb.

Reverted pytorch#174000 on behalf of https://github.com/jeffdaily due to test/distributed/tensor/test_pointwise_ops.py::PointwisePartialsTest::test_sub_replicate_partial_min_gives_partial_max [GH job link](https://github.com/pytorch/pytorch/actions/runs/21617187245/job/62324663867) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/e2d81fb2c5d8119b55e8e812da9229320928c16d) ([comment](pytorch#174000 (comment)))
libohao1201 pushed a commit to libohao1201/pytorch that referenced this pull request Mar 2, 2026
Add a bunch of missing test cases.

These were all 'missing' in my single-dim pointwise PR (discovered
missing using the sharding validator).  The pointwise rule on main
(non-single-dim) is not missing these, but we lack the test coverage.
Landing this first helps ensure we don't regress when refactoring.
Pull Request resolved: pytorch#174000
Approved by: https://github.com/Skylion007
libohao1201 pushed a commit to libohao1201/pytorch that referenced this pull request Mar 2, 2026
This reverts commit e2d81fb.

Reverted pytorch#174000 on behalf of https://github.com/jeffdaily due to test/distributed/tensor/test_pointwise_ops.py::PointwisePartialsTest::test_sub_replicate_partial_min_gives_partial_max [GH job link](https://github.com/pytorch/pytorch/actions/runs/21617187245/job/62324663867) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/e2d81fb2c5d8119b55e8e812da9229320928c16d) ([comment](pytorch#174000 (comment)))
@github-actions github-actions Bot deleted the gh/wconstab/519/head branch March 9, 2026 02:22
sandy-gags pushed a commit to sandy-gags/pytorch that referenced this pull request Mar 12, 2026
Add a bunch of missing test cases.

These were all 'missing' in my single-dim pointwise PR (discovered
missing using the sharding validator).  The pointwise rule on main
(non-single-dim) is not missing these, but we lack the test coverage.
Landing this first helps ensure we don't regress when refactoring.

ghstack-source-id: 51dc560
Pull Request resolved: pytorch/pytorch#174000
Labels

ci-no-td Do not run TD on this PR ciflow/trunk Trigger trunk jobs on your pull request Merged Reverted topic: not user facing topic category

6 participants