[DTensor] make debugmode print optimized transforminfos #173436
wconstab wants to merge 9 commits into gh/wconstab/506/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/173436
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 6 Unrelated Failures
As of commit 1100b4f with merge base 7754b55:
NEW FAILURE - The following job has failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
)
shard_order_dict[src_dim].pop()
# Remove mesh dims in order (from shard_order_dict perspective)
for _ in mesh_dims_to_update:
Can we add an assert that `x = shard_order_dict[src_dim].pop()` satisfies `x in mesh_dims_to_update`? Same for `dst_dim_placement`. Just to be safe that the optimized transform info is correct.
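Something like this minimal sketch (variable names taken from the diff above; the exact surrounding code is assumed, not the actual patch):

```python
# Hedged sketch of the suggested safety check: every mesh dim popped off
# the source entry should be one we planned to move.
for _ in mesh_dims_to_update:
    x = shard_order_dict[src_dim].pop()
    assert x in mesh_dims_to_update, (
        f"optimized transform info inconsistent: popped mesh dim {x} "
        f"not in {mesh_dims_to_update}"
    )
```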
cur_placement[transform_info.mesh_dim] = dst_dim_placement
# Add mesh dims in order
for mesh_dim in mesh_dims_to_update:
    shard_order_dict[dst_dim].append(mesh_dim)
The order here is related to #172610 (comment). Let's address that one first.
Nit: suggest using something like `-->` (double dash) to show that this is an optimized transform.
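For example (illustrative only, assuming the single-dash trace format shown elsewhere in this PR):

```
RRR -> RS(0)R      # ordinary transform step
RRR --> RS(0)R     # same step, marked as produced by an optimized transform
```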
self.assertExpectedInline(
    trace_str,
-   """S(0)[0]S(0)[1]_S(0, 3)->S(0)[0]S(0)[1]R->S(0)[0]RR->RRR->RS(0)[1]R->RS(0)[1]S(0)[2]""",  # noqa: B950
+   """S(0)[0]S(0)[1]_S(0, 3)->S(0)[0]S(0)[1]R->S(0)RR->RRR->RS(0)R->RS(0)[0]S(0)[1]""",  # noqa: B950
@zpcore do you buy this explanation?
● I see the issue. The old code had a bug where it didn't handle _StridedShard in the pop logic (since _StridedShard.is_shard() returns False). This caused it to produce an incorrect shard_order, and the test's expected traces were written to match that incorrect behavior.
Now that I've fixed the code to properly handle _StridedShard, the correct output differs from the expected. I need to update the test's expected traces to match the correct behavior.
Let me run the test with EXPECTTEST_ACCEPT=1 to see what all the correct traces should be.
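A hedged sketch of the kind of fix being described (the helper name is illustrative and the import path is an assumption, not the actual patch):

```python
from torch.distributed.tensor.placement_types import _StridedShard

def _is_shard_like(placement) -> bool:
    # _StridedShard.is_shard() returns False (per the discussion above),
    # so it must be checked explicitly alongside regular Shard placements
    # before deciding whether to pop from shard_order_dict.
    return placement.is_shard() or isinstance(placement, _StridedShard)
```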
I see the issue: I didn't add the string of the order for _StridedShard in the output...
The fix from Claude still has some missing order for _StridedShard.
> The fix from Claude still has some missing order for _StridedShard.

Can you say more about this?
Oh, just that we don't have the [i] after _S in the string repr. Fixing that.
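Illustratively, using the trace notation from the test above (before vs. after the fix being described):

```
_S(0, 3)      # before: strided shard prints with no shard-order index
_S(0, 3)[1]   # after: "[i]" appended, matching how S(0)[1] already prints
```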
zpcore left a comment:
I think we can fix the _StridedShard order later, since _StridedShard is not that important now.
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Merge failed
Reason: 1 job has failed, first few of them are: trunk / win-vs2022-cpu-py3 / build
Details for Dev Infra team: Raised by workflow job
@pytorchbot merge -i
Merge started
Your change will be merged while ignoring the following 1 check: trunk / win-vs2022-cpu-py3 / build
Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Pull Request resolved: pytorch#173436
Approved by: https://github.com/zpcore
ghstack dependencies: pytorch#173593, pytorch#172610
@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"
This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
@pytorchbot successfully started a revert job. Check the current status here.
)" This reverts commit 47260be. Reverted #173436 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](#173436 (comment)))
@wconstab your PR has been successfully reverted.
relanding via #174630
Reland of #172610: same code as previous land except:
- includes #173873 (credit @bdhirsh)
- includes #173790 (credit @IvanKobzarev)
- includes #173436
- adds disable contextmanager + test

Ensures that when possible (when such a flattened mesh exists), DTensor will find and use it to avoid more costly sequential comms, and, particularly for reduce comms, also avoids the risk of different reduction orders causing divergent results. (See [this doc](https://docs.google.com/document/d/1hJsnodQmHfs1QosNgR39HZNiOOzfnZ6bnALqonDpcDs/edit) for more info.)

Example: For a (2,2,2) mesh with dims (A,B,C), when redistributing from (Psum, Replicate, Psum) -> (Replicate, Replicate, Replicate), the original behavior would be 2 separate all_reduces. After this PR, if the user flattens dims A,C, this becomes one larger all_reduce. (A usage sketch follows below.)

Compared with earlier attempt #172119, this PR
- includes optimization for comms other than all_reduce
- explicitly bans mixed partial types: (Psum, Pmax) is not a valid placement, so we don't have to worry about optimizing around it
- therefore uses a simpler implementation involving grouping adjacent transform infos and then merging like kinds
- warns once per mesh shape for missing flattened meshes
- won't optimize reduce_scatters when they shard an unevenly sized tensor dim

Details/Limitations
- all_to_all is never merged (left for possible future work, but it is not obvious how to do it in general)
- reduce_scatter is only merged when the outermost partial shape is evenly divisible by the flattened mesh; otherwise, warns
- reduce_scatter and all_gather are only merged when the shards are in left-to-right (ascending) order, since DeviceMesh only supports flattening in ascending order and the mesh ordering impacts correctness
- groups of like-kind collectives are NOT combined if they are not adjacent in the transform_info list
- flattened device meshes are not automatically created, due to a preference for explicit creation and ensuring torch.compile works, but warnings prompt the user to create them when it would enable an optimization
- DOES support merging mixed Partial (sum, avg) reductions, using the product of the avg dim sizes to scale after performing a sum reduction on the merged mesh. Refuses to merge any other combinations of mixed partials.

Fixes #171916

Note: an initial attempt used a stable sort with a __lt__ method in TransformInfo comparing the comm-type key, but this was not correct, because sorting a local (no-comm) operation like chunking before or after a comm operation on the same mesh dim affects results.

Differential Revision: D92540256
Pull Request resolved: #174630
Approved by: https://github.com/zpcore
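A hedged usage sketch of the (2,2,2) example above. It assumes an already-initialized 8-rank process group and that DeviceMesh slicing and the private `_flatten` API behave as in recent PyTorch; this is not code from the PR itself.

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Partial, Replicate

# (2, 2, 2) mesh with named dims A, B, C, as in the example above.
mesh = init_device_mesh("cuda", (2, 2, 2), mesh_dim_names=("A", "B", "C"))

# Per this PR, flattened meshes are not created automatically; the user
# creates one explicitly so DTensor can find and use it.
mesh["A", "C"]._flatten("AC")

# A DTensor that is Partial(sum) on mesh dims A and C, Replicate on B.
local = torch.randn(4, 4, device="cuda")
dt = DTensor.from_local(local, mesh, [Partial(), Replicate(), Partial()])

# Redistributing to fully replicated previously issued two all_reduces
# (one per partial mesh dim); with the flattened "AC" mesh available it
# can be merged into a single, larger all_reduce.
out = dt.redistribute(mesh, [Replicate(), Replicate(), Replicate()])
```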