[dtensor] fix flatten mesh dims arg relative to submesh #173790
IvanKobzarev wants to merge 4 commits into gh/IvanKobzarev/210/base
Conversation
This PR needs a `release notes:` label.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/173790
Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (5 Unrelated Failures)

As of commit fe9aa15 with merge base 19449aa:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk.
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
wconstab left a comment
lgtm - the names indeed should come from the submesh. Can you update the test case, as @fegin mentioned, to:

```
root_mesh = init_device_mesh((8,), ("world",))
spmd_mesh = root_mesh.unflatten((2, 2, 2), ("pp", "dp", "ep"))["dp", "ep"]
```
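A slightly more complete sketch of that suggested setup, for reference: it assumes an 8-rank job (e.g. `torchrun --nproc-per-node 8`), fills in the device-type argument that the shorthand above omits, and follows the `unflatten()` call shape from the comment, which may not match the final DeviceMesh API exactly.

```python
# Hedged sketch of the suggested test setup; assumes 8 ranks and follows
# the unflatten() call shape from the review comment, which may not match
# the final DeviceMesh API exactly.
from torch.distributed.device_mesh import init_device_mesh

# 1-D root mesh over all 8 ranks ("cuda" is an assumption; use the
# device type of your backend).
root_mesh = init_device_mesh("cuda", (8,), mesh_dim_names=("world",))

# Unflatten into a (pp, dp, ep) 3-D mesh, then slice the 2-D submesh
# that the SPMD test operates on.
spmd_mesh = root_mesh.unflatten((2, 2, 2), ("pp", "dp", "ep"))["dp", "ep"]
```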
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Treat mesh_dims arg in _get_flattened_mesh_by_layout relative to submesh

Test:
```
python test/distributed/tensor/test_redistribute.py -k test_get_flattened_mesh_by_layout_with_submesh
```

[ghstack-poisoned]
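To illustrate what "relative to submesh" means, the hedged sketch below uses the mesh layout from the review discussion above; `_get_flattened_mesh_by_layout` itself is a private DTensor helper, so the sketch only shows the submesh-relative dim indexing the fix enforces, not the helper's actual signature.

```python
# Illustration of the indexing the fix is about; assumes an 8-rank job and
# the same mesh layout as the suggested test above.
from torch.distributed.device_mesh import init_device_mesh

mesh = init_device_mesh("cuda", (2, 2, 2), mesh_dim_names=("pp", "dp", "ep"))
submesh = mesh["dp", "ep"]

# mesh dims passed for this submesh should index into its own dims:
# 0 -> "dp", 1 -> "ep" (and NOT into the root mesh, where dim 0 is "pp").
assert submesh.mesh_dim_names == ("dp", "ep")
```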
I just did ghstack checkout, spin fixlint, commit; ghstack will attempt to land ASAP.
@pytorchbot merge -i
Merge started
Your change will be merged while ignoring the following 3 checks: pull / linux-docs / build-docs-python-false, inductor / unit-test / inductor-test / test (inductor, 1, 2, linux.g5.4xlarge.nvidia.gpu), inductor / unit-test / inductor-halide-test / test (inductor-halide, 1, 1, linux.12xlarge)
Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Treat mesh_dims arg in _get_flattened_mesh_by_layout relative to submesh

Test:
```
python test/distributed/tensor/test_redistribute.py -k test_get_flattened_mesh_by_layout_with_submesh
```

Pull Request resolved: pytorch#173790
Approved by: https://github.com/wconstab, https://github.com/fegin, https://github.com/jathu
Co-authored-by: Will Constable <whc@meta.com>
@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"

This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
@pytorchbot successfully started a revert job. Check the current status here.
)" This reverts commit c7d863a. Reverted #173790 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](#173790 (comment)))
@IvanKobzarev your PR has been successfully reverted.
Sorry for the churn, please feel free to rebase and reland.
I'll take care of this.
Summary: Reland of #172610 - includes fixes #173873 (credit bdhirsh) and #173790 (credit IvanKobzarev)

Ensures that when possible (when such a flattened mesh exists), DTensor will find and use it to avoid more costly sequential comms; particularly for reduce comms, this also avoids the risk of different reduction orders causing divergent results. (See [this doc](https://docs.google.com/document/d/1hJsnodQmHfs1QosNgR39HZNiOOzfnZ6bnALqonDpcDs/edit?userstoinvite=rrathaur@redhat.com&sharingaction=manageaccess&role=reader&tab=t.0) for more info.)

Example: For a (2,2,2) mesh with dims (A,B,C), when redistributing placements from (Psum, Replicate, Psum) -> (Replicate, Replicate, Replicate), the original behavior would be 2 separate all_reduces. After this PR, if the user flattens dims A,C, this becomes one larger all_reduce.

Compared with earlier attempt #172119, this PR
- includes optimization for comms other than all_reduce
- explicitly bans mixed partial types ((Psum, Pmax) is not a valid placement), so we don't have to worry about optimizing around it
- therefore uses a simpler implementation involving grouping adjacent TransformInfos and then merging like kinds
- warns once per mesh shape for missing flattened meshes
- won't optimize reduce_scatters when they shard an unevenly sized tensor dim

Details/Limitations:
- all_to_all is never merged (left for possible future work; not obvious how to do it in general)
- reduce_scatter is only merged when the outermost partial shape is evenly divisible by the flattened mesh; otherwise, warns
- reduce_scatter and all_gather are only merged when the shards are in left-to-right (ascending) order, since DeviceMesh only supports flattening in ascending order and the mesh ordering impacts correctness
- groups of like-kind collectives are NOT combined if they are not adjacent in the transform_info list
- flattened device meshes are not automatically created, due to a preference for explicit creation and ensuring torch.compile works, but warnings prompt the user to create them when doing so would allow an optimization
- DOES support merging mixed Partial (sum, avg) reductions, using the product of the avg dim sizes to scale after performing a sum reduction on the merged mesh. Refuses to merge any other combinations of mixed partials.

Fixes #171916

Note: the initial attempt used a stable sort with a __lt__ method on TransformInfo comparing the comm type key, but this was not correct, because sorting a local (no-comm) operation like chunking before or after a comm operation on the same mesh dim affects results.

Differential Revision: D92540256
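A hedged sketch of the headline example above, assuming an 8-rank job and the private `DeviceMesh._flatten` API (which may change); it shows the user-created flattened "AC" mesh that lets the two Psum reductions collapse into one all_reduce.

```python
# Hedged sketch, assuming 8 ranks (e.g. torchrun --nproc-per-node 8) and the
# private DeviceMesh._flatten API; not the PR's own test code.
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Partial, Replicate

mesh = init_device_mesh("cuda", (2, 2, 2), mesh_dim_names=("A", "B", "C"))

# Flattened meshes are not auto-created (see above); make "AC" explicitly
# so DTensor can find it. A and C are in ascending order, as required.
mesh["A", "C"]._flatten("AC")

x = DTensor.from_local(
    torch.randn(4, 4, device="cuda"),
    mesh,
    placements=(Partial(), Replicate(), Partial()),  # Psum on dims A and C
)

# With the "AC" mesh present, this can run as one all_reduce over 4 ranks
# instead of two sequential 2-rank all_reduces.
y = x.redistribute(mesh, placements=(Replicate(), Replicate(), Replicate()))
```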
Squashed into #174630.
Reland of #172610: same code as the previous land, except:
- includes #173873 (credit @bdhirsh)
- includes #173790 (credit @IvanKobzarev)
- includes #173436
- adds a disable contextmanager + test

Ensures that when possible (when such a flattened mesh exists), DTensor will find and use it to avoid more costly sequential comms; particularly for reduce comms, this also avoids the risk of different reduction orders causing divergent results. (See [this doc](https://docs.google.com/document/d/1hJsnodQmHfs1QosNgR39HZNiOOzfnZ6bnALqonDpcDs/edit?userstoinvite=rrathaur@redhat.com&sharingaction=manageaccess&role=reader&tab=t.0) for more info.)

Example: For a (2,2,2) mesh with dims (A,B,C), when redistributing placements from (Psum, Replicate, Psum) -> (Replicate, Replicate, Replicate), the original behavior would be 2 separate all_reduces. After this PR, if the user flattens dims A,C, this becomes one larger all_reduce.

Compared with earlier attempt #172119, this PR
- includes optimization for comms other than all_reduce
- explicitly bans mixed partial types ((Psum, Pmax) is not a valid placement), so we don't have to worry about optimizing around it
- therefore uses a simpler implementation involving grouping adjacent TransformInfos and then merging like kinds
- warns once per mesh shape for missing flattened meshes
- won't optimize reduce_scatters when they shard an unevenly sized tensor dim

Details/Limitations:
- all_to_all is never merged (left for possible future work; not obvious how to do it in general)
- reduce_scatter is only merged when the outermost partial shape is evenly divisible by the flattened mesh; otherwise, warns
- reduce_scatter and all_gather are only merged when the shards are in left-to-right (ascending) order, since DeviceMesh only supports flattening in ascending order and the mesh ordering impacts correctness
- groups of like-kind collectives are NOT combined if they are not adjacent in the transform_info list
- flattened device meshes are not automatically created, due to a preference for explicit creation and ensuring torch.compile works, but warnings prompt the user to create them when doing so would allow an optimization
- DOES support merging mixed Partial (sum, avg) reductions, using the product of the avg dim sizes to scale after performing a sum reduction on the merged mesh. Refuses to merge any other combinations of mixed partials.

Fixes #171916

Note: the initial attempt used a stable sort with a __lt__ method on TransformInfo comparing the comm type key, but this was not correct, because sorting a local (no-comm) operation like chunking before or after a comm operation on the same mesh dim affects results.

Differential Revision: D92540256

Pull Request resolved: #174630
Approved by: https://github.com/zpcore
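The mixed Partial (sum, avg) rule above can be sanity-checked with plain tensor math: averaging over the avg mesh dims equals summing and then dividing by the product of their sizes, so one merged sum reduction plus a rescale reproduces the sequential result. A single-process numeric check, with assumed mesh dim sizes:

```python
# Single-process numeric check of the merged mixed-partial reduction
# described above; mesh dim sizes (2 for the sum dim, 4 for the avg dim)
# are assumed for illustration.
import torch

sum_dim, avg_dim = 2, 4
shards = torch.randn(sum_dim, avg_dim, 3)  # one partial value per rank

# Sequential comms: sum-reduce over the Psum dim, then avg over the Pavg dim.
sequential = shards.sum(dim=0).mean(dim=0)

# Merged comm: one sum reduction over the flattened mesh, then scale by the
# product of the avg dim sizes (here just avg_dim).
merged = shards.sum(dim=(0, 1)) / avg_dim

torch.testing.assert_close(sequential, merged)
```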
Revert "[dtensor] fix flatten mesh dims arg relative to submesh (pytorch#173790)"

This reverts commit c7d863a.

Reverted pytorch#173790 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](pytorch#173790 (comment)))
Stack from ghstack (oldest at bottom):
Treat mesh_dims arg in _get_flattened_mesh_by_layout relative to submesh
Test:
```
python test/distributed/tensor/test_redistribute.py -k test_get_flattened_mesh_by_layout_with_submesh
```