[DTensor] update add_backward benchmark to avoid redistribute #173593
Closed
wconstab wants to merge 2 commits into gh/wconstab/509/base
Conversation
Benchmark focuses on dispatch time. Also, the shapes didn't make sense for add previously; fixed them.

[ghstack-poisoned]
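For context, here is a minimal sketch of what a dispatch-focused add + backward timing loop can look like; it is illustrative only, not the benchmark file this PR edits. The single-rank gloo group, 1024x1024 shapes, and iteration count are assumptions; both operands share one shape and placement so the add needs no redistribute and no collective lands inside the timed region.

```python
# Minimal sketch of a dispatch-focused DTensor add + backward timing loop.
# Assumptions (not from this PR): single-rank gloo group, 1024x1024 operands,
# 100 timed iterations. Both operands use the same shape and placement so the
# add needs no redistribute and no collective runs inside the timed region.
import os
import time

import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor


def main() -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)
    mesh = init_device_mesh("cpu", (1,))

    a = distribute_tensor(torch.randn(1024, 1024, requires_grad=True), mesh, [Shard(0)])
    b = distribute_tensor(torch.randn(1024, 1024, requires_grad=True), mesh, [Shard(0)])

    # Warm up so sharding propagation is cached before timing dispatch.
    (a + b).sum().backward()

    iters = 100
    start = time.perf_counter()
    for _ in range(iters):
        (a + b).sum().backward()
    per_iter = (time.perf_counter() - start) / iters
    print(f"add + backward: {per_iter * 1e6:.1f} us/iter")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

With matching shapes and placements the loop is dominated by DTensor dispatch overhead rather than communication or kernel time, which is what this benchmark is meant to isolate.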
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/173593
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 3594a66 with merge base 7754b55:
FLAKY - The following job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
pianpwk approved these changes on Jan 28, 2026
Benchmark focuses on dispatch time. Also, the shapes didn't make sense for add previously; fixed them.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo

[ghstack-poisoned]
Collaborator
Starting merge as part of PR stack under #173436
1 similar comment
pytorchmergebot pushed a commit that referenced this pull request on Jan 28, 2026
Ensures that when possible (when such a flattened mesh exists), DTensor will find and use it to avoid more costly sequential comms, and, particularly for reduce comms, also avoids the risk of different reduction orders causing divergent results. (See [this doc](https://docs.google.com/document/d/1hJsnodQmHfs1QosNgR39HZNiOOzfnZ6bnALqonDpcDs/edit?userstoinvite=rrathaur@redhat.com&sharingaction=manageaccess&role=reader&tab=t.0) for more info.)

Example: for a (2,2,2) mesh with dims (A,B,C), when redistributing from (Psum, Replicate, Psum) -> (Replicate, Replicate, Replicate), the original behavior is 2 separate all_reduces. After this PR, if the user flattens dims A,C, this becomes one larger all_reduce.

Compared with the earlier attempt #172119, this PR
- includes optimization for comms other than all_reduce
- explicitly bans mixed partial types ((Psum, Pmax) is not a valid placement), so we don't have to worry about optimizing around it
- therefore uses a simpler implementation that groups adjacent TransformInfos and then merges like kinds
- warns once per mesh shape for missing flattened meshes
- won't optimize reduce_scatters when they shard an unevenly sized tensor dim

Details/Limitations
- all_to_all is never merged (left for possible future work, but not obvious how to do it in general)
- reduce_scatter is only merged when the outermost partial shape is evenly divisible by the flattened mesh; otherwise, warns
- reduce_scatter and all_gather are only merged when the shards are in left-to-right (ascending) order, since DeviceMesh only supports flattening in ascending order and the mesh ordering impacts correctness
- groups of like-kind collectives are NOT combined if they are not adjacent in the transform_info list
- flattened device meshes are not automatically created, due to a preference for explicit creation and ensuring torch.compile works, but warnings prompt the user to create them when doing so would allow an optimization
- DOES support merging mixed Partial (sum, avg) reductions, using the product of the avg dim sizes to scale after performing a sum reduction on the merged mesh; refuses to merge any other combinations of mixed partials

Fixes #171916

Note: an initial attempt used a stable sort with a __lt__ method in TransformInfo comparing the comm type key, but this was not correct because sorting a local (no-comm) operation like chunking before or after a comm operation on the same mesh dim affects results.

Pull Request resolved: #172610
Approved by: https://github.com/tianyu-l, https://github.com/zpcore
ghstack dependencies: #173593
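As a concrete illustration of the (A,C) flattening example above, here is a hedged sketch of the user-facing setup, assuming an 8-rank cpu/gloo launch via torchrun. The dim names "A"/"B"/"C", the flattened name "AC", and the use of the private DeviceMesh._flatten API for non-adjacent dims are assumptions for the example, not code from this PR.

```python
# Sketch of the (2,2,2) example above; launch with e.g.
# `torchrun --nproc_per_node=8 flatten_sketch.py`.
# Assumptions (not from this PR): dim names "A"/"B"/"C", the flattened name
# "AC", a cpu/gloo backend, and the private DeviceMesh._flatten API; whether
# non-adjacent dims ("A", "C") can be sliced and flattened depends on the
# DeviceMesh version in use.
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Partial, Replicate

dist.init_process_group("gloo")
mesh = init_device_mesh("cpu", (2, 2, 2), mesh_dim_names=("A", "B", "C"))

# Explicitly create the flattened mesh so DTensor can find it. Without it,
# the redistribute below stays two sequential all_reduces (over A, then C),
# and this PR warns that flattening A and C would enable a single one.
mesh["A", "C"]._flatten("AC")

# (Partial(sum), Replicate, Partial(sum)) -> fully Replicate.
local = torch.randn(4, 4)
dt = DTensor.from_local(local, mesh, [Partial(), Replicate(), Partial()])
out = dt.redistribute(mesh, [Replicate(), Replicate(), Replicate()])
print(out.placements)

dist.destroy_process_group()
```

Whether the redistribute actually runs as one merged all_reduce rather than two depends on this change being present in the build and on the flattened mesh existing; the sketch only shows the setup the warning prompts the user to create.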
pytorchmergebot pushed a commit that referenced this pull request on Jan 28, 2026
Pull Request resolved: #173436
Approved by: https://github.com/zpcore
ghstack dependencies: #173593, #172610
kapilsh pushed a commit to kapilsh/pytorch that referenced this pull request on Feb 2, 2026
…#173593)
Benchmark focuses on dispatch time. Also, the shapes didn't make sense for add previously; fixed them.
Pull Request resolved: pytorch#173593
Approved by: https://github.com/pianpwk
kapilsh pushed a commit to kapilsh/pytorch that referenced this pull request on Feb 2, 2026
…#172610)
Ensures that when possible (when such a flattened mesh exists), DTensor will find and use it to avoid more costly sequential comms, and, particularly for reduce comms, also avoids the risk of different reduction orders causing divergent results. (See [this doc](https://docs.google.com/document/d/1hJsnodQmHfs1QosNgR39HZNiOOzfnZ6bnALqonDpcDs/edit?userstoinvite=rrathaur@redhat.com&sharingaction=manageaccess&role=reader&tab=t.0) for more info.)

Example: for a (2,2,2) mesh with dims (A,B,C), when redistributing from (Psum, Replicate, Psum) -> (Replicate, Replicate, Replicate), the original behavior is 2 separate all_reduces. After this PR, if the user flattens dims A,C, this becomes one larger all_reduce.

Compared with the earlier attempt pytorch#172119, this PR
- includes optimization for comms other than all_reduce
- explicitly bans mixed partial types ((Psum, Pmax) is not a valid placement), so we don't have to worry about optimizing around it
- therefore uses a simpler implementation that groups adjacent TransformInfos and then merges like kinds
- warns once per mesh shape for missing flattened meshes
- won't optimize reduce_scatters when they shard an unevenly sized tensor dim

Details/Limitations
- all_to_all is never merged (left for possible future work, but not obvious how to do it in general)
- reduce_scatter is only merged when the outermost partial shape is evenly divisible by the flattened mesh; otherwise, warns
- reduce_scatter and all_gather are only merged when the shards are in left-to-right (ascending) order, since DeviceMesh only supports flattening in ascending order and the mesh ordering impacts correctness
- groups of like-kind collectives are NOT combined if they are not adjacent in the transform_info list
- flattened device meshes are not automatically created, due to a preference for explicit creation and ensuring torch.compile works, but warnings prompt the user to create them when doing so would allow an optimization
- DOES support merging mixed Partial (sum, avg) reductions, using the product of the avg dim sizes to scale after performing a sum reduction on the merged mesh; refuses to merge any other combinations of mixed partials

Fixes pytorch#171916

Note: an initial attempt used a stable sort with a __lt__ method in TransformInfo comparing the comm type key, but this was not correct because sorting a local (no-comm) operation like chunking before or after a comm operation on the same mesh dim affects results.

Pull Request resolved: pytorch#172610
Approved by: https://github.com/tianyu-l, https://github.com/zpcore
ghstack dependencies: pytorch#173593
kapilsh pushed a commit to kapilsh/pytorch that referenced this pull request on Feb 2, 2026
Pull Request resolved: pytorch#173436
Approved by: https://github.com/zpcore
ghstack dependencies: pytorch#173593, pytorch#172610
Stack from ghstack (oldest at bottom):
Benchmark focuses on dispatch time.
Also, the shapes didn't make sense for add previously; fixed them.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo