[DTensor] Optimize redistribute to use flattened mesh dims for consecutive reductions #171913
ezyang wants to merge 1 commit into gh/ezyang/3233/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/171913
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures as of commit 6590dfa with merge base 3d2e7de.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
[DTensor] Optimize redistribute to use flattened mesh dims for consecutive reductions
ghstack-source-id: 7c4441a
Pull-Request: #171913
) -> list[_TransformInfo]:
    """
    Optimize transform_infos by merging consecutive all-reduce operations on
    contiguous mesh dimensions when a flattened mesh/PG exists for those dimensions.
Is it only possible to merge reductions on contiguous mesh dims?
Further, if we had a mesh like [dp, pp, tp] and we sliced out spmd_mesh = parent[dp, tp], the indices of dp and tp would be 0 and 1 and appear 'contiguous' to this code, but they would not actually be contiguous in the parent mesh. Is that a problem?
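A minimal sketch of that index trap, using plain tensor arithmetic rather than the DeviceMesh API (the mesh shape and dim names here are illustrative only):

```python
import torch

# Parent mesh over 8 ranks with dims (dp, pp, tp) = (2, 2, 2).
parent = torch.arange(8).reshape(2, 2, 2)

# Ranks grouped together if we flatten the truly adjacent dims (pp, tp)
# for dp index 0: a contiguous block of ranks.
print(parent[0].flatten().tolist())        # [0, 1, 2, 3]

# Ranks grouped together if we flatten (dp, tp) for pp index 0: the dims
# look adjacent in the sliced submesh, but the ranks are strided by pp.
print(parent[:, 0, :].flatten().tolist())  # [0, 1, 4, 5]
```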
If we can leverage the coalescing of the layout, that will solve the case @wconstab mentioned here.
Trying to find a corner case where the contiguous assumption does not provide an identity guarantee:
If we use graph-based DDP + FSDP + TP, for RMSNorm.weight we would have
- param: (Replicate, Shard, Replicate)
- grad before reduction: (Partial, Partial, Partial)
I guess the reductions will still happen in the order AR, RS, AR and result in the DDP / TP ranks not having the same results.
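A hedged sketch of that scenario, assuming a recent PyTorch where DTensor lives under torch.distributed.tensor and an 8-rank run (e.g., via torchrun); the mesh sizes, dim names, and tensor shape are illustrative only:

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Partial, Replicate, Shard

# 2 x 2 x 2 mesh over 8 ranks; dim names are illustrative.
mesh = init_device_mesh("cuda", (2, 2, 2), mesh_dim_names=("dp", "fsdp", "tp"))

# Local gradient for RMSNorm.weight on this rank, pending reduction on
# every mesh dim.
local_grad = torch.randn(16, device="cuda")
grad = DTensor.from_local(local_grad, mesh, (Partial(), Partial(), Partial()))

# Reduce toward the parameter's placements (Replicate, Shard, Replicate).
# With one collective per mesh dim this lowers to all-reduce(dp),
# reduce-scatter(fsdp), all-reduce(tp); the question above is whether merging
# mesh dims into a flattened reduction keeps all ranks' results identical.
reduced = grad.redistribute(mesh, (Replicate(), Shard(0), Replicate()))
```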
I am pretty sure contiguous-only is sound, but not complete (as wconstab mentions above). Are you worried about unsoundness here too?
I do worry about completeness. If a moderately complicated solution doesn't solve the problem, I would prefer we error out.
Also, I don't care AT ALL about the code here (entirely Claude-coded), so if someone wants to redo it from scratch or commandeer, I am not trying to lick the cookie.
I did take over this PR and replace it with #172121, FYI. Closing this one.
Stack from ghstack (oldest at bottom):
Authored with Claude Code
When there are reductions that need to occur on multiple mesh dims, we
currently issue a separate collective for each mesh dim. When we want to
do a reduction on multiple contiguous mesh dims, and a flattened dim of
those contiguous dims exists (i.e., we have already paid the cost of
initializing PGs for the flattened dim), it is better to do the
reduction all in one go on the flattened mesh dim.
The redistribute algorithm currently operates by proposing a sequence of
collectives to perform. This change looks for multiple consecutive
reductions and greedily tests whether they have a flattened mesh dim/PG.
If they do, it replaces that part of the plan with one that does the
reduction in a single step.
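A simplified sketch of the greedy merge described above; ReduceStep and has_flattened_group are stand-ins for illustration, not DTensor's actual _TransformInfo or redistribute internals:

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class ReduceStep:
    mesh_dims: tuple[int, ...]  # mesh dims this step reduces over
    is_reduction: bool          # True for Partial -> Replicate/Shard style steps


def merge_consecutive_reductions(
    steps: list[ReduceStep],
    has_flattened_group: Callable[[tuple[int, ...]], bool],
) -> list[ReduceStep]:
    """Greedily replace runs of consecutive reductions on contiguous mesh dims
    with a single step on the flattened mesh dim, when a flattened PG exists."""
    out: list[ReduceStep] = []
    i = 0
    while i < len(steps):
        j = i
        # Extend the run while the next step is also a reduction on the
        # mesh dim immediately following the current one.
        while (
            j + 1 < len(steps)
            and steps[j].is_reduction
            and steps[j + 1].is_reduction
            and steps[j + 1].mesh_dims[0] == steps[j].mesh_dims[-1] + 1
        ):
            j += 1
        dims = tuple(d for s in steps[i : j + 1] for d in s.mesh_dims)
        if j > i and has_flattened_group(dims):
            # One collective on the flattened dim instead of one per mesh dim.
            out.append(ReduceStep(mesh_dims=dims, is_reduction=True))
        else:
            out.extend(steps[i : j + 1])
        i = j + 1
    return out
```

For example, three consecutive per-dim reductions on mesh dims 0, 1, and 2 collapse into a single step on the flattened (0, 1, 2) dim when has_flattened_group((0, 1, 2)) returns True; otherwise the original per-dim plan is kept.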