[DTensor] single dim fix inplace op expansion #172477
wconstab wants to merge 5 commits into gh/wconstab/499/base
Conversation
This enables the inplace filtering logic that skips strategies with incompatible input placements. Previously, inplace ops were able to generate expanded strategies incompatible with the current input placements, which aren't allowed to be modified by redistribution. [ghstack-poisoned]
> **Left-to-Right Evaluation Order**
>
> DTensor evaluates Partial placements in **left-to-right order** (i.e., mesh dimension 0 first, …
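For illustration, a minimal sketch of what this left-to-right claim would mean in practice; the mesh shape and placements here are made up for the example, and it assumes 4 ranks:

```python
# Illustrative only: under strict left-to-right evaluation, converting two
# Partials to Replicate would reduce the Partial on mesh dim 0 before the
# one on mesh dim 1.
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Partial, Replicate

mesh = init_device_mesh("cpu", (2, 2))  # assumes 4 ranks
local = torch.randn(4, 4)
dt = DTensor.from_local(local, mesh, (Partial("sum"), Partial("sum")))
# Left-to-right order would reduce mesh dim 0 first, then mesh dim 1:
out = dt.redistribute(mesh, (Replicate(), Replicate()))
```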
I checked the redistribution code; left-to-right order is not completely correct. The Partial -> Replicate conversion can run in right-to-left order (pytorch/torch/distributed/tensor/_redistribute.py, lines 749 to 757 at ee562d9) or in left-to-right order (pytorch/torch/distributed/tensor/_redistribute.py, lines 765 to 773 at ee562d9). In general, it can be either one.
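A hedged sketch of the two traversal orders being pointed at here; this is illustrative pseudologic, not the actual `_redistribute.py` code:

```python
# Illustrative only: the planning loop can walk mesh dims in either
# direction, which fixes the order in which Partial placements are
# reduced to Replicate.
from torch.distributed.tensor import Placement


def plan_partial_to_replicate(placements: tuple[Placement, ...], reverse: bool = False):
    dims = list(range(len(placements)))
    if reverse:
        dims.reverse()  # right-to-left: highest mesh dim first
    return [("reduce", d) for d in dims if placements[d].is_partial()]
```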
As long as we have a Shard in the src placement, we can trigger the right-to-left order:
```python
import re

import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Partial, Replicate, Shard
from torch.testing._internal.distributed._tensor.common_dtensor import (
    DTensorTestBase,
    with_comms,
)
from torch.utils._debug_mode import DebugMode


class DistributeWithPartialTest(DTensorTestBase):
    @property
    def world_size(self) -> int:
        return 8

    def _extract_redistribute_trace_from_debug_mode(self, s: str) -> str:
        match = re.search(r"trace:\s*(.*)\)", s)
        return match.group(1) if match else ""

    @with_comms
    def test_tmp(self):
        mesh = init_device_mesh(self.device_type, (2, 2, 2))
        input_data = torch.randn((8, 8, 8), device=self.device_type)
        dt = DTensor.from_local(
            input_data, mesh, (Partial("sum"), Shard(0), Partial("max"))
        )
        with DebugMode(record_torchfunction=False) as debug_mode:
            dt2 = dt.redistribute(mesh, (Replicate(), Replicate(), Replicate()))
        trace_str = self._extract_redistribute_trace_from_debug_mode(
            debug_mode.debug_string()
        )
        print(trace_str)
```

The redistribution path is: `P(sum)S(0)P(max) -> P(sum)S(0)R -> P(sum)RR -> RRR`. This made me think that there is a bug in the greedy redistribution algorithm when handling Partial variants: we can't arbitrarily switch between left-to-right and right-to-left order. I think right-to-left should be the correct order to handle Partial -> Replicate.
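To see why the order matters for mixed Partial variants, here is a small worked example with made-up values (not from the PR): sum and max do not commute, so reducing the two mesh dims in different orders yields different results.

```python
# Made-up 2x2 mesh of local shard values; rows are mesh dim 0 with
# Partial("sum"), columns are mesh dim 1 with Partial("max").
vals = [[1, 2],
        [3, 0]]

# Reduce mesh dim 0 (sum) first, then mesh dim 1 (max):
sum_then_max = max(vals[0][j] + vals[1][j] for j in range(2))  # max(4, 2) = 4

# Reduce mesh dim 1 (max) first, then mesh dim 0 (sum):
max_then_sum = sum(max(vals[i][j] for j in range(2)) for i in range(2))  # 2 + 3 = 5

assert sum_then_max != max_then_sum  # 4 != 5: the reduction order changes the answer
```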
That sounds good to me. However, don't move too fast on it; we are discussing just banning mixed partial types in the same placement, and then we don't need to define an order.
Oh, this PR was not even supposed to include the changes to partial ordering; it was a rebasing mistake. I will remove that code. I have another PR up now anyway that partially bans mixed partials. If people are OK with that direction, I'll extend it and ensure we don't allow mixed partials at all parts of the DTensor stack; then this ordering question becomes unnecessary.
Yes, "banning mixed Partial" sounds good to me!
```python
if isinstance(placement, Partial):
    src_partial_ops[mesh_dim] = placement.reduce_op
...
if len(src_partial_ops) > 1:
```
Aha, I see. I left the comment in the README file too early. This PR enforces the left-to-right order when there are multiple Partial variants.
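A hedged sketch of what the quoted hunk implies; the surrounding names (e.g. `src_placements`) are assumptions, not the PR's actual diff:

```python
# Illustrative sketch: collect the reduce op for each Partial mesh dim,
# then, when more than one Partial is present, process them in ascending
# mesh-dim order (left-to-right) so mixed variants reduce deterministically.
src_partial_ops: dict[int, str] = {}
for mesh_dim, placement in enumerate(src_placements):
    if isinstance(placement, Partial):
        src_partial_ops[mesh_dim] = placement.reduce_op

if len(src_partial_ops) > 1:
    ordered_partial_dims = sorted(src_partial_ops)  # mesh dim 0 first
```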
```python
# If some partials are being reduced while others are kept,
# we need to reduce ALL partials first, then re-partition
if dst_partial_dims and dst_partial_dims != src_partial_dims:
```
If the condition `dst_partial_dims and dst_partial_dims != src_partial_dims` doesn't hold, then we fall back to the default greedy code to handle Partial, which can run either left-to-right or in reverse. We also need to enforce the order there.
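For concreteness, a sketch of the control flow this comment describes; the helper names are hypothetical, not the actual diff:

```python
# Illustrative only: the special-cased branch reduces all partials first,
# while the fallback greedy path may visit mesh dims in either direction,
# which is the order that also needs to be pinned down.
if dst_partial_dims and dst_partial_dims != src_partial_dims:
    for mesh_dim in sorted(src_partial_dims):  # reduce ALL partials, left-to-right
        reduce_partial(mesh_dim)               # hypothetical helper
    repartition(dst_partial_dims)              # hypothetical helper
else:
    greedy_redistribute()  # order here is currently unspecified
```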
@pytorchbot merge

Merge started: Your change will be merged once all checks pass (ETA 0-4 hours).

Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
@pytorchbot merge -i

Merge started: Your change will be merged while ignoring the following 8 checks: trunk / linux-jammy-rocm-py3.10 / test (default, 3, 6, linux.rocm.gpu.gfx942.1), trunk / linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.gfx942.1), trunk / linux-jammy-rocm-py3.10 / test (default, 5, 6, linux.rocm.gpu.gfx942.1), trunk / linux-jammy-rocm-py3.10 / test (default, 6, 6, linux.rocm.gpu.gfx942.1), trunk / linux-jammy-rocm-py3.10 / test (default, 1, 6, linux.rocm.gpu.gfx942.1), trunk / linux-jammy-rocm-py3.10 / test (distributed, 1, 3, linux.rocm.gpu.gfx942.4), trunk / linux-jammy-rocm-py3.10 / test (distributed, 3, 3, linux.rocm.gpu.gfx942.4), trunk / linux-jammy-rocm-py3.10 / test (distributed, 2, 3, linux.rocm.gpu.gfx942.4).

Merge failed. Reason: 1 job has failed; the first few are: trunk / linux-jammy-rocm-py3.10 / test (default, 4, 6, linux.rocm.gpu.gfx942.1).
@pytorchbot merge -i
Stack from ghstack (oldest at bottom):
This enables the inplace filtering logic that skips strategies with incompatible input placements.

Previously, inplace ops were able to generate expanded strategies incompatible with the current input placements, which aren't allowed to be modified by redistribution.
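As a rough illustration of the behavior described above (a minimal sketch with hypothetical names, not the PR's actual code): for an inplace op, the mutated input cannot be redistributed, so any candidate strategy whose required input placement differs from the tensor's current placement must be skipped rather than satisfied via redistribution.

```python
# Hypothetical sketch, not the PR diff: filter expanded op strategies for an
# inplace op so that only those matching the current (unmodifiable) input
# placements survive.
def filter_inplace_strategies(candidate_strategies, current_input_placements):
    kept = []
    for strategy in candidate_strategies:
        required = strategy.input_specs[0].placements  # placement the strategy needs
        if required == current_input_placements:
            kept.append(strategy)  # compatible: no redistribution of the inplace arg
        # else: skip; redistributing would mutate the inplace input's sharding
    return kept
```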