[DTensor] Fix default_strategy and rename for clarity#158490

Closed

wconstab wants to merge 7 commits into gh/wconstab/429/base from gh/wconstab/429/head

Conversation

@wconstab
Contributor

@wconstab wconstab commented Jul 16, 2025

Stack from ghstack (oldest at bottom):

Fixes several bugs in the original.

  • foremost, fixes a serious bug where we returned incorrect strategies
    by mixing input_specs that were frozen from
    select_strategy.strategies[0] with output_specs that varied across
    select_strategy.strategies[0..N] (e.g. we could create a nonsense
    strategy like input: Shard(0), output: Replicate() for an op like clone)
  • fixes the redistribute costs: they should not actually be 0, they
    should be the cost of redistributing our single input from another
    strategy to the current strategy, in our list of output strategies
  • adds a note, wondering if we should have just literally returned the
    input strategy instead of creating this new object
  • Currently, using default_strategy is incorrect because it maps 'self'
    tensor's strategies directly onto 'src' tensor without accounting for
    the fact that copy_ supports broadcasting a smaller rank tensor into a
    larger one.

Separates the copy_ op out from the default strategy and adds a missing
test case, but does not fix the underlying issue with copy_; that is left
for a future PR.

Renames to propagate_single_input_strategy since that's more
descriptive

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @d4l3k @pragupta
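To make the first two bullets concrete, here is a minimal, self-contained sketch of the corrected propagation logic. The classes and the cost model below are toy stand-ins invented for illustration; the names mirror, but are not, the real torch.distributed.tensor internals:

```python
from dataclasses import dataclass
from typing import List

# Toy stand-ins for DTensor's DTensorSpec/OpSpec, illustration only.
@dataclass
class Spec:
    placement: str  # e.g. "Shard(0)" or "Replicate()"

@dataclass
class OpSpec:
    output_spec: Spec
    input_specs: List[Spec]
    redistribute_cost: List[List[float]]

def cost(src: Spec, dst: Spec) -> float:
    # Hypothetical cost model: free if placements match, 1.0 otherwise.
    return 0.0 if src.placement == dst.placement else 1.0

def propagate_single_input_strategy(input_strategies: List[Spec]) -> List[OpSpec]:
    # For each candidate strategy of the single tensor input, the output
    # spec must match *that* strategy. The old bug froze input_specs to
    # strategies[0] while varying output_specs over strategies[0..N],
    # producing mismatched pairs like input Shard(0) / output Replicate().
    return [
        OpSpec(
            output_spec=s,
            input_specs=[s],  # input and output placements vary together
            # non-zero costs: redistributing the input from each
            # alternative strategy to the one this OpSpec assumes
            redistribute_cost=[[cost(other, s) for other in input_strategies]],
        )
        for s in input_strategies
    ]

specs = [Spec("Shard(0)"), Spec("Replicate()")]
for op_spec in propagate_single_input_strategy(specs):
    # no nonsense pairs: input always agrees with output
    assert op_spec.input_specs[0].placement == op_spec.output_spec.placement
```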

@pytorch-bot

pytorch-bot bot commented Jul 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158490

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Unrelated Failures

As of commit 4f6998f with merge base 1e86fa2:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wconstab added a commit that referenced this pull request Jul 16, 2025
ghstack-source-id: 613b2d7
Pull Request resolved: #158490
@pytorch-bot pytorch-bot bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Jul 16, 2025
@wconstab wconstab added the release notes: distributed (dtensor) release notes category label Jul 16, 2025
```diff
     aten.zero_.default,
     ]
-)(default_strategy)
+)(propagate_single_input_strategy)
```
Member

Doesn't look like all ops here only have one input arg, e.g., aten.fill_.Scalar.

Contributor Author

i was wondering about that. but fill_.scalar should still only have one 'tensor' input, so i think my statement still holds and I just need to find a better way to check the schema?

Contributor Author

trying this instead
assert len([s for s in op_schema.args_schema if isinstance(s, OpStrategy)]) == 1
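For illustration, the proposed check can be exercised against a toy args_schema. The OpStrategy class below is a stand-in invented for this sketch, not the real DTensor type:

```python
# Toy stand-in for DTensor's OpStrategy, only to exercise the check above.
class OpStrategy:
    pass

# e.g. fill_.Scalar: one tensor input plus a non-tensor scalar argument,
# so counting OpStrategy args (rather than all args) gives exactly 1.
args_schema = (OpStrategy(), 3.0)
assert len([s for s in args_schema if isinstance(s, OpStrategy)]) == 1

# copy_ has two tensor inputs (self, src), so the same check would fail:
copy_args = (OpStrategy(), OpStrategy(), False)
assert len([s for s in copy_args if isinstance(s, OpStrategy)]) == 2
```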

Member

Yes, one Tensor input should hold. LGTM now!

Member

Oh, wait, func: copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!) still has two.

Contributor Author

you're right. hmm. i don't really like using this strategy func for an op that has 2 tensor inputs.

the way i interpret that is, we actually have 2 tensor inputs, each with different strategies. Let's pick the first one arbitrarily, ignore the second one, and then create an op strategy where we use the first input's strategy in place of the second input.

Will this work for to_copy? I am suspicious that it only works in some cases.

Counter example:
If we feed an input tensor of shape (2,) into copy_, we can 'broadcast' it implicitly.

>>> import torch
>>> x = torch.ones([2,2])
>>> y = torch.ones([2,]) * 3
>>> x.copy_(y)
tensor([[3., 3.],
        [3., 3.]])

If 'self' tensor (our first tensor input) has a sharding on dim 1, we can NOT use that sharding to describe 'src' tensor, which only has dim 0. So it is not correct to use this strategy for copy_ in general.
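A small pure-Python sketch of the distinction between reusing self's placement verbatim and mapping it across the broadcast; the helper names are hypothetical, not from the PR:

```python
def naive_src_shard_dim(self_shard_dim: int, src_ndim: int) -> int:
    # old default_strategy behavior: reuse self's sharded dim verbatim
    # for src, ignoring src's rank entirely
    return self_shard_dim

def mapped_src_shard_dim(self_shard_dim: int, self_ndim: int, src_ndim: int):
    # right-aligned broadcast mapping: self dim d lines up with
    # src dim d - (self_ndim - src_ndim); leading self dims have no
    # src counterpart and must fall back to Replicate (None here)
    d = self_shard_dim - (self_ndim - src_ndim)
    return d if d >= 0 else None

# self: shape (2, 2) sharded on dim 1; src: shape (2,)
assert naive_src_shard_dim(1, src_ndim=1) == 1   # invalid: src has no dim 1
assert mapped_src_shard_dim(1, self_ndim=2, src_ndim=1) == 0  # valid src dim
```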

Contributor Author

it also appears that we do not have a unit test for copy_ in the first place, but i added one that confirms that the assert len strategies == 1 will fail and we need a fix.

wconstab added 3 commits July 16, 2025 15:26
Collaborator

@wanchaol left a comment

Could you explain more about the bug? I thought it could be easily fixed by dynamically generating redistribute_costs. This changes the intention of default_strategy and I wonder if it's necessary

mesh=select_strategy.mesh,
placements=select_strategy.strategies[0].output_spec.placements,
tensor_meta=select_strategy.strategies[0].output_spec.tensor_meta,
def propagate_single_input_strategy(op_schema: OpSchema) -> StrategyType:
Collaborator

The default_strategy is not a single-input strategy; if you look at the old code, it actually loops through all the args_schema and appends a new DTensorSpec for each argument, so it is meant to be able to handle multiple input arguments.

Contributor Author

yes, i realized i was changing the semantics of the function. However, it was not clear that the original design was fully thought through. 5/6 ops using it were single-input, and the one multi-input op was using it incorrectly by accident.

Can you convince me that there are ops for which the multi-input default strategy would be both wanted and correct? If not, let's just make it simpler and avoid the possibility of using it incorrectly by accident.

Collaborator

Oh actually i think it's probably fine, it was added in this PR ae86e8f

Before that default strategy does not handle multiple inputs anyways.

input_specs=[
DTensorSpec(
mesh=first_input_strategy.mesh,
placements=strategy.output_spec.placements,
Collaborator

The intention of default strategy is to let every argument follow the first input argument's strategy, but here it looks like you assume the op has a single input. That probably makes sense given your renaming, but it is not the same intended behavior as before.

Contributor Author

I found that out of all the ops we used 'default_strategy' for, only copy_ was "benefiting" from applying the first input's strategy to the second input.

however, for copy_, this supposed benefit was actually a hidden correctness bug. It works when src and self have the same rank, but it fails when src has a smaller rank than self and requires broadcasting. I added a new test case that will fail with the current default_strategy applied to copy_.

Based on this, and since we do not have an example of an op with multiple inputs that can safely share the first input's strategy, I decided it's better to just simplify the helper to a narrower purpose. Wdyt?

tensor_meta=strategy.output_spec.tensor_meta,
),
input_specs=input_specs,
redistribute_cost=redistribute_cost,
Collaborator

shouldn't the real fix here simply be to generate the redistribute_cost by calling generate_redistribute_costs?

Contributor Author

this fixes only the second bug in my PR desc. I'll explain the first bug more in a comment below.

@wconstab
Contributor Author

Could you explain more about the bug? I thought it could be easily fixed by dynamically generating redistribute_costs. This changes the intention of default_strategy and I wonder if it's necessary

Yes, the first bug in my PR desc is related to the input specs not the redistribute costs.

(e.g. we could create a nonsense strategy like input:Shard(0) output(Replicate) for an op like clone

i found empirically that we were creating a strategy for Clone() where input strategy was Shard(0) and output strategy was Replicate(). I realized the reason was that we were for-looping over input strategies but hardcoding output strategy to be the one from the first input strategy.

wconstab added 2 commits July 17, 2025 12:38
Contributor

@XilunWu left a comment

The idea that we need a single_operand_strategy looks good to me. Would like to discuss more:

The pattern of "all DTensor inputs should follow a given input's sharding" seems common. I believe we still need it (of course it needs a fix) under a new name. But I'm okay with removing it in this PR and adding a correct version back in another, since the only use case right now is copy_ and copy_ needs to handle broadcast.

BTW, do some foreach_* ops need single-input propagate and follow-some-input propagate as well?

Comment on lines +45 to +46
# Note: this may be a complete waste of work, because it should be equivalent to
# `return first_input_strategy` (unless creating a deep copy is important for some reason)
Contributor

I don't think we're modifying OpSpec anywhere so it should be safe to directly return.

Comment on lines +105 to +133
```
num_tensor_args = 2
first_input_strategy = op_schema.args_schema[0]
assert isinstance(first_input_strategy, OpStrategy)
return OpStrategy(
    [
        OpSpec(
            output_specs=DTensorSpec(
                mesh=first_input_strategy.mesh,
                placements=strategy.output_spec.placements,
                tensor_meta=strategy.output_spec.tensor_meta,
            ),
            input_specs=[
                DTensorSpec(
                    mesh=first_input_strategy.mesh,
                    placements=strategy.output_spec.placements,
                    tensor_meta=strategy.output_spec.tensor_meta,
                )
                for _ in range(num_tensor_args)
            ],
            redistribute_cost=[
                generate_redistribute_costs(
                    first_input_strategy, strategy.output_spec
                )
                for _ in range(num_tensor_args)
            ],
        )
        for strategy in first_input_strategy.strategies
    ]
)
```
Contributor

IIUC this doesn't address the broadcast issue either. Do we address it in another PR?

Contributor Author

see the next PR in the stack

@wconstab
Contributor Author

BTW, do some foreach_* ops need single-input propagate and follow-some-input propagate as well?

foreach ops are different in that they are many-to-many, not many-to-one; the previous default_strategy was many-to-one (one output). It would be bad to use 'default_strategy' for foreach, because each output needs to follow its own input, not the first input.
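A toy illustration of the many-to-many point, with placements as plain strings (purely hypothetical, not DTensor code):

```python
# foreach-style ops are many-to-many: output i must follow input i's
# own strategy, not the first input's.
inputs = ["Shard(0)", "Replicate()", "Shard(1)"]

per_input = [placement for placement in inputs]  # correct: follow own input
follow_first = [inputs[0]] * len(inputs)         # default_strategy-style: wrong

assert per_input == inputs
assert follow_first != per_input  # outputs 1 and 2 would get the wrong placement
```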

@wconstab
Contributor Author

The pattern of "all DTensor input should follow a given input's sharding" seems common. I believe we still need it (of course it needs fix) with a new name. But I'm okay with removing it in this PR and add a correct version back in another since the only use case right now is copy_ and copy_ needs to handle broadcast.

can you give an example of an op where this pattern is true?

Contributor

@XilunWu left a comment

The strategy part LGTM. Didn't review the copy_ part because it's overridden in the next PR.


# @pytest.mark.xfail
# @with_comms
# def test_copy_broadcast(self):
Collaborator

why is this commented out? I thought you were marking it xfail

Contributor Author

xfail did not work, haha, so i commented it out. it's uncommented and passing in the next PR.


wconstab added a commit to meta-pytorch/autoparallel that referenced this pull request Jul 17, 2025
Renamed after changing the semantic upstream
pytorch/pytorch#158490
@wconstab
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 18, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pytorchmergebot pushed a commit that referenced this pull request Jul 18, 2025
The previous strategy directly used 'self' input strategy for 'src'
input.  The fixed strategy correctly maps the self dim to src dim
so that it works even if the src input is broadcast.

E.g. for this program, broadcasting will occur on dims 0,1,3 of self.

```
self = torch.ones((2,3,4,5))
src = torch.ones((4,1))
self.copy_(src)
```

These are the correct sharding combinations:

| self     | src         |
|----------|-------------|
| Shard(0) | Replicate() |
| Shard(1) | Replicate() |
| Shard(2) | Shard(0)    |
| Shard(3) | Shard(1)    |
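One way to reproduce the table above is a right-aligned dim mapping. This is a hedged sketch of that idea, not the exact code from #158538:

```python
def src_placement(self_shard_dim: int, self_ndim: int, src_ndim: int) -> str:
    # Right-align src's dims against self's trailing dims: self dim d
    # corresponds to src dim d - (self_ndim - src_ndim). Leading self
    # dims have no src counterpart, so src must be Replicate there.
    d = self_shard_dim - (self_ndim - src_ndim)
    return f"Shard({d})" if d >= 0 else "Replicate()"

# self shape (2, 3, 4, 5) -> ndim 4; src shape (4, 1) -> ndim 2
table = {d: src_placement(d, 4, 2) for d in range(4)}
assert table == {0: "Replicate()", 1: "Replicate()",
                 2: "Shard(0)", 3: "Shard(1)"}
```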

Pull Request resolved: #158538
Approved by: https://github.com/zpcore, https://github.com/XilunWu, https://github.com/wanchaol
ghstack dependencies: #158495, #158490
@clee2000
Contributor

@pytorchbot revert -m "broke lint? GH job link HUD commit link" -c landrace

cc @seemethere tentatively marking this as landrace but the misspelling does exist in the PR as well

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@wconstab your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Jul 18, 2025
@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #158538

wconstab added a commit to meta-pytorch/autoparallel that referenced this pull request Jul 18, 2025
Renamed after changing the semantic upstream
pytorch/pytorch#158490
@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #158538

pytorchmergebot pushed a commit that referenced this pull request Jul 18, 2025
fmassa pushed a commit to meta-pytorch/autoparallel that referenced this pull request Jul 19, 2025
@github-actions github-actions bot deleted the gh/wconstab/429/head branch August 18, 2025 02:21

Labels

ci-no-td Do not run TD on this PR ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (dtensor) release notes category Reverted

6 participants