reshape or view ops validate evenly and unevenly sharded dtensor #161161
dayanandav wants to merge 9 commits into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161161
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures as of commit 482a007 with merge base 31d5c67.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing" "module: dtensor"
For view/reshape ops, validate evenly or unevenly sharded DTensors before getting to runtime dispatch, and throw a more specific error, as implemented in pytorch#149764.
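A minimal sketch of the kind of early, shape-only check the description refers to; the function name and signature are illustrative assumptions, not this PR's actual API:

```python
# Decide, from the global shape and mesh size alone, whether a Shard(dim)
# placement is even, i.e. every rank holds the same local size on that dim.
def is_evenly_sharded(global_shape, shard_dim, num_shards):
    return global_shape[shard_dim] % num_shards == 0

assert is_evenly_sharded((6, 252), 0, 6)        # 1 row per rank: even
assert not is_evenly_sharded((7, 252), 0, 6)    # last rank differs: uneven
```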
can_shard_dim was disabled because it caused an exception for the replicate placement
resolve conflicts
conflict resolved
```python
def test_illegal_views(self):
    device_mesh = self.build_device_mesh()
    # 1D mesh [6] (see above)
    tensor = torch.randn((6, 252))
```
Let's use a 2x4 tensor as an example, with values from range(8).
Initially Shard(1):
Rank0
((0,1),(4,5))
Rank1
((2,3),(6,7))
Flatten the tensor, then Shard(0):
Rank0
(0,1,2,3)
Rank1
(4,5,6,7)
Since the local values per rank changed, the view is not valid. Do you agree?
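A single-process sketch of this example (plain torch, no distributed setup) that recomputes each rank's local shard before and after the flatten, showing that they differ:

```python
import torch

world_size = 2
t = torch.arange(8).reshape(2, 4)  # [[0, 1, 2, 3], [4, 5, 6, 7]]

# Shard(1): split dim 1 across ranks.
before = torch.chunk(t, world_size, dim=1)   # rank0: [[0,1],[4,5]], rank1: [[2,3],[6,7]]

# Flatten the *global* tensor, then Shard(0).
after = torch.chunk(t.reshape(-1), world_size, dim=0)  # rank0: [0,1,2,3], rank1: [4,5,6,7]

for rank in range(world_size):
    # rank0 held {0,1,4,5} before but {0,1,2,3} after: the local values
    # change, so this cannot be a pure, communication-free view.
    print(f"rank{rank}: before={before[rank].tolist()} after={after[rank].tolist()}")
```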
Yes, I agree that evenly divisible values can be sharded across dim 0/1 after the view.
I don't think I explained it well. My example was supposed to show that this PR is incorrect and that this is not a valid view.
It would only be valid if the local tensors did not change. Since they do change, it requires a redistribution, which we don't allow for a view op.
@wconstab
I am facing an "Attempted to flatten sharded dimension 1" error with the view operation below in my backend. I believe the redistribution below is valid, but it is not handled properly by the current design (#161395 patch), so I raised this PR and reported #161147 with simplified steps to reproduce the problem.
```python
new_mesh = init_device_mesh("cuda", [4], mesh_dim_names=["tp"])
x = torch.randn((8, 4, 8, 4), dtype=torch.float32, device="cuda")
d_x = DTensor.from_local(x, device_mesh=new_mesh, placements=[Replicate()])
d_x = d_x.redistribute(device_mesh=new_mesh, placements=[Shard(dim=1)])
d_x = d_x.reshape(1024)  # 8 x 4 x 8 x 4 = 1024
```
@wconstab do you suggest any alternative solution to deal with the above problem for the view operation?
@weifengpy @XilunWu @ezyang have been looking into better view support in general. I'm not sure if they have an update that would help your case. I'd also like one of them to double check my conclusion, but if it's correct, we can't land this PR as is.
I can take a closer look at this PR
@dayanandav are you proposing to land the PR, or just showing it as a repro? I tested the PR; it redistributes local tensors to achieve view(-1): Shard(1) -> view(-1) -> Replicate().
To align with torch.Tensor.view, we don't allow redistribution/communication in DTensor.view.
If you really need redistribution, could you call redistribute explicitly to convert from Shard(1) to Replicate(), instead of letting view do it? (A sketch of this workaround follows.)
Do you have a follow-up op that "reverts" the view and makes it legal? Take batch dim flattening/unflattening as an example: (2, 3, 4) -> view -> (2 x 3, 4) -> view -> (2, 3, 4) is the case I am trying to support.
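A hedged sketch of that explicit-redistribute workaround, mirroring the repro above (assumes 4 CUDA ranks launched with torchrun; the communication then happens in redistribute(), keeping the reshape itself communication-free):

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Replicate, Shard

new_mesh = init_device_mesh("cuda", [4], mesh_dim_names=["tp"])
x = torch.randn((8, 4, 8, 4), dtype=torch.float32, device="cuda")
d_x = DTensor.from_local(x, device_mesh=new_mesh, placements=[Replicate()])
d_x = d_x.redistribute(device_mesh=new_mesh, placements=[Shard(dim=1)])

# Explicitly redistribute back to Replicate() before flattening, instead of
# letting reshape trigger communication (which view/reshape ops disallow).
d_x = d_x.redistribute(device_mesh=new_mesh, placements=[Replicate()])
d_x = d_x.reshape(1024)  # now a local, communication-free reshape on each rank
```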
Conflict changes removed
ezyang
left a comment
making sure this is blocked
Lint issue fixes
@weifengpy @ezyang any progress on the review?
…from nn.Linear(DTensor)"
nn.Linear(DTensor) got decomposed into a view on DTensor, with error:
```
[rank1]: RuntimeError: ('Attempted to flatten multiple dimensions, with dimension 1 being sharded. ', 'It cannot be performed without redistribution, which is disallowed by the current operator.')
```
still learning from a few PRs
* #149764
* #161950
* #161161
cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci
[ghstack-poisoned]
…attening"
for `F.linear(inputs, weight)`, we have batch flattening and unflattening
* batch flattening: (bsz, seq_len, dim) -> aten.view.default -> (bsz x seq_len, dim)
* batch unflattening: (bsz x seq_len, dim) -> aten._unsafe_view.default -> (bsz, seq_len, arbitrary_out_dim)
when `inputs` is DTensor `(Shard(1), )`, `view(batch_size * seq_len, input_dim)` errors
```
[rank1]: File "/data/users/weif/pytorch/torch/distributed/tensor/_ops/_view_ops.py", line 618, in propagate_shape_and_sharding
[rank1]: in_dim = get_in_dim_to_shard(cmd)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/data/users/weif/pytorch/torch/distributed/tensor/_ops/_view_ops.py", line 548, in get_in_dim_to_shard
[rank1]: raise RuntimeError(
[rank1]: RuntimeError: ('Attempted to flatten multiple dimensions, with dimension 1 being sharded. ', 'It cannot be performed without redistribution, which is disallowed by the current operator.')
```
this PR add hierachical placements to support batch flattening and unflattening
reference DTensor view PRs
* #149764
* #161950
* #161161
cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci
[ghstack-poisoned]
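A minimal repro sketch of the batch-flattening failure described above (assumes 2 ranks launched with torchrun, CPU/gloo for simplicity):

```python
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard

dist.init_process_group("gloo")
mesh = init_device_mesh("cpu", [2])

bsz, seq_len, dim = 4, 6, 8
inputs = distribute_tensor(torch.randn(bsz, seq_len, dim), mesh, [Shard(1)])

# Flattening (bsz, seq_len) while seq_len is sharded is expected to raise:
# "Attempted to flatten multiple dimensions, with dimension 1 being sharded."
inputs.view(bsz * seq_len, dim)
```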
Fix lint issue
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
For view/reshape ops, validate evenly or unevenly sharded DTensors before getting to runtime dispatch, and throw a more specific error, as implemented in #149764.
Fixes #161147
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @tianyu-l @XilunWu @SherlockNoMad