reshape or view ops validate evenly and unevenly sharded dtensor #161161

Closed
dayanandav wants to merge 9 commits into pytorch:main from dayanandav:pr_161147

Conversation

@dayanandav
Contributor

@dayanandav dayanandav commented Aug 21, 2025

For view/reshape ops, validate an evenly or unevenly sharded DTensor before getting to runtime dispatch, and throw a more specific error, as implemented in #149764 (a repro sketch follows below).

Fixes #161147

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @tianyu-l @XilunWu @SherlockNoMad
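
As a hedged illustration, here is a minimal repro sketch of the error path this PR targets, adapted from the example later in this thread. The 2-rank CPU/gloo mesh, the shapes, and the file name are assumptions; run with e.g. `torchrun --nproc_per_node=2 repro.py`.

```python
# Hypothetical repro sketch (not from the PR): flattening a DTensor whose
# inner dim is sharded currently fails at dispatch time; this PR aims to
# raise the specific validation error earlier.
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor

dist.init_process_group("gloo")
mesh = init_device_mesh("cpu", (2,))

t = torch.arange(24, dtype=torch.float32).reshape(4, 6)
dt = distribute_tensor(t, mesh, [Shard(1)])  # shard the inner dim

try:
    dt.view(-1)  # flattening a sharded inner dim would need redistribution
except RuntimeError as e:
    print(f"[rank {dist.get_rank()}] {e}")

dist.destroy_process_group()
```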

@pytorch-bot

pytorch-bot Bot commented Aug 21, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161161

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 482a007 with merge base 31d5c67:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Aug 21, 2025
@dayanandav
Contributor Author

@pytorchbot label "topic: not user facing" "module: dtensor"

@pytorch-bot pytorch-bot Bot added module: dtensor distributed tensor tag topic: not user facing topic category labels Aug 21, 2025
Comment thread: torch/distributed/tensor/_ops/_view_ops.py (Outdated)
For view/reshape ops, validate an evenly or unevenly sharded DTensor before getting to runtime dispatch, and throw a more specific error, as implemented in pytorch#149764
can_shard_dim was disabled because it caused an exception for replicate placement
resolve conflicts
conflict resolved
def test_illegal_views(self):
    device_mesh = self.build_device_mesh()
    # 1D mesh [6] (see above)
    tensor = torch.randn((6, 252))
Contributor

Let's use a tensor of size (2, 4) as an example, with values from range(8).

Initially shard(1)

Rank0
((0,1),(4,5))
Rank1
((2,3),(6,7))

Flattened tensor then shard(0)
Rank0
(0,1,2,3)
Rank1
(4,5,6,7)
Since the local values per rank changed, the view is not valid. Do you agree?
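
A single-process sketch of the comparison above; `torch.chunk` here only emulates what each rank's local shard would be, it is not DTensor's actual sharding code:

```python
import torch

t = torch.arange(8).reshape(2, 4)

# Shard(1) over 2 ranks: split columns.
shard1_locals = t.chunk(2, dim=1)
print(shard1_locals[0])  # rank0: [[0, 1], [4, 5]]
print(shard1_locals[1])  # rank1: [[2, 3], [6, 7]]

# Flatten globally, then Shard(0) over 2 ranks: split the flat vector.
shard0_locals = t.reshape(-1).chunk(2, dim=0)
print(shard0_locals[0])  # rank0: [0, 1, 2, 3]
print(shard0_locals[1])  # rank1: [4, 5, 6, 7]

# rank0's data changes from {0, 1, 4, 5} to {0, 1, 2, 3}: the "view"
# would have to move elements between ranks, i.e. redistribute.
```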

Contributor Author

Yes, I agree; with even values the tensor can be sharded across dim 0/1 after the view.

Contributor

I don't think I explained it well: my example was supposed to show that this PR is incorrect and that this is not a valid view.

It would only be valid if the local tensors did not change. Since they do change it requires a redistribution, which we don't allow for a view op.

Contributor Author

@dayanandav dayanandav Sep 8, 2025

@wconstab
I am hitting an "Attempted to flatten sharded dimension 1" error with one of the view operations in my backend (below). I believe this redistribution is valid, but it is not handled properly by the current design (#161395 patch), so I raised this PR and reported #161147 with simplified steps to reproduce the problem.

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Replicate, Shard

new_mesh = init_device_mesh("cuda", [4], mesh_dim_names=["tp"])
x = torch.randn((8, 4, 8, 4), dtype=torch.float32, device="cuda")

d_x = DTensor.from_local(x, device_mesh=new_mesh, placements=[Replicate()])
d_x = d_x.redistribute(device_mesh=new_mesh, placements=[Shard(dim=1)])
d_x = d_x.reshape(1024)  # 8 * 4 * 8 * 4 = 1024
```

Contributor Author

@wconstab do you suggest any alternative solution to deal with above problem under view operation?

Contributor

@weifengpy @XilunWu @ezyang have been looking into better view support in general. I'm not sure if they have an update that would help your case. I'd also like one of them to double check my conclusion, but if it's correct, we can't land this PR as is.

Contributor

I can take a closer look at this PR

Contributor

@dayanandav are you proposing to land the PR, or just showing it as a repro? I tested the PR; it redistributes local_tensors to achieve view(-1): Shard(1) -> view(-1) -> Replicate()

To align with torch.Tensor.view, we don't allow redistribution/communication in DTensor.view.

If you really need redistribution, could you call redistribute explicitly to convert from Shard(1) to Replicate(), instead of letting view do it? (A sketch follows below.)

Do you have a follow-up op that "reverts" the view and makes it legal? Take batch dim flattening/unflattening as an example: (2, 3, 4) -> view -> (2 x 3, 4) -> view -> (2, 3, 4) is the case I am trying to support.
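
For reference, a sketch of the explicit-redistribute workaround suggested above, adapted from the earlier repro in this thread; the 4-rank CUDA mesh is an assumption, and the Replicate conversion pays the communication cost explicitly rather than inside `view`:

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Replicate, Shard

mesh = init_device_mesh("cuda", (4,), mesh_dim_names=("tp",))
x = torch.randn(8, 4, 8, 4, dtype=torch.float32, device="cuda")

d_x = DTensor.from_local(x, device_mesh=mesh, placements=[Replicate()])
d_x = d_x.redistribute(device_mesh=mesh, placements=[Shard(dim=1)])

# Convert back to Replicate explicitly (an all-gather), then the view
# is purely local and legal.
d_x = d_x.redistribute(device_mesh=mesh, placements=[Replicate()])
d_x = d_x.reshape(1024)  # 8 * 4 * 8 * 4 = 1024
```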

Conflict changes removed
Contributor

@ezyang ezyang left a comment

making sure this is blocked

Lint issue fixes
@dayanandav dayanandav requested a review from ezyang September 21, 2025 05:54
@dayanandav
Contributor Author

@weifengpy @ezyang any progress on the review?

weifengpy added a commit that referenced this pull request Oct 3, 2025
…Tensor)"

nn.Linear(DTensor) got decomposed into a view on DTensor, with error: `RuntimeError: ('Attempted to flatten multiple dimensions, with dimension 1 being sharded. ', 'It cannot be performed without redistribution, which is disallowed by the current operator.')`

still learning from a few PRs
* #149764 
* #161950 
* #161161 



cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci

[ghstack-poisoned]
weifengpy added a commit that referenced this pull request Oct 8, 2025
…attening"

for `F.linear(inputs, weight)`, we have batch flattening and unflattening
* batch flattening: (bsz, seq_len, dim) -> aten.view.default -> (bsz x seq_len, dim)
* batch unflattening: (bsz x seq_len, dim) -> aten._unsafe_view.default -> (bsz, seq_len, arbitrary_out_dim)

when `inputs` is DTensor `(Shard(1), )`, `view(batch_size * seq_len, input_dim)` errors

```
[rank1]:   File "/data/users/weif/pytorch/torch/distributed/tensor/_ops/_view_ops.py", line 618, in propagate_shape_and_sharding
[rank1]:     in_dim = get_in_dim_to_shard(cmd)
[rank1]:              ^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data/users/weif/pytorch/torch/distributed/tensor/_ops/_view_ops.py", line 548, in get_in_dim_to_shard
[rank1]:     raise RuntimeError(
[rank1]: RuntimeError: ('Attempted to flatten multiple dimensions, with dimension 1 being sharded. ', 'It cannot be performed without redistribution, which is disallowed by the current operator.')
```

this PR adds hierarchical placements to support batch flattening and unflattening

reference DTensor view PRs
* #149764 
* #161950 
* #161161 



cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci

[ghstack-poisoned]
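
A plain-tensor sketch of the batch flattening/unflattening pattern the commit message above describes for `F.linear`; the shapes and names (`bsz`, `seq_len`, `dim`, `out_dim`) are illustrative:

```python
import torch
import torch.nn.functional as F

bsz, seq_len, dim, out_dim = 2, 3, 4, 5
inputs = torch.randn(bsz, seq_len, dim)
weight = torch.randn(out_dim, dim)

# batch flattening: (bsz, seq_len, dim) -> (bsz * seq_len, dim)
flat = inputs.view(bsz * seq_len, dim)
out = F.linear(flat, weight)
# batch unflattening: (bsz * seq_len, out_dim) -> (bsz, seq_len, out_dim)
out = out.view(bsz, seq_len, out_dim)
print(out.shape)  # torch.Size([2, 3, 5])

# When `inputs` is a DTensor with (Shard(1),), the flattening view merges
# the sharded seq_len dim into the batch dim, which triggers the
# RuntimeError quoted above.
```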
Fix lint issue
@github-actions
Contributor

github-actions Bot commented Dec 7, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.


Labels

module: dtensor (distributed tensor tag), oncall: distributed (add this issue/PR to distributed oncall triage queue), open source, Stale, topic: not user facing (topic category), triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RuntimeError in DTensor evenly shard view op

7 participants