reshape or view ops validate evenly and unevenly sharded dtensor #161161
dayanandav wants to merge 9 commits into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161161
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures as of commit 482a007 with merge base 31d5c67.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing" "module: dtensor"
For view/reshape ops, validate evenly or unevenly sharded DTensors before getting to runtime dispatch, and throw a more specific error, as implemented in pytorch#149764.
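A minimal sketch of the kind of early, shape-only check the description refers to; the function name and signature are illustrative assumptions, not this PR's actual API:

```python
# Decide, from the global shape and mesh size alone, whether a Shard(dim)
# placement is even, i.e. every rank holds the same local size on that dim.
def is_evenly_sharded(global_shape, shard_dim, num_shards):
    return global_shape[shard_dim] % num_shards == 0

assert is_evenly_sharded((6, 252), 0, 6)        # 1 row per rank: even
assert not is_evenly_sharded((7, 252), 0, 6)    # last rank differs: uneven
```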
can_shard_dim was disabled because it caused an exception for the replicate placement
resolve conflicts
conflict resolved
```python
def test_illegal_views(self):
    device_mesh = self.build_device_mesh()
    # 1D mesh [6] (see above)
    tensor = torch.randn((6, 252))
```
Let's use a 2x4 tensor as an example, with values from range(8).
Initially Shard(1):
Rank0
((0,1),(4,5))
Rank1
((2,3),(6,7))
Flatten the tensor, then Shard(0):
Rank0
(0,1,2,3)
Rank1
(4,5,6,7)
Since the local values per rank changed, the view is not valid. Do you agree?
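A single-process sketch of this example (plain torch, no distributed setup) that recomputes each rank's local shard before and after the flatten, showing that they differ:

```python
import torch

world_size = 2
t = torch.arange(8).reshape(2, 4)  # [[0, 1, 2, 3], [4, 5, 6, 7]]

# Shard(1): split dim 1 across ranks.
before = torch.chunk(t, world_size, dim=1)   # rank0: [[0,1],[4,5]], rank1: [[2,3],[6,7]]

# Flatten the *global* tensor, then Shard(0).
after = torch.chunk(t.reshape(-1), world_size, dim=0)  # rank0: [0,1,2,3], rank1: [4,5,6,7]

for rank in range(world_size):
    # rank0 held {0,1,4,5} before but {0,1,2,3} after: the local values
    # change, so this cannot be a pure, communication-free view.
    print(f"rank{rank}: before={before[rank].tolist()} after={after[rank].tolist()}")
```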
Yes, I agree that evenly divisible values can be sharded across dim 0/1 after the view.
I don't think I explained it well. My example was supposed to show that this PR is incorrect and that this is not a valid view.
It would only be valid if the local tensors did not change. Since they do change, it requires a redistribution, which we don't allow for a view op.
@wconstab
I am facing an "Attempted to flatten sharded dimension 1" error with the view operation below in my backend. I believe the redistribution below is valid, but it is not handled properly by the current design (#161395 patch), so I raised this PR and reported #161147 with simplified steps to reproduce the problem.
```python
new_mesh = init_device_mesh("cuda", [4], mesh_dim_names=["tp"])
x = torch.randn((8, 4, 8, 4), dtype=torch.float32, device="cuda")
d_x = DTensor.from_local(x, device_mesh=new_mesh, placements=[Replicate()])
d_x = d_x.redistribute(device_mesh=new_mesh, placements=[Shard(dim=1)])
d_x = d_x.reshape(1024)  # 8 x 4 x 8 x 4 = 1024
```
@wconstab do you suggest any alternative solution to deal with the above problem for the view operation?
@weifengpy @XilunWu @ezyang have been looking into better view support in general. I'm not sure if they have an update that would help your case. I'd also like one of them to double check my conclusion, but if it's correct, we can't land this PR as is.
I can take a closer look at this PR
@dayanandav are you proposing to land the PR, or just showing it as a repro? I tested the PR; it redistributes local tensors to achieve view(-1): Shard(1) -> view(-1) -> Replicate().
To align with torch.Tensor.view, we don't allow redistribution/communication in DTensor.view.
If you really need redistribution, could you call redistribute explicitly to convert from Shard(1) to Replicate(), instead of letting view do it? (A sketch of this workaround follows.)
Do you have a follow-up op that "reverts" the view and makes it legal? Take batch dim flattening/unflattening as an example: (2, 3, 4) -> view -> (2 x 3, 4) -> view -> (2, 3, 4) is the case I am trying to support.
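A hedged sketch of that explicit-redistribute workaround, mirroring the repro above (assumes 4 CUDA ranks launched with torchrun; the communication then happens in redistribute(), keeping the reshape itself communication-free):

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Replicate, Shard

new_mesh = init_device_mesh("cuda", [4], mesh_dim_names=["tp"])
x = torch.randn((8, 4, 8, 4), dtype=torch.float32, device="cuda")
d_x = DTensor.from_local(x, device_mesh=new_mesh, placements=[Replicate()])
d_x = d_x.redistribute(device_mesh=new_mesh, placements=[Shard(dim=1)])

# Explicitly redistribute back to Replicate() before flattening, instead of
# letting reshape trigger communication (which view/reshape ops disallow).
d_x = d_x.redistribute(device_mesh=new_mesh, placements=[Replicate()])
d_x = d_x.reshape(1024)  # now a local, communication-free reshape on each rank
```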
Conflict changes removed
ezyang
left a comment
making sure this is blocked
Lint issue fixes
@weifengpy @ezyang any progress on the review?
…from nn.Linear(DTensor)"
nn.Linear(DTensor) got decomposed into a view on DTensor, with error:
```
[rank1]: RuntimeError: ('Attempted to flatten multiple dimensions, with dimension 1 being sharded. ', 'It cannot be performed without redistribution, which is disallowed by the current operator.')
```
still learning from a few PRs
* #149764
* #161950
* #161161
cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci
[ghstack-poisoned]
…attening"
for `F.linear(inputs, weight)`, we have batch flattening and unflattening
* batch flattening: (bsz, seq_len, dim) -> aten.view.default -> (bsz x seq_len, dim)
* batch unflattening: (bsz x seq_len, dim) -> aten._unsafe_view.default -> (bsz, seq_len, arbitrary_out_dim)
when `inputs` is DTensor `(Shard(1), )`, `view(batch_size * seq_len, input_dim)` errors
```
[rank1]: File "/data/users/weif/pytorch/torch/distributed/tensor/_ops/_view_ops.py", line 618, in propagate_shape_and_sharding
[rank1]: in_dim = get_in_dim_to_shard(cmd)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/data/users/weif/pytorch/torch/distributed/tensor/_ops/_view_ops.py", line 548, in get_in_dim_to_shard
[rank1]: raise RuntimeError(
[rank1]: RuntimeError: ('Attempted to flatten multiple dimensions, with dimension 1 being sharded. ', 'It cannot be performed without redistribution, which is disallowed by the current operator.')
```
this PR add hierachical placements to support batch flattening and unflattening
reference DTensor view PRs
* #149764
* #161950
* #161161
cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci
[ghstack-poisoned]
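A minimal repro sketch of the batch-flattening failure described above (assumes 2 ranks launched with torchrun, CPU/gloo for simplicity):

```python
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard

dist.init_process_group("gloo")
mesh = init_device_mesh("cpu", [2])

bsz, seq_len, dim = 4, 6, 8
inputs = distribute_tensor(torch.randn(bsz, seq_len, dim), mesh, [Shard(1)])

# Flattening (bsz, seq_len) while seq_len is sharded is expected to raise:
# "Attempted to flatten multiple dimensions, with dimension 1 being sharded."
inputs.view(bsz * seq_len, dim)
```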
Fix lint issue
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
For view/reshape ops, validate evenly or unevenly sharded DTensors before getting to runtime dispatch, and throw a more specific error, as implemented in #149764.
Fixes #161147
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @tianyu-l @XilunWu @SherlockNoMad