[DTensor] support _StridedShard in view op #159656
XilunWu wants to merge 2 commits into gh/XilunWu/162/base
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159656
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Cancelled Job, 1 Unrelated Failure as of commit f4bc2d3 with merge base ba37f58.
CANCELLED JOB - The following job was cancelled. Please retry:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
wconstab left a comment
I think if you can explain what the requirements are for the new _StridedShard to still be valid and convince me you're checking all those requirements, then we can land this (but we need a test too).
**Summary**

Some thoughts on the view-op and `_StridedShard` interaction:

1. `_StridedShard` has no impact on sharding (i.e. how the tensor is partitioned) compared to `Shard`. It only changes how shards are permuted across the devices.
2. The `view()` op on DTensor strictly forbids shard redistribution, which means that if `view()` may cause shard permutation across devices, it should be rejected. This is enforced in today's sharding prop for `view()`.
3. Since DTensor `view()` won't introduce any redistribution, it's certain that `placements` won't change except for the inner `dim` attribute of `Shard` or `_StridedShard`. Therefore, to support `_StridedShard` in the `view()` op, the only change required is to keep `_StridedShard` as `_StridedShard` in the output spec.

**Test** ``

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta tianyu-l

[ghstack-poisoned]
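Point 1 can be illustrated with a small stand-alone sketch (plain Python, my own construction rather than DTensor's actual implementation): both placements cut the tensor dim into the same chunks; only the chunk-to-rank assignment along the mesh dimension differs.

```python
# Hypothetical illustration of Shard vs. _StridedShard(split_factor=sf) along
# one mesh dimension of size n. Chunk boundaries are identical in both cases;
# _StridedShard only permutes which rank holds which chunks.

def plain_shard(data, n):
    """Shard: rank i gets the i-th contiguous block."""
    block = len(data) // n
    return [data[i * block:(i + 1) * block] for i in range(n)]

def strided_shard(data, n, sf):
    """_StridedShard(split_factor=sf): rank i gets every n-th block of size
    len(data) // (sf * n), starting at block i."""
    block = len(data) // (sf * n)
    chunks = [data[k * block:(k + 1) * block] for k in range(sf * n)]
    return [sum((chunks[i + j * n] for j in range(sf)), []) for i in range(n)]

data = list(range(8))
print(plain_shard(data, 2))       # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(strided_shard(data, 2, 2))  # [[0, 1, 4, 5], [2, 3, 6, 7]]
```

Both layouts partition `data` into the same size-2 pieces; the strided variant is the layout that FSDP2 + TP produces when two mesh dims shard the same tensor dim.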
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (default, 2, 2, linux.rocm.gpu.2). Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -f "cancelled rocm job"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
**Test** `pytest test/distributed/tensor/test_view_ops.py`

Pull Request resolved: pytorch#159656
Approved by: https://github.com/wconstab
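A minimal sketch of the propagation rule this PR describes, using hypothetical stand-in classes rather than DTensor's real `Shard`/`_StridedShard` internals: when `view()` propagates a sharded placement into the output spec, the placement's class is preserved (so `split_factor` survives) and only the sharded dim may be remapped.

```python
# Hedged sketch; class and function names here are stand-ins, not the actual
# DTensor sharding-prop API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Shard:
    dim: int

@dataclass(frozen=True)
class StridedShard(Shard):  # stand-in for DTensor's _StridedShard
    split_factor: int = 1

def propagate_placement(placement, dim_map):
    """Map an input placement to the output spec of a view op.

    dim_map[d] gives the output dim that input dim d maps to. The placement
    class is preserved, so a StridedShard stays a StridedShard and keeps its
    split_factor; only the dim attribute is remapped.
    """
    if isinstance(placement, StridedShard):
        return StridedShard(dim_map[placement.dim], placement.split_factor)
    return Shard(dim_map[placement.dim])

# A view whose dim mapping sends input dim 0 to output dim 1:
print(propagate_placement(StridedShard(0, split_factor=2), {0: 1}))
```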
Directly propagating `_StridedShard` may not always be correct. Consider the following case: Then `B.placements` is expected to be `[Shard(3), Shard(2), Replicate(), Replicate()]`. No `split_factor` any more.
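The concrete tensors in the comment above are elided, but a smaller analogue (my own construction, assuming `_StridedShard` semantics where rank `i` holds every `n`-th block along the mesh dim) shows the same effect numerically: after a view, the correct placements can be plain `Shard`s with no `split_factor`, so blindly propagating `_StridedShard` would describe the wrong layout.

```python
# Hypothetical small case, not the one from the comment: A = arange(8) on a
# 2x2 mesh (i, j) with placements [_StridedShard(0, split_factor=2), Shard(0)].
# Under that layout, rank (i, j) holds A[4*j + 2*i : 4*j + 2*i + 2].
# After B = A.view(2, 4), the very same local bytes are the contiguous block
# B[j, 2*i : 2*i + 2], i.e. B.placements = [Shard(1), Shard(0)] -- plain
# Shards, no split_factor left.

A = list(range(8))

def local_shard_A(i, j):
    # local shard under [_StridedShard(0, split_factor=2), Shard(0)]
    return A[4 * j + 2 * i : 4 * j + 2 * i + 2]

B = [A[0:4], A[4:8]]  # B = A.view(2, 4), materialized as nested lists

def local_shard_B(i, j):
    # local shard if B.placements = [Shard(1), Shard(0)]
    return B[j][2 * i : 2 * i + 2]

for i in range(2):
    for j in range(2):
        assert local_shard_A(i, j) == local_shard_B(i, j)
print("view turned _StridedShard into plain Shard placements")
```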
Stack from ghstack (oldest at bottom):