
[DTensor] support _StridedShard in view op #159656

Closed

XilunWu wants to merge 2 commits into gh/XilunWu/162/base from gh/XilunWu/162/head

Conversation

@XilunWu (Contributor) commented Aug 1, 2025

Stack from ghstack (oldest at bottom):

Summary
Some thoughts on the interaction between the view op and _StridedShard:

  1. _StridedShard has no impact on sharding (i.e. how the tensor is
    partitioned) compared to Shard. It only changes how shards are permuted
    across the devices.
  2. The view() op on DTensor strictly forbids shard redistribution, which
    means that if view() may cause shard permutation across devices, it
    should be rejected. This is enforced in today's sharding propagation
    for view().
  3. Since DTensor view() won't introduce any redistribution, it is certain
    that placements won't change, except for the inner dim attribute of
    Shard or _StridedShard.

Therefore, to support _StridedShard in the view() op, the only change required
is to keep _StridedShard as _StridedShard in the output spec.
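A small pure-Python sketch may help make point 1 concrete. This is a toy ownership model (my own assumption about how the strided piece assignment works, not the DTensor implementation): both placements cut the dimension into equal pieces and cover every element exactly once; `_StridedShard(dim, split_factor=sf)` only reassigns which pieces each device holds.

```python
def shard_chunks(n, world_size):
    # Contiguous Shard: device i owns the i-th of world_size equal chunks.
    per = n // world_size
    return {i: list(range(i * per, (i + 1) * per)) for i in range(world_size)}

def strided_shard_chunks(n, world_size, split_factor):
    # _StridedShard-style (toy model): the dim is cut into
    # split_factor * world_size equal pieces, and device i owns pieces
    # i, i + world_size, i + 2 * world_size, ...
    pieces = split_factor * world_size
    per = n // pieces
    out = {}
    for dev in range(world_size):
        idx = []
        for s in range(split_factor):
            p = dev + s * world_size
            idx.extend(range(p * per, (p + 1) * per))
        out[dev] = idx
    return out

contig = shard_chunks(8, 2)               # {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
strided = strided_shard_chunks(8, 2, 2)   # {0: [0, 1, 4, 5], 1: [2, 3, 6, 7]}

# Every element is owned exactly once in both cases and per-device shard
# sizes match; only the piece-to-device assignment is permuted.
assert sorted(sum(contig.values(), [])) == sorted(sum(strided.values(), []))
assert all(len(v) == len(contig[0]) for v in strided.values())
```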

Test
`pytest test/distributed/tensor/test_view_ops.py`

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @tianyu-l

XilunWu added a commit that referenced this pull request Aug 1, 2025
ghstack-source-id: 247789b
Pull Request resolved: #159656
@pytorch-bot bot added the ciflow/inductor and oncall: distributed labels Aug 1, 2025
@pytorch-bot (bot) commented Aug 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159656

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Cancelled Job, 1 Unrelated Failure

As of commit f4bc2d3 with merge base ba37f58:

CANCELLED JOB - The following job was cancelled. Please retry:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

  • pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge, unstable) (gh) (#158876)
    /var/lib/jenkins/workspace/xla/torch_xla/csrc/runtime/BUILD:476:14: Compiling torch_xla/csrc/runtime/xla_util_test.cpp failed: (Exit 1): gcc failed: error executing CppCompile command (from target //torch_xla/csrc/runtime:xla_util_test) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 229 arguments skipped)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@XilunWu XilunWu marked this pull request as draft August 1, 2025 17:27
@XilunWu XilunWu requested a review from wconstab August 1, 2025 17:27
@wconstab (Contributor) left a comment

I think if you can explain what the requirements are for the new StridedShard to still be valid, and convince me you're checking all those requirements, then we can land this (but we need a test too).

@XilunWu added the topic: not user facing and module: dtensor labels Aug 6, 2025
XilunWu added a commit that referenced this pull request Aug 7, 2025
ghstack-source-id: 1095f42
Pull Request resolved: #159656
@XilunWu changed the title from "[DTensor] draft: support _StridedShard in view op" to "[DTensor] support _StridedShard in view op" Aug 7, 2025
@XilunWu XilunWu requested a review from wconstab August 7, 2025 00:55
@XilunWu XilunWu marked this pull request as ready for review August 7, 2025 00:55
@XilunWu XilunWu requested review from tianyu-l, wanchaol and zpcore August 7, 2025 00:55
@wconstab (Contributor) left a comment

Lgtm

@XilunWu added the ciflow/trunk label Aug 7, 2025
@XilunWu (Contributor, Author) commented Aug 7, 2025

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (default, 2, 2, linux.rocm.gpu.2)

Details for Dev Infra team: raised by workflow job.

@XilunWu (Contributor, Author) commented Aug 7, 2025

@pytorchbot merge -f "cancelled rocm job"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@github-actions github-actions bot deleted the gh/XilunWu/162/head branch September 7, 2025 02:14
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
Pull Request resolved: pytorch#159656
Approved by: https://github.com/wconstab
@zpcore (Member) commented Dec 3, 2025

Directly propagating `_StridedShard` may not always be correct. Think of the following case:

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Replicate, Shard, distribute_tensor
from torch.distributed.tensor.placement_types import _StridedShard

mesh = init_device_mesh("cpu", (2, 2, 2, 2))
input_tensor = torch.randn(4, 4, 8)
A = distribute_tensor(
    input_tensor,
    mesh,
    [_StridedShard(2, split_factor=2), Shard(2), Replicate(), Replicate()],
)
B = A.view(4, 4, 2, 4)
```

Then `B.placements` is expected to be `[Shard(3), Shard(2), Replicate(), Replicate()]`: no `split_factor` anymore.
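The arithmetic behind this counterexample can be checked with a small pure-Python ownership model (my own toy assumption about the strided piece assignment, not the DTensor code): each device's elements on the size-8 dim land contiguously on the viewed `(2, 4)` dims, which is exactly a plain `Shard` pattern.

```python
# Toy model of the size-8 dim sharded over mesh dims 0 and 1 (each of size 2)
# with [_StridedShard(2, split_factor=2), Shard(2)]: the dim is cut into
# 4 pieces of 2 elements, and device (i, j) ends up owning piece i + 2 * j
# (assumption: strided piece assignment as sketched here).
owner = {}
for i in range(2):          # mesh dim 0
    for j in range(2):      # mesh dim 1
        piece = i + 2 * j
        owner[(i, j)] = list(range(2 * piece, 2 * piece + 2))

# view(4, 4, 2, 4) maps global index k on the size-8 dim to
# (k // 4, k % 4) on the new size-2 and size-4 dims.
viewed = {dev: [(k // 4, k % 4) for k in ks] for dev, ks in owner.items()}

# Each device's elements are now contiguous: the size-2 dim is sharded by
# mesh dim 1 (Shard(2)) and the size-4 dim by mesh dim 0 (Shard(3)),
# with no split_factor left.
assert viewed == {
    (0, 0): [(0, 0), (0, 1)],
    (1, 0): [(0, 2), (0, 3)],
    (0, 1): [(1, 0), (1, 1)],
    (1, 1): [(1, 2), (1, 3)],
}
```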
