
[DTensor] Prevent squeeze from redistributing with strict_view #175798

Closed
stmcgovern wants to merge 3 commits into pytorch:main from stmcgovern:fix-squeeze-inplace-local-tensor


Conversation

@stmcgovern
Collaborator

@stmcgovern stmcgovern commented Feb 25, 2026

Fixes #174136
When squeeze_.dim triggers redistribution (e.g. squeezing a globally-singleton sharded dim forces Shard(0) → Replicate), redistribute_local_args creates a new local tensor. The dispatch path updated _spec but never assigned the new tensor back to _local_tensor, leaving the DTensor in an inconsistent state where the spec claims shape [4] but the local tensor is still [1, 4].
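
A minimal repro sketch of the inconsistency described above (the 2-rank CPU mesh is an illustrative assumption; the linked issue has the original reproducer):

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard

# Assumes a 2-rank mesh; run under torchrun or a multiprocess test harness.
mesh = init_device_mesh("cpu", (2,))
x = torch.randn(1, 4)
dt = distribute_tensor(x, mesh, [Shard(0)])  # the globally-singleton dim 0 is sharded

dt.squeeze_(0)
# Before the fix: the op triggers Shard(0) -> Replicate, dt._spec then reports
# shape [4], but dt._local_tensor still holds the pre-redistribution local shard.
```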

Update: Prevent squeeze, as a view op, from doing a redistribution by using the strict_view keyword.

This PR syncs _spec and _local_tensor. It checks the inplace_view tag rather than special-casing squeeze_, so other inplace view ops (unsqueeze_, t_, transpose_) won't hit this bug once strategies are registered for them.

The C++ fast path in dispatchDTensorOp explicitly skips inplace ops, so no C++ change is needed.

@pytorch-bot

pytorch-bot Bot commented Feb 25, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175798

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ce19418 with merge base 9daaaa2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added the release notes: distributed (dtensor) label on Feb 25, 2026
@mikaylagawarecki mikaylagawarecki added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Mar 5, 2026
@stmcgovern stmcgovern force-pushed the fix-squeeze-inplace-local-tensor branch 2 times, most recently from e2dfacb to dd7b23b Compare March 10, 2026 14:30
@wconstab
Contributor

@claude review this PR

@claude

claude Bot commented Mar 10, 2026

Claude finished @wconstab's task in 4m 38s —— View job


PR Review: #175798

Summary

This PR fixes a real bug where squeeze_.dim (and squeeze_.dims) could leave a DTensor in an inconsistent state: _spec would be updated after redistribution but _local_tensor would still hold the pre-redistribution tensor. The fix generalizes the special case from squeeze_.dim-only to all inplace_view-tagged ops, which is the right approach.

Code Quality

The change is well-scoped and the generalization from a single op check (op_call == aten.squeeze_.dim) to a tag-based check (torch.Tag.inplace_view in op_call.tags) is a clear improvement. The comments are concise and useful.
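
For reference, a minimal sketch of the two checks being compared (op_call is the OpOverload being dispatched; this is illustrative, not the PR's exact code):

```python
import torch

# Example OpOverload; in dispatch this would be the op currently being executed.
op_call = torch.ops.aten.squeeze_.dim

# Old special case: only aten.squeeze_.dim got the post-op sync.
print(op_call == torch.ops.aten.squeeze_.dim)   # True

# New tag-based check: any op tagged inplace_view (squeeze_.dim, squeeze_.dims,
# and future inplace view ops once they gain strategies) takes the same path.
print(torch.Tag.inplace_view in op_call.tags)   # True
```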

One observation worth discussing:

  • torch/distributed/tensor/_dispatch.py:485-486 — The assertion isinstance(local_results, torch.Tensor) is good defensive code, but consider whether local_results could ever be None here. When participating is False, local_results remains None (set at _dispatch.py:286) and flows into _dispatch_fast_path_python_tail. However, this code path requires output_sharding.output_spec is not None (line 472) and is_inplace_op (line 470), and for a non-participating rank the local_results would be set to an empty tensor at line 418-420 (since output_spec is not None). So the assertion should hold. This is fine as-is.

  • Comment nit (_dispatch.py:480-483): The comment says "redistribution may produce a new local tensor" — this is accurate but could be slightly more precise. The key insight is that redistribute_local_args modifies op_info.local_args (creating a new tensor), and then the local op executes on that new tensor, so local_results points to a different tensor than args[0]._local_tensor. This is minor and not blocking.

Testing

  • Tests exist for the fix (test_squeeze_inplace_local_tensor)
  • Tests cover the redistribution case (squeeze singleton sharded dim) and the no-op case (squeeze non-singleton dim)
  • Tests verify both shape and _local_tensor.shape consistency, plus correctness via full_tensor()

The tests are well-structured and directly target the bug. Two suggestions:

  1. squeeze_.dims coverage: The dispatch fix now covers all inplace_view ops, and squeeze_.dims is a registered strategy (_view_ops.py:813-815). Consider adding a test case for squeeze_.dims since it can trigger the same redistribution path. For example: dt.squeeze_((0,)) on the same [1, 4] sharded tensor.

  2. Future inplace_view ops: The PR description notes that unsqueeze_, t_, transpose_ don't have registered strategies yet, so they won't hit this path. When strategies are added for those ops, they'll need tests too, but that's future work and out of scope here.

Security

No security concerns identified.

Thread Safety

No thread safety concerns — this code path is per-op dispatch, not shared mutable state.

Backward Compatibility

No BC-breaking changes. This is purely a bug fix:

  • Before: squeeze_ with redistribution left DTensor in an inconsistent state (bug)
  • After: squeeze_ with redistribution correctly syncs both _spec and _local_tensor

The generalization from aten.squeeze_.dim to torch.Tag.inplace_view is forward-compatible — any future inplace view op that gets a strategy will automatically benefit from this fix without additional dispatch changes.

Performance

No performance concerns. The torch.Tag.inplace_view in op_call.tags check is a set membership test on a small set and replaces an equality check — negligible difference. The additional _local_tensor assignment is a simple attribute write.

Recommendation

Approve

This is a correct, well-targeted bug fix with good test coverage. The generalization from a single-op special case to tag-based dispatch is the right design choice. The only minor suggestion is adding a squeeze_.dims test case for completeness, but it's not blocking.


Comment thread torch/distributed/tensor/_dispatch.py Outdated
# use return_and_correct_aliasing to match the outer and the inner
# aliasing. See https://github.com/pytorch/pytorch/pull/158954
if not isinstance(local_results, torch.Tensor):
    raise AssertionError
Contributor

nit: add text about what type the local result is

Comment thread torch/distributed/tensor/_dispatch.py Outdated
# update the spec to handle tensor meta changes
if torch.Tag.inplace_view in op_call.tags:
    # Inplace view ops may change tensor meta. Update both
    # spec and local tensor -- redistribution may produce a
Contributor

nit: the comment says 'redistribution may produce..' but I think in this path a redistribution is actually not happening; it's purely a metadata change, right?

dt = distribute_tensor(x, mesh, [Shard(0)])
dt.squeeze_(0)
self.assertEqual(dt.shape, torch.Size([4]))
self.assertEqual(dt._local_tensor.shape, torch.Size([4]))
Contributor

now I'm wondering: is this squeeze actually doing a redistribution, or just changing placements (a.k.a. a view operation)? Let's add a comm counter to disambiguate this in the test (and update the comment above if needed).

stmcgovern added a commit to stmcgovern/pytorch that referenced this pull request Mar 11, 2026
Respond to wconstab's review comments on PR pytorch#175798:

- Clarify comment about when/why local tensor writeback is needed
- Add error message to AssertionError for debuggability
- Add CommDebugMode assertions to verify allgather happens for
  Shard->Replicate squeeze and zero comms for no-op squeeze
@stmcgovern stmcgovern force-pushed the fix-squeeze-inplace-local-tensor branch from dd7b23b to 071ed37 Compare March 11, 2026 14:18
stmcgovern added a commit to stmcgovern/pytorch that referenced this pull request Mar 11, 2026
Respond to wconstab's review comments on PR pytorch#175798:

- Clarify comment about when/why local tensor writeback is needed
- Add error message to AssertionError for debuggability
- Add CommDebugMode assertions to verify allgather happens for
  Shard->Replicate squeeze and zero comms for no-op squeeze
@stmcgovern stmcgovern force-pushed the fix-squeeze-inplace-local-tensor branch from 071ed37 to 7ac3e4c Compare March 11, 2026 14:36
@stmcgovern stmcgovern requested a review from wconstab March 11, 2026 15:10
stmcgovern added a commit to stmcgovern/pytorch that referenced this pull request Mar 17, 2026
Respond to wconstab's review comments on PR pytorch#175798:

- Clarify comment about when/why local tensor writeback is needed
- Add error message to AssertionError for debuggability
- Add CommDebugMode assertions to verify allgather happens for
  Shard->Replicate squeeze and zero comms for no-op squeeze
@stmcgovern stmcgovern force-pushed the fix-squeeze-inplace-local-tensor branch 2 times, most recently from d88c851 to 67adb02 Compare March 17, 2026 18:39
@stmcgovern
Collaborator Author

reordered the checks to address the benchmark regression CI failure.

@stmcgovern
Collaborator Author

@wconstab this one is ready for review when you have the chance

dt = distribute_tensor(x, mesh, [Shard(0)])
comm_mode = CommDebugMode()
with comm_mode:
    dt.squeeze_(0)
Contributor

related to my original comment, it seems like this PR is going against the design for dtensor views. Views are supposed to be guaranteed fast and to share the same memory as the original tensor, so doing a redistribution implicitly during a view operation is semantics-breaking even if it's convenient in some cases. We prefer to error as unsupported in such cases and require the user to intervene: do the redistribution manually first, then perform a true view operation.
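
A hedged sketch of the flow preferred here (mesh setup is illustrative): redistribute explicitly first, then perform a communication-free view.

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard, Replicate

mesh = init_device_mesh("cpu", (2,))
dt = distribute_tensor(torch.randn(1, 4), mesh, [Shard(0)])

# Rather than letting squeeze_ allgather implicitly, the user redistributes
# explicitly (the communication is now visible), then performs a true view.
dt = dt.redistribute(mesh, [Replicate()])
dt = dt.squeeze(0)
```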

Collaborator Author

I see. Yes this PR is cleaning up after a redistribution. But that redistribution should be banned. I'll use strict_view=True for squeeze, following #149764

@stmcgovern stmcgovern force-pushed the fix-squeeze-inplace-local-tensor branch from 856a96d to 85dc8d8 Compare March 27, 2026 20:30
@pytorch-bot pytorch-bot Bot added the ciflow/dtensor, ciflow/inductor, and ciflow/torchtitan labels on Mar 27, 2026
@stmcgovern stmcgovern force-pushed the fix-squeeze-inplace-local-tensor branch from 0885f50 to 364f8d0 Compare March 30, 2026 02:47
@zpcore zpcore self-requested a review April 2, 2026 18:26
@aditvenk aditvenk requested a review from pianpwk April 6, 2026 17:00
@stmcgovern stmcgovern force-pushed the fix-squeeze-inplace-local-tensor branch from 12736e2 to ff4769b Compare April 10, 2026 15:28
@stmcgovern stmcgovern force-pushed the fix-squeeze-inplace-local-tensor branch from ff4769b to 8582689 Compare April 13, 2026 15:45
@stmcgovern stmcgovern requested a review from weifengpy as a code owner April 13, 2026 15:45
@stmcgovern stmcgovern force-pushed the fix-squeeze-inplace-local-tensor branch from 8582689 to 9b58744 Compare April 14, 2026 13:28
@stmcgovern
Collaborator Author

these failures seem unrelated. going to try to merge

@stmcgovern
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk label on Apr 14, 2026
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job.

Failing merge rule: Core Maintainers

@stmcgovern stmcgovern changed the title from "[DTensor] Sync local tensor for inplace view ops after redistribution" to "[DTensor] Prevent squeeze from redistributing with strict_view" on Apr 14, 2026
@pytorch-bot pytorch-bot Bot added the ciflow/dtensor, ciflow/inductor, and ciflow/torchtitan labels on Apr 14, 2026
… allgather

Squeeze ops are views and must be communication-free. Enable
strict_view=True for all squeeze registrations so squeezing a sharded
singleton dim raises instead of silently allgathering.

Generalize the inplace dispatch handler to check the redistribute
schema for actual placement changes rather than special-casing
squeeze_ ops by name.

Fixes pytorch#174136
Use placement tuple comparison instead of DTensorSpec.__eq__ on the
inplace fast path. DTensorSpec.__eq__ compares mesh, placements,
shard_order, shape, stride, and dtype — unnecessary for the fast-path
decision where only placements matter. This restores the instruction
count to match main (56530 expected vs 66916 with __eq__).
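
A hedged sketch of the cheaper comparison this commit describes (spec variable names are illustrative, not the actual dispatch code):

```python
def placements_unchanged(arg_spec, output_spec) -> bool:
    # Compare only the placements tuples; DTensorSpec.__eq__ would also compare
    # mesh, shard_order, shape, stride, and dtype, which the fast path doesn't need.
    return arg_spec.placements == output_spec.placements
```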
…guard

Remove the needs_redistribute check for inplace ops in dispatch.
With strict_view=True on squeeze ops, sharding propagation already
errors when an inplace view would require redistribution of arg 0.

The remaining code handles two cases:
1. Placements unchanged (add_, mul_, etc.) — return self
2. Placements reindexed (squeeze_ removing non-sharded dim:
   Shard(1) → Shard(0)) — update spec, no communication
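
A hedged illustration of case 2 above (mesh size and shapes are assumptions): squeezing a non-sharded dim only reindexes the placement, with no communication.

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard

mesh = init_device_mesh("cpu", (2,))
x = torch.randn(1, 8)
dt = distribute_tensor(x, mesh, [Shard(1)])  # shard the non-singleton dim

dt.squeeze_(0)
# Removing dim 0 reindexes the sharded dim: Shard(1) -> Shard(0). Only the
# spec's placements and tensor meta change; the local shards are untouched.
```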
@stmcgovern stmcgovern force-pushed the fix-squeeze-inplace-local-tensor branch from 9b58744 to ce19418 Compare April 21, 2026 01:18
@stmcgovern
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.


Labels

ciflow/dtensor (Run DTensor specific tests)
ciflow/inductor
ciflow/torchtitan (Run TorchTitan integration tests)
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
open source
release notes: distributed (dtensor) (release notes category)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[DTensor] squeeze_.dim updates spec but doesn't squeeze local tensor

6 participants