
[DTensor] Fix squeeze() removing non-singleton sharded dimensions.#166862

Closed
mansiag05 wants to merge 1 commit intopytorch:mainfrom
mansiag05:fix-issue-166124

Conversation

@mansiag05
Collaborator

@mansiag05 mansiag05 commented Nov 3, 2025

Fix a bug where squeeze() incorrectly removes dimensions that are locally singleton (size 1) but not globally singleton (size = mesh_size). Added a custom handler that checks the global shape before squeezing the local tensor.

Fixes #166124

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk
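To make the mismatch concrete, here is a minimal, mesh-free sketch (hypothetical helper names, operating on shapes only) contrasting the current local squeeze with one that consults the global shape:

```python
def naive_local_squeeze(local_shape):
    """What a plain squeeze() does locally: drop every size-1 dim."""
    return tuple(s for s in local_shape if s != 1)

def correct_local_squeeze(local_shape, global_shape):
    """Keep a local size-1 dim whenever the dim is not globally singleton."""
    return tuple(l for l, g in zip(local_shape, global_shape) if g != 1)

# Global [4, 8] sharded on dim 0 over a 4-rank mesh: each rank holds [1, 8].
print(naive_local_squeeze((1, 8)))            # (8,)  -- over-squeezed
print(correct_local_squeeze((1, 8), (4, 8)))  # (1, 8) -- sharded dim kept
```

The second helper is what the custom handler effectively has to compute: which dims are singleton in the *global* shape, not the local one.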

@pytorch-bot

pytorch-bot Bot commented Nov 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166862

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit 105160b with merge base 0cd681d:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Nov 3, 2025
@mansiag05
Collaborator Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot Bot added the topic: not user facing topic category label Nov 3, 2025
@mansiag05
Collaborator Author

cc @stmcgovern @skpark-rh

@skpark-rh
Collaborator

I don't think there is a bug in squeeze; I actually think it is working as intended. I do think there is an issue with the unsqueeze logic. If you squeeze the dimensions, it should modify the tensor and remove every size-1 dim from all the shapes. When I do an unsqueeze(0), I do see that the full tensor becomes (1, 32) instead of (1, 4, 8). My intuition tells me that either there is a misunderstanding about the API or it is not working as intended; I am leaning toward the first.

@skpark-rh
Collaborator

At the very least I think the test for squeeze that is skipped needs to be removed and utilized for this fix. (test/distributed/tensor/test_dtensor_ops.py:507)

@janeyx99 janeyx99 requested a review from XilunWu November 7, 2025 16:50
@janeyx99 janeyx99 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Nov 7, 2025
@mansiag05
Collaborator Author

Thanks for taking a look at this @skpark-rh. I totally get where you're coming from; the global shape reporting does look correct at first glance. Let me show you what's actually going wrong under the hood.

The DTensor output shape correctly reports [4, 8] after squeeze. But the sneaky part is that the local tensors on each rank are getting incorrectly squeezed, and that breaks everything downstream.
The problem is that each rank's local tensor went from [1, 8] to [8], but the DTensor still thinks it's managing a [4, 8] tensor with Shard(0). When you try to gather them back with full_tensor(), it crashes because the shapes don't line up.

Also, there's actually a FIXME comment in the code at torch/distributed/tensor/_ops/_view_ops.py:431-440 that describes this exact issue. So this was a known issue that just hadn't been fixed yet!

And great catch on line 507! You're absolutely right: there is a skipped test, and it looks like it was skipped because of this bug. The OpInfo tests were failing because full_tensor() would crash after squeeze. Since this PR fixes the crash, we can finally enable the skipped test.

Does this make sense? Happy to clarify anything! 😊
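To illustrate the crash mechanism, here is a toy sketch of the Shard(0) gather check (a hypothetical helper, not the real full_tensor() implementation), operating on shapes only:

```python
def gather_shard0(local_shapes, global_shape):
    """Toy full_tensor() check for Shard(0): locals concatenate along dim 0,
    so every local shape must match the tracked global shape on all other dims."""
    for s in local_shapes:
        if len(s) != len(global_shape) or tuple(s[1:]) != tuple(global_shape[1:]):
            raise ValueError(f"local shape {s} incompatible with global {global_shape}")
    return (sum(s[0] for s in local_shapes),) + tuple(global_shape[1:])

# Healthy case: four [1, 8] shards reassemble into the global [4, 8].
print(gather_shard0([(1, 8)] * 4, (4, 8)))  # (4, 8)
# After the bad local squeeze, the shards are [8] and the gather blows up.
```

Calling `gather_shard0([(8,)] * 4, (4, 8))` raises, which is the shape-mismatch failure described above.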

@skpark-rh
Collaborator

I see. So when the global tensor is [4, 8, 1], the dtensor when sharding(0) would create local tensors of [1, 8, 1]. The squeeze should only remove the 3rd dim and keep the first dim.

Comment thread torch/distributed/tensor/_dispatch.py Outdated
@github-actions
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions Bot added the Stale label Jan 10, 2026
Contributor

@wconstab wconstab left a comment


I think this looks correct. If we cannot find a less nuclear way to support squeeze than using a custom handler, I'll spend more time looking at whether we covered all the cases in this handler. But I wanted to let others give suggestions.

One thought, which might not be a good idea, is to make a more minimal special case for squeeze in the dispatch path that would just mutate the args/kwargs (a no-op when the 'dim' arg is present, but computing a new 'dim' arg based on the globally singleton dims when it is not) and then do the rest of dispatch the normal way. It might be strictly better to just use the whole override approach as in this PR.
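The arg-mutation idea can be sketched as follows (a hypothetical helper operating on shapes, not the actual dispatch code):

```python
def rewrite_squeeze_args(global_shape, dim=None):
    """Hypothetical pre-dispatch rewrite: when no 'dim' is given, pin the
    squeeze to only the globally singleton dims so the local op cannot
    over-squeeze sharded dims whose local size happens to be 1."""
    if dim is not None:
        return (dim % len(global_shape),)  # user already pinned the dim
    return tuple(i for i, s in enumerate(global_shape) if s == 1)

print(rewrite_squeeze_args((4, 8, 1)))         # (2,) -- only the true singleton
print(rewrite_squeeze_args((4, 8, 1), dim=2))  # (2,) -- no-op normalization
```

With the dims pinned up front, the rest of dispatch could proceed normally, which is the "minimal special case" being floated here.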

Contributor

@tianyu-l tianyu-l left a comment


Instead of adding a special handler in dispatch.py, I wonder if it's better to adjust the arg in sharding_prop.py.

We have a few ops (view, new_empty, etc.) for which we have to adjust the shape, and still use the per-op strategies.
https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/_sharding_prop.py#L155
Here it's similar in that

  • we can keep the strategy
  • and only modify the non-Tensor arg (the dim)

Comment thread torch/distributed/tensor/_dispatch.py Outdated
…torch#166124)

Fix bug where squeeze() incorrectly removes dimensions that are locally
singleton (size=1) but globally not (size=mesh_size). Added custom handler
to check global shape before squeezing local tensor.
@pytorch-bot pytorch-bot Bot added the release notes: distributed (dtensor) release notes category label Jan 27, 2026
@mansiag05
Collaborator Author

Hello @tianyu-l,

I looked into the op_to_shape_and_stride_idx, however, I'm not sure how to apply that pattern here. For view and new_empty, we modify the shape arg but keep the same op. For squeeze, the challenge is:

  • squeeze.default has no dim arg - it squeezes all local singleton dims
  • To squeeze only globally singleton dims, we need to use squeeze.dims with explicit dims

This means we need to change the op variant, not just modify the arg. The current redistribute_schema pattern only modifies args while op_call stays the same.

Could you help me understand how this can be approached?
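For concreteness, the variant rewrite being discussed might look like this (a sketch with hypothetical names, operating on shapes only, not the actual sharding-prop code):

```python
def canonicalize_squeeze(op_name, global_shape, args):
    """Hypothetical canonicalization: map every squeeze variant to
    aten.squeeze.dims with explicit, globally singleton dims."""
    n = len(global_shape)
    if op_name == "aten.squeeze.default":
        # No dim arg: squeeze all globally singleton dims.
        dims = tuple(i for i, s in enumerate(global_shape) if s == 1)
    elif op_name == "aten.squeeze.dim":
        d = args[0] % n  # normalize a possibly negative dim
        dims = (d,) if global_shape[d] == 1 else ()
    else:  # already aten.squeeze.dims
        dims = tuple(d % n for d in args[0] if global_shape[d % n] == 1)
    return "aten.squeeze.dims", dims

print(canonicalize_squeeze("aten.squeeze.default", (4, 8, 1), ()))
# ('aten.squeeze.dims', (2,))
```

This is exactly the "change the op variant" step: squeeze.default has no dims to modify, so the rewrite has to produce a squeeze.dims call instead.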

stmcgovern added a commit to stmcgovern/pytorch that referenced this pull request Jan 27, 2026
Extend `dim_squeeze` to handle multiple dimensions by normalizing all
dim variants to a target dimension set. This unifies the logic into a
single code path.

- Extend `dim_squeeze()` type signature to `DimsType | None`
- Normalize all cases to `target_dims: set[int]`
- Single return path: keep dims that are size > 1 or not targeted
- Register `aten.squeeze.dims` using existing torch.squeeze mapping
- Add test_squeeze_variants to test all squeeze variants with DTensor

Note: op_db test remains xfail due to pre-existing bug where local
squeeze removes sharded dims with local size 1 (see PR pytorch#166862).

Fixes pytorch#173521
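The unified rule described in the commit message ("keep dims that are size > 1 or not targeted") can be sketched in plain Python (a shape-level sketch, not the actual dim_squeeze implementation):

```python
def dim_squeeze(shape, dims=None):
    """Sketch of the unified squeeze rule: normalize every variant
    (no dim, one dim, many dims) to a target dim set, then keep a dim
    if its size > 1 or it is not targeted."""
    n = len(shape)
    if dims is None:
        target = set(range(n))       # squeeze(): every dim is a candidate
    elif isinstance(dims, int):
        target = {dims % n}          # squeeze(dim)
    else:
        target = {d % n for d in dims}  # squeeze(dims=(...))
    return tuple(s for i, s in enumerate(shape) if s > 1 or i not in target)

print(dim_squeeze((4, 8, 1)))        # (4, 8)
print(dim_squeeze((1, 4, 1), 0))     # (4, 1)
print(dim_squeeze((1, 4, 1), (0, 2)))  # (4,)
```

Applied to the *global* shape, this is what keeps a sharded dim of global size 4 alive even when its local size is 1.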
@tianyu-l tianyu-l requested a review from pianpwk February 2, 2026 08:35
@tianyu-l
Contributor

tianyu-l commented Feb 2, 2026

This means we need to change the op variant, not just modify the arg. The current redistribute_schema pattern only modifies args while op_call stays the same.

What I meant was not to reuse code for op_to_shape_and_stride_idx; instead we could invent new functions in sharding prop to achieve what squeeze handler does.

Given your squeeze_handler is for squeeze ops only for now, and the handler complexity is limited, I think the PR is acceptable. cc @pianpwk to review too

@wconstab
Contributor

wconstab commented Feb 2, 2026

@stmcgovern you have another PR for squeeze.dims - do you mind aligning with @mansiag05 on an overall approach?

I'm ok with this PR- let's get it cleaned up and land-ready, then i'll review

@stmcgovern
Collaborator

@wconstab Thanks. I opened #173563 thinking that it could be orthogonal to this PR. Looking at this a bit more and following the interesting discussion here, I do think that we can avoid adding a custom handler here if we rewrite squeeze to squeeze.dims in _sharding_prop.py. I'll investigate a bit more and coordinate with @mansiag05.

if dim_arg is not None:  # leading condition reconstructed; the review quoted this hunk mid-block
    dim_normalized = dim_arg if dim_arg >= 0 else dim_arg + len(global_shape)
    singleton_dims = (dim_normalized,) if global_shape[dim_normalized] == 1 else ()
else:
    singleton_dims = tuple(i for i, size in enumerate(global_shape) if size == 1)
Contributor


just wondering: is it possible to construct a test case where mesh_dim_size > 1, but tensor_dim_size < mesh_dim_size? e.g. shard size 4 on mesh dim with 8 ranks.

I'm wondering if some ranks would see size [0, ...], and this squeeze logic would not work.
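The uneven-shard scenario can be checked with a small sketch, assuming the default Shard(0) chunking follows torch.chunk semantics (ceil-sized chunks first, so trailing ranks can end up with empty shards; this is an assumption of the sketch, not a quote from the DTensor code):

```python
import math

def shard0_local_sizes(dim_size, num_ranks):
    """Assumed chunking: ceil(dim_size / num_ranks)-sized chunks from the
    front, with any shortfall absorbed by the trailing ranks (possibly 0)."""
    chunk = math.ceil(dim_size / num_ranks) if dim_size else 0
    sizes, remaining = [], dim_size
    for _ in range(num_ranks):
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes

print(shard0_local_sizes(4, 8))  # [1, 1, 1, 1, 0, 0, 0, 0]
```

Under this assumption, a tensor dim of size 4 on an 8-rank mesh dim gives the last four ranks local size 0 on that dim, which is precisely the case where a local-size-1 check (or a `size == 1` test against the local shape) would misbehave.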

stmcgovern added a commit to stmcgovern/pytorch that referenced this pull request Feb 3, 2026
(Same commit message as the Jan 27 commit above.)
@stmcgovern
Collaborator

I updated #173563 to include the fix for the local/global singleton mismatch FIXME (superseding this PR). It leverages the squeeze.dims strategy support to turn all squeeze op variants into squeeze.dims in _sharding_prop.py and then handle the dims in one place. It avoids the custom handler, but touches _dispatch.py. @wconstab @tianyu-l @pianpwk @mansiag05

@wconstab
Contributor

wconstab commented Feb 6, 2026

iiuc this PR is no longer needed in favor of #173563? (can you close this one if so, or clarify)

@github-actions github-actions Bot closed this Mar 8, 2026
pytorchmergebot pushed a commit that referenced this pull request Mar 23, 2026
Fixes #173521
Fixes #166124
Extend `dim_squeeze` to handle multiple dimensions by normalizing all dim variants to a target dimension set. This unifies the logic into a single code path.
Fix the long-standing FIXME in dim_squeeze where squeeze(dim=None) could incorrectly remove sharded dimensions whose local size happened to be 1 (despite global size > 1). Canonicalizes all squeeze variants to squeeze.dims at the sharding propagation level using global shape to determine which dimensions are truly singleton.

Strategy validator: 74 correct, 0 incorrect, 0 missing. This is without the P(max/min) - R rules mentioned below.

- Add test_squeeze_variants to test all squeeze variants with DTensor

~~Note: op_db test remains xfail due to pre-existing bug where local squeeze removes sharded dims with local size 1 (see PR #166862).~~ That PR is/will be closed in favor of this approach that avoids a custom handler

Pull Request resolved: #173563
Approved by: https://github.com/wconstab
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
(Same commit message as the merged commit above.)
nklshy-aws pushed a commit to nklshy-aws/pytorch that referenced this pull request Apr 7, 2026
(Same commit message as the merged commit above.)

Labels

oncall: distributed Add this issue/PR to distributed oncall triage queue open source release notes: distributed (dtensor) release notes category Stale topic: not user facing topic category triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[DTensor] squeeze() causes incorrect dimension when sharded dimension size equals mesh size.

8 participants