[dtensor] avoid shape recompilations on DTensorSpec#163820

Closed

pianpwk wants to merge 7 commits into main from pianpwk/dtensor_shape_metadata_guard

Conversation

Contributor

@pianpwk pianpwk commented Sep 25, 2025

skips DTensorSpec.sizes/strides in metadata guard checks

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @dcci
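As a hedged illustration of what "skipping sizes/strides" means here (a toy stand-in class, not the real DTensorSpec):

```python
# Illustrative sketch only: SpecLike is a hypothetical stand-in for DTensorSpec.
from dataclasses import dataclass

@dataclass
class SpecLike:
    mesh: str          # stands in for the DeviceMesh
    placements: tuple  # e.g. ("Shard(0)",)
    sizes: tuple       # outer tensor sizes: may differ across calls
    strides: tuple     # outer tensor strides: may differ across calls

def metadata_guard(orig: SpecLike, other: SpecLike) -> bool:
    # Compare only the fields that must match for a compiled graph to be
    # reusable; sizes/strides are left to dynamo's ShapeEnv guards, so a
    # shape change alone no longer fails the metadata guard.
    return orig.mesh == other.mesh and orig.placements == other.placements

a = SpecLike("mesh2d", ("Shard(0)",), (8, 16), (16, 1))
b = SpecLike("mesh2d", ("Shard(0)",), (32, 16), (16, 1))  # same layout, new shape
print(metadata_guard(a, b))  # True: the shape difference alone passes the guard
```

The key design choice is which fields are load-bearing for graph reuse: mesh and placements must match, while shapes are handled symbolically elsewhere.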

@pytorch-bot

pytorch-bot Bot commented Sep 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163820

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 87a3e3d with merge base f63d16c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added ciflow/inductor oncall: distributed Add this issue/PR to distributed oncall triage queue labels Sep 25, 2025
@pianpwk pianpwk added the release notes: distributed (dtensor) release notes category label Sep 25, 2025
@pianpwk pianpwk changed the title [WIP] avoid shape recompile on dtensor [WIP][dtensor] avoid shape recompilations on DTensorSpec Sep 25, 2025
Contributor

@azahed98 azahed98 left a comment


Changes lgtm. Let me know when final review is needed.

@pianpwk pianpwk changed the title [WIP][dtensor] avoid shape recompilations on DTensorSpec [dtensor] avoid shape recompilations on DTensorSpec Sep 25, 2025
@pianpwk pianpwk marked this pull request as ready for review September 25, 2025 18:03
@pianpwk pianpwk requested review from azahed98 and bdhirsh September 25, 2025 18:04
Contributor

@azahed98 azahed98 left a comment


Lgtm! Thanks for the change

@pianpwk
Contributor Author

pianpwk commented Sep 25, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 25, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 3, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu)

Details for Dev Infra team (raised by workflow job)

@ezyang
Contributor

ezyang commented Sep 30, 2025

Sorry hold up, why is this the right thing to do?

@ezyang
Contributor

ezyang commented Sep 30, 2025

Is the claim that DTensorSpec size/stride are always equal to the outer tensor size/stride?

@pianpwk
Contributor Author

pianpwk commented Sep 30, 2025

Is the claim that DTensorSpec size/stride are always equal to the outer tensor size/stride?

Yep, the metadata check was on the original eager DTensorSpec (from the sample input), which hardcoded the static outer sizes/strides.

I think the dynamo ShapeEnv guards should be enough to handle checking any size/stride constraints?

Comment thread on torch/distributed/tensor/_api.py (Outdated)
raise RuntimeError("Unsupported tensor type!")

@classmethod
def __metadata_guard__(cls, orig, other):
Contributor


Nit: add a return type annotation.

@ezyang
Contributor

ezyang commented Oct 1, 2025

This feels incomplete. When I trace a DTensor with dynamic shapes, does the DTensorSpec size/stride become symbolic? Because if it doesn't, then that seems like a problem. And if they do become symbolic, then why would we end up guarding on them?

@azahed98
Contributor

azahed98 commented Oct 1, 2025

When I trace a DTensor with dynamic shapes, does the DTensorSpec size/stride become symbolic?

I just double checked, and the DTensorSpec does contain symints during tracing.

And if they do become symbolic, then why would we end up guarding on them?

I'm admittedly not that familiar with how the guards are constructed, but I was going off #152963 ("additional work", point 1): allegedly the SymInts are undesirable for constructing the guards and cause recompiles. @bdhirsh could you provide more context?

@pianpwk
Contributor Author

pianpwk commented Oct 2, 2025

My understanding is that the DTensorSpec containing SymInts is the one in the DTensor wrapped by dynamo (appearing at the top of the dynamo graph), so fake tensor propagation is done dynamically, and that's all good.

But the problem is that the tensor subclass metadata guards (TENSOR_SUBCLASS_METADATA_MATCH) are installed against the pre-wrapped, eager mode DTensor, so we were previously just guarding on the original static shapes, instead of SymInts.

I'm not sure if we can just install guards against the wrapped DTensor instead?
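A toy sketch of the mismatch described above (hypothetical names; not dynamo's actual guard machinery):

```python
# Hypothetical illustration of why guarding on the eager spec pins shapes.
class Sym:
    """Toy stand-in for a SymInt: a size unknown until runtime."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

eager_sizes = (8, 16)           # concrete ints from the eager sample input
traced_sizes = (Sym("s0"), 16)  # what dynamo traces with dynamic shapes

def static_guard(new_sizes):
    # A metadata guard built from the eager spec compares new inputs
    # against the literal (8, 16), even though the traced graph itself
    # is polymorphic in s0.
    return new_sizes == eager_sizes

print(static_guard((8, 16)))   # True
print(static_guard((32, 16)))  # False, so a recompile would be triggered
```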

@ezyang
Contributor

ezyang commented Oct 2, 2025

It is a smell to me that __tensor_flatten__ on DTensor returns stuff that shouldn't actually be compared against. Maybe it all works out but I would worry there are other places where we assume we can test against the metadata directly that cause problems.
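For context, a simplified sketch of the flatten contract being discussed (a toy class with made-up fields, not the real DTensor):

```python
# Toy tensor-subclass flattening sketch (hypothetical, not the real torch API).
class ToyDTensor:
    def __init__(self, local_tensor, spec):
        self._local_tensor = local_tensor
        self._spec = spec  # includes mesh/placements and, notably, sizes/strides

    def __tensor_flatten__(self):
        # Returns inner tensor attribute names plus opaque metadata. Guards
        # like TENSOR_SUBCLASS_METADATA_MATCH compare that metadata across
        # calls, which is how shape-bearing spec fields caused recompiles.
        return ["_local_tensor"], (self._spec,)

spec = ("mesh2d", ("Shard(0)",), (8, 16))  # made-up spec tuple for the demo
t = ToyDTensor("local_shard", spec)
attrs, meta = t.__tensor_flatten__()
print(attrs, meta[0][2])  # ['_local_tensor'] (8, 16)
```

The smell ezyang points at: the shapes in `meta` are part of what gets compared, even though they are not semantically part of the subclass's identity.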

@pianpwk
Contributor Author

pianpwk commented Oct 3, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
skips DTensorSpec.sizes/strides in metadata guard checks

Pull Request resolved: pytorch#163820
Approved by: https://github.com/azahed98
@github-actions github-actions Bot deleted the pianpwk/dtensor_shape_metadata_guard branch November 3, 2025 02:17

Labels

ciflow/inductor
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
release notes: distributed (dtensor) (release notes category)



4 participants