Skip to content

DTensor: dont hash symint tensor input in propagate_tensor_meta#136266

Closed
bdhirsh wants to merge 2 commits intogh/bdhirsh/613/basefrom
gh/bdhirsh/613/head
Closed

DTensor: dont hash symint tensor input in propagate_tensor_meta#136266
bdhirsh wants to merge 2 commits intogh/bdhirsh/613/basefrom
gh/bdhirsh/613/head

Conversation

@bdhirsh
Copy link
Collaborator

@bdhirsh bdhirsh commented Sep 18, 2024

This fixes a subset of issues for dynamic shapes + DTensor.

It's pretty easy to run into other issues - it's likely that we need #125941 to land for DTensor + dynamic shapes to work more generally. I ended up writing a test that had dynamic shape inputs but not dynamic shape outputs in order to properly test this fix

Stack from ghstack (oldest at bottom):

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

@pytorch-bot
Copy link

pytorch-bot bot commented Sep 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136266

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 1110c56 with merge base c8d152c (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ezyang
Copy link
Contributor

ezyang commented Sep 18, 2024

It does seem like there's no reason to believe other call sites wouldn't pass in SymInt?

…_meta"

This fixes a subset of issues for dynamic shapes + DTensor.

It's pretty easy to run into other issues - it's likely that we need #125941 to land for DTensor + dynamic shapes to work more generally. I ended up writing a test that had dynamic shape inputs but not dynamic shape outputs in order to properly test this fix




cc XilunWu H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
@bdhirsh
Copy link
Collaborator Author

bdhirsh commented Sep 18, 2024

You're right - I just audited usages of lru_cache in torch/distributed/tensor - I found 3 total, including one that I missed. I added a similar symint non-cached branch for that other site too and updated the test.

Given that there are 3 caches across all of DTensor's codepaths, I'm hoping we'll be ok with auditing + testing to make sure that we e.g. don't add a new lru_cache later and accidentally regress dynamic shape usage.

@bdhirsh
Copy link
Collaborator Author

bdhirsh commented Sep 18, 2024

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team Raised by workflow job

@yf225
Copy link
Contributor

yf225 commented Sep 18, 2024

I'm hoping we'll be ok with auditing + testing to make sure that we e.g. don't add a new lru_cache later and accidentally regress dynamic shape usage.

thanks! and I assume the currently added unit test will be able to catch the regression in the future?

@bdhirsh
Copy link
Collaborator Author

bdhirsh commented Sep 18, 2024

thanks! and I assume the currently added unit test will be able to catch the regression in the future?

It should catch most cases - although for better coverage, we should probably take all of the DTensor + compile tests and run them with dynamic shapes. We can't do that until #125941 lands though

@bdhirsh bdhirsh added the release notes: distributed (dtensor) release notes category label Sep 18, 2024
@bdhirsh
Copy link
Collaborator Author

bdhirsh commented Sep 18, 2024

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

The merge job was canceled or timed out. This most often happen if two merge requests were issued for the same PR, or if merge job was waiting for more than 6 hours for tests to finish. In later case, please do not hesitate to reissue the merge command
For more information see pytorch-bot wiki.

@bdhirsh
Copy link
Collaborator Author

bdhirsh commented Sep 19, 2024

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

The merge job was canceled or timed out. This most often happen if two merge requests were issued for the same PR, or if merge job was waiting for more than 6 hours for tests to finish. In later case, please do not hesitate to reissue the merge command
For more information see pytorch-bot wiki.

@yf225
Copy link
Contributor

yf225 commented Sep 19, 2024

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
…rch#136266)

This fixes a subset of issues for dynamic shapes + DTensor.

It's pretty easy to run into other issues - it's likely that we need pytorch#125941 to land for DTensor + dynamic shapes to work more generally. I ended up writing a test that had dynamic shape inputs but not dynamic shape outputs in order to properly test this fix

Pull Request resolved: pytorch#136266
Approved by: https://github.com/ezyang, https://github.com/yf225
@github-actions github-actions bot deleted the gh/bdhirsh/613/head branch October 20, 2024 02:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (dtensor) release notes category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants