Add inline fast paths for SymInt operators #161586
Closed
swolchok wants to merge 5 commits into gh/swolchok/810/base
Conversation
If SymInt::maybe_as_int() returns non-empty, we get an inline fast path. The philosophy here (as with the previous PR) is to preserve performance in the "plain old ints" case. Profiling detach() with Linux perf and code similar to the benchmark from #160580, we observed time spent in SymInt functions in computeStorageNBytes drop after this change, without the cost shifting elsewhere in the function.
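For illustration, the fast-path pattern described above can be sketched as follows. This is a minimal Python stand-in, not the actual C++ implementation: the `SymIntish` class, its fields, and the `slow_path_calls` instrumentation are all invented for this example; only the `maybe_as_int()` name mirrors the real `c10::SymInt` API.

```python
from typing import Optional

class SymIntish:
    """Illustrative stand-in for c10::SymInt (not the real class)."""

    def __init__(self, value: int, symbolic: bool = False):
        self._value = value
        self._symbolic = symbolic
        self.slow_path_calls = 0  # instrumentation for this sketch only

    def maybe_as_int(self) -> Optional[int]:
        # Returns the plain integer when no symbolic node is attached.
        return None if self._symbolic else self._value

    def _add_slow(self, other: "SymIntish") -> "SymIntish":
        # Stand-in for the out-of-line symbolic implementation.
        self.slow_path_calls += 1
        return SymIntish(self._value + other._value, symbolic=True)

    def __add__(self, other: "SymIntish") -> "SymIntish":
        a, b = self.maybe_as_int(), other.maybe_as_int()
        if a is not None and b is not None:
            # Inline fast path: both operands are plain old ints.
            return SymIntish(a + b)
        # Fall back to the symbolic machinery.
        return self._add_slow(other)
```

The point of the shape above is that the common "plain old ints" case never touches the slow, out-of-line symbolic code at all.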
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161586
Note: Links to docs will display an error until the docs builds have completed.
❗ 1 Active SEV — if your PR is affected, please view it below.
✅ You can merge normally! (3 unrelated failures) As of commit adecc7e with merge base dcf3853:
FLAKY — the following job failed, but likely due to flakiness present on trunk.
BROKEN TRUNK — the following jobs failed, but were already failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This was referenced Aug 27, 2025
swolchok added a commit that referenced this pull request on Aug 27, 2025
If SymInt::maybe_as_int() returns non-empty, we get an inline fast path. The philosophy here (as with the previous PR) is to preserve performance in the "plain old ints" case. Profiling detach() with Linux perf and code similar to the benchmark from #160580, we observed time spent in SymInt functions in computeStorageNBytes drop after this change, without the cost shifting elsewhere in the function. ghstack-source-id: 67895a5 Pull Request resolved: #161586
This was referenced Aug 27, 2025
This was referenced Sep 4, 2025
pytorchmergebot pushed a commit that referenced this pull request on Sep 5, 2025
…161590) This seems to be a (very rough) ~8% improvement on a DTensor benchmark very similar to the one from #160580 (~120 usec -> ~110 usec). Differential Revision: [D81530105](https://our.internmc.facebook.com/intern/diff/D81530105) Pull Request resolved: #161590 Approved by: https://github.com/albanD ghstack dependencies: #161466, #161586
This was referenced Sep 5, 2025
pytorchmergebot pushed a commit that referenced this pull request on Sep 8, 2025
…lready know (#161591) We already know when we're called from make_wrapper_subclass or make_dtensor. The check isn't particularly cheap. Differential Revision: [D81530099](https://our.internmc.facebook.com/intern/diff/D81530099) Pull Request resolved: #161591 Approved by: https://github.com/ezyang ghstack dependencies: #161466, #161586, #161590
pytorchmergebot pushed a commit that referenced this pull request on Sep 8, 2025
…161595) This seems to have been an especially slow one because of the repeated pybind access (schema is a pybind object, as is arguments, and then we hit each argument). It's still ~1% of total benchmark runtime because of the repeated single pybind function call, but that's a lot better. Differential Revision: [D81530095](https://our.internmc.facebook.com/intern/diff/D81530095) Pull Request resolved: #161595 Approved by: https://github.com/ezyang, https://github.com/bdhirsh ghstack dependencies: #161466, #161586, #161590, #161591
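To illustrate why repeated boundary crossings are costly and why hoisting the lookup helps, here is a toy Python sketch. `CountingProxy`, its `crossings` counter, and the two fetch functions are all invented for this example; this is not pybind11 or the actual PyTorch code, just the general "fetch once, reuse the reference" pattern the commit message describes.

```python
class CountingProxy:
    """Toy stand-in for a binding-wrapped object: every attribute
    access simulates one Python<->C++ boundary crossing."""

    def __init__(self, data):
        object.__setattr__(self, "_data", data)
        object.__setattr__(self, "crossings", 0)

    def __getattr__(self, name):
        # Count each simulated crossing, then serve the attribute.
        object.__setattr__(self, "crossings", self.crossings + 1)
        return self._data[name]

def naive_fetch(op, n):
    # Re-fetches op.schema on every iteration: n crossings.
    return [op.schema for _ in range(n)]

def hoisted_fetch(op, n):
    # Fetch once and reuse the cached reference: 1 crossing total.
    schema = op.schema
    return [schema for _ in range(n)]
```

With a real binding layer, each crossing is far more expensive than a local name lookup, so collapsing n crossings into one is where the speedup in this kind of change comes from.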
swolchok added a commit that referenced this pull request on Nov 13, 2025
…ue()? is_symbolic() appears to be inconsistent with the rest of the interface currently. This is a behavior change, but I believe the old behavior was a bug; please review carefully. Motivation: #161586 (comment) Differential Revision: [D86982216](https://our.internmc.facebook.com/intern/diff/D86982216/)
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben
Differential Revision: D81530107