Stop accessing func._schema in _python_dispatch.correct_storage_aliasing#161292
Closed
swolchok wants to merge 7 commits intogh/swolchok/795/basefrom
Closed
Stop accessing func._schema in _python_dispatch.correct_storage_aliasing#161292swolchok wants to merge 7 commits intogh/swolchok/795/basefrom
swolchok wants to merge 7 commits intogh/swolchok/795/basefrom
Conversation
func._schema is a pybind, accessing the arguments/returns is expensive, and we have no reason to do it anyway. [ghstack-poisoned]
This was referenced Aug 22, 2025
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161292
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ✅ No FailuresAs of commit f803118 with merge base 05eeb29 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This was referenced Aug 22, 2025
…orage_aliasing" func._schema is a pybind, accessing the arguments/returns is expensive, and we have no reason to do it anyway. [ghstack-poisoned]
This was referenced Aug 22, 2025
This was referenced Aug 22, 2025
…orage_aliasing" func._schema is a pybind, accessing the arguments/returns is expensive, we have no reason to do it anyway, and even though #161301 makes accessing the arguments/returns less expensive, this still seems to improve performance. [ghstack-poisoned]
This was referenced Aug 22, 2025
This was referenced Aug 23, 2025
…orage_aliasing" func._schema is a pybind, accessing the arguments/returns is expensive, we have no reason to do it anyway, and even though #161301 makes accessing the arguments/returns less expensive, this still seems to improve performance. [ghstack-poisoned]
This was referenced Aug 25, 2025
pytorchmergebot
pushed a commit
that referenced
this pull request
Aug 30, 2025
…61304) `is` checks object identity and is more efficient. Google seems to confirm it is the correct way to do an exact type check. Pull Request resolved: #161304 Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/bdhirsh ghstack dependencies: #161301, #161292
pytorchmergebot
pushed a commit
that referenced
this pull request
Aug 30, 2025
…#161308) - Empty containers are Falsey - Hoist cheap checks first - Microbenchmarked single-element set access method Benchmark code: ``` import timeit to_test = [ ('list(x)', 'x = set([3])'), ('x[0]', 'x = [3]'), ('list(x)[0]', 'x = set([3])'), ('next(iter(x))', 'x = set([3])'), ] for (stmt, setup) in to_test: res = timeit.timeit(stmt=stmt, setup=setup) print(f"Time for `{stmt}`: {res}") ``` Result with Python 3.13 on Mac (with excess digits manually trimmed; directionally matches result on Linux) ``` Time for `list(x)`: 0.03418 Time for `x[0]`: 0.00852 Time for `list(x)[0]`: 0.03561 Time for `next(iter(x))`: 0.02278 ``` FWIW, I was surprised by this result, so I guess I'm glad I wrote the benchmark! Pull Request resolved: #161308 Approved by: https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: #161301, #161292, #161304
pytorchmergebot
pushed a commit
that referenced
this pull request
Aug 30, 2025
) Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation. This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish. Pull Request resolved: #161315 Approved by: https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: #161301, #161292, #161304, #161308
pytorchmergebot
pushed a commit
that referenced
this pull request
Aug 30, 2025
…61317) This assertion was expensive because of is_traceable_wrapper_subclass. Finding a cheap check to run first that's likely to let us skip the rest seems to improve things significantly. Pull Request resolved: #161317 Approved by: https://github.com/ezyang, https://github.com/XilunWu, https://github.com/bdhirsh ghstack dependencies: #161301, #161292, #161304, #161308, #161315
pytorchmergebot
pushed a commit
that referenced
this pull request
Aug 30, 2025
pytorchmergebot
pushed a commit
that referenced
this pull request
Aug 30, 2025
`auto` forces a copy. Confirmed this did something noticable with perf. Pull Request resolved: #161329 Approved by: https://github.com/zpcore, https://github.com/fduwjj, https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328
pytorchmergebot
pushed a commit
that referenced
this pull request
Aug 30, 2025
If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see numpy/numpy@b222eb6#diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163) Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small. Pull Request resolved: #161432 Approved by: https://github.com/ezyang ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328, #161329
pytorchmergebot
pushed a commit
that referenced
this pull request
Aug 31, 2025
) symbools are not identical with Py_True or PyFalse, so we can do those cheap checks first and at least get plain old bools to go fast. Pull Request resolved: #161455 Approved by: https://github.com/Skylion007 ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328, #161329, #161432
markc-614
pushed a commit
to markc-614/pytorch
that referenced
this pull request
Sep 17, 2025
…ing (pytorch#161292) func._schema is a pybind, accessing the arguments/returns is expensive, we have no reason to do it anyway, and even though pytorch#161301 makes accessing the arguments/returns less expensive, this still seems to improve performance. Pull Request resolved: pytorch#161292 Approved by: https://github.com/wconstab, https://github.com/malfet, https://github.com/bdhirsh ghstack dependencies: pytorch#161301
markc-614
pushed a commit
to markc-614/pytorch
that referenced
this pull request
Sep 17, 2025
…torch#161304) `is` checks object identity and is more efficient. Google seems to confirm it is the correct way to do an exact type check. Pull Request resolved: pytorch#161304 Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292
markc-614
pushed a commit
to markc-614/pytorch
that referenced
this pull request
Sep 17, 2025
…pytorch#161308) - Empty containers are Falsey - Hoist cheap checks first - Microbenchmarked single-element set access method Benchmark code: ``` import timeit to_test = [ ('list(x)', 'x = set([3])'), ('x[0]', 'x = [3]'), ('list(x)[0]', 'x = set([3])'), ('next(iter(x))', 'x = set([3])'), ] for (stmt, setup) in to_test: res = timeit.timeit(stmt=stmt, setup=setup) print(f"Time for `{stmt}`: {res}") ``` Result with Python 3.13 on Mac (with excess digits manually trimmed; directionally matches result on Linux) ``` Time for `list(x)`: 0.03418 Time for `x[0]`: 0.00852 Time for `list(x)[0]`: 0.03561 Time for `next(iter(x))`: 0.02278 ``` FWIW, I was surprised by this result, so I guess I'm glad I wrote the benchmark! Pull Request resolved: pytorch#161308 Approved by: https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304
markc-614
pushed a commit
to markc-614/pytorch
that referenced
this pull request
Sep 17, 2025
…rch#161315) Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation. This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish. Pull Request resolved: pytorch#161315 Approved by: https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308
markc-614
pushed a commit
to markc-614/pytorch
that referenced
this pull request
Sep 17, 2025
…torch#161317) This assertion was expensive because of is_traceable_wrapper_subclass. Finding a cheap check to run first that's likely to let us skip the rest seems to improve things significantly. Pull Request resolved: pytorch#161317 Approved by: https://github.com/ezyang, https://github.com/XilunWu, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315
markc-614
pushed a commit
to markc-614/pytorch
that referenced
this pull request
Sep 17, 2025
…61328) Not a huge cost, but free win is free. Pull Request resolved: pytorch#161328 Approved by: https://github.com/Skylion007 ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317
markc-614
pushed a commit
to markc-614/pytorch
that referenced
this pull request
Sep 17, 2025
`auto` forces a copy. Confirmed this did something noticable with perf. Pull Request resolved: pytorch#161329 Approved by: https://github.com/zpcore, https://github.com/fduwjj, https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328
markc-614
pushed a commit
to markc-614/pytorch
that referenced
this pull request
Sep 17, 2025
…h#161432) If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see numpy/numpy@b222eb6#diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163) Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small. Pull Request resolved: pytorch#161432 Approved by: https://github.com/ezyang ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329
markc-614
pushed a commit
to markc-614/pytorch
that referenced
this pull request
Sep 17, 2025
…rch#161455) symbools are not identical with Py_True or PyFalse, so we can do those cheap checks first and at least get plain old bools to go fast. Pull Request resolved: pytorch#161455 Approved by: https://github.com/Skylion007 ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329, pytorch#161432
mansiag05
pushed a commit
to mansiag05/pytorch
that referenced
this pull request
Sep 22, 2025
…ing (pytorch#161292) func._schema is a pybind, accessing the arguments/returns is expensive, we have no reason to do it anyway, and even though pytorch#161301 makes accessing the arguments/returns less expensive, this still seems to improve performance. Pull Request resolved: pytorch#161292 Approved by: https://github.com/wconstab, https://github.com/malfet, https://github.com/bdhirsh ghstack dependencies: pytorch#161301
mansiag05
pushed a commit
to mansiag05/pytorch
that referenced
this pull request
Sep 22, 2025
…torch#161304) `is` checks object identity and is more efficient. Google seems to confirm it is the correct way to do an exact type check. Pull Request resolved: pytorch#161304 Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292
mansiag05
pushed a commit
to mansiag05/pytorch
that referenced
this pull request
Sep 22, 2025
…pytorch#161308) - Empty containers are Falsey - Hoist cheap checks first - Microbenchmarked single-element set access method Benchmark code: ``` import timeit to_test = [ ('list(x)', 'x = set([3])'), ('x[0]', 'x = [3]'), ('list(x)[0]', 'x = set([3])'), ('next(iter(x))', 'x = set([3])'), ] for (stmt, setup) in to_test: res = timeit.timeit(stmt=stmt, setup=setup) print(f"Time for `{stmt}`: {res}") ``` Result with Python 3.13 on Mac (with excess digits manually trimmed; directionally matches result on Linux) ``` Time for `list(x)`: 0.03418 Time for `x[0]`: 0.00852 Time for `list(x)[0]`: 0.03561 Time for `next(iter(x))`: 0.02278 ``` FWIW, I was surprised by this result, so I guess I'm glad I wrote the benchmark! Pull Request resolved: pytorch#161308 Approved by: https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304
mansiag05
pushed a commit
to mansiag05/pytorch
that referenced
this pull request
Sep 22, 2025
…rch#161315) Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation. This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish. Pull Request resolved: pytorch#161315 Approved by: https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308
mansiag05
pushed a commit
to mansiag05/pytorch
that referenced
this pull request
Sep 22, 2025
…torch#161317) This assertion was expensive because of is_traceable_wrapper_subclass. Finding a cheap check to run first that's likely to let us skip the rest seems to improve things significantly. Pull Request resolved: pytorch#161317 Approved by: https://github.com/ezyang, https://github.com/XilunWu, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315
mansiag05
pushed a commit
to mansiag05/pytorch
that referenced
this pull request
Sep 22, 2025
…61328) Not a huge cost, but free win is free. Pull Request resolved: pytorch#161328 Approved by: https://github.com/Skylion007 ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317
mansiag05
pushed a commit
to mansiag05/pytorch
that referenced
this pull request
Sep 22, 2025
`auto` forces a copy. Confirmed this did something noticable with perf. Pull Request resolved: pytorch#161329 Approved by: https://github.com/zpcore, https://github.com/fduwjj, https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328
mansiag05
pushed a commit
to mansiag05/pytorch
that referenced
this pull request
Sep 22, 2025
…h#161432) If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see numpy/numpy@b222eb6#diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163) Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small. Pull Request resolved: pytorch#161432 Approved by: https://github.com/ezyang ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329
mansiag05
pushed a commit
to mansiag05/pytorch
that referenced
this pull request
Sep 22, 2025
…rch#161455) symbools are not identical with Py_True or PyFalse, so we can do those cheap checks first and at least get plain old bools to go fast. Pull Request resolved: pytorch#161455 Approved by: https://github.com/Skylion007 ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329, pytorch#161432
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stack from ghstack (oldest at bottom):
is, not ==, to check exact type matches in _python_dispatch #161304func._schema is a pybind, accessing the arguments/returns is expensive, we have no reason to do it anyway, and even though #161301 makes accessing the arguments/returns less expensive, this still seems to improve performance.