Stop accessing func._schema in _python_dispatch.correct_storage_aliasing by swolchok · Pull Request #161292 · pytorch/pytorch

swolchok · 2025-08-22T18:36:47Z

Stack from ghstack (oldest at bottom):

`func._schema` is a pybind object, so accessing its arguments/returns is expensive, and we have no reason to do it anyway. Even though #161301 makes accessing the arguments/returns less expensive, this change still seems to improve performance.
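The following is a minimal sketch of the cost being avoided, under stated assumptions. `FakePybindSchema` is a hypothetical stand-in for a pybind-bound schema whose `arguments`/`returns` properties build a fresh copy on every access (roughly the forced-copy behavior that #161301 addresses). `CachedSchemaInfo` is a hypothetical pure-Python cache that already holds the same per-argument and per-return entries. Neither is a PyTorch API, and this is not the code changed in this PR.

```python
from dataclasses import dataclass, field


class FakePybindSchema:
    """Hypothetical stand-in for a pybind-bound schema object.

    Each property access returns a new list, mimicking a binding that
    returns its underlying vector by value (one copy per access).
    """

    def __init__(self, arguments, returns):
        self._arguments = tuple(arguments)
        self._returns = tuple(returns)
        self.copies = 0  # how many copies the caller forced us to make

    @property
    def arguments(self):
        self.copies += 1
        return list(self._arguments)

    @property
    def returns(self):
        self.copies += 1
        return list(self._returns)


@dataclass
class CachedSchemaInfo:
    """Hypothetical per-operator cache built once, in plain Python."""
    args: list = field(default_factory=list)
    outs: list = field(default_factory=list)


def pair_count_via_pybind(schema):
    # Touches the expensive binding twice on every call.
    return len(schema.arguments) * len(schema.returns)


def pair_count_via_cache(info):
    # Same answer from lists we already have: no binding access at all.
    return len(info.args) * len(info.outs)


schema = FakePybindSchema(["self", "other"], ["out"])
info = CachedSchemaInfo(args=["self", "other"], outs=["out"])

for _ in range(1000):
    pair_count_via_pybind(schema)
    pair_count_via_cache(info)

print(schema.copies)  # 2000: two copies per pybind-based call
```

Reading the lengths from data that is already in Python gives the same result without crossing the binding boundary, which is why dropping the access entirely can still help even after the per-access copy is made cheaper.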

func._schema is a pybind object, so accessing its arguments/returns is expensive, and we have no reason to do it anyway. [ghstack-poisoned]

pytorch-bot · 2025-08-22T18:36:51Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161292

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

ROCm MI2xx CI/CD workflows failing due to: download from https://api.github.com/repos/pytorch/pytorch timed out.

✅ No Failures

As of commit f803118 with merge base 05eeb29:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

func._schema is a pybind object, so accessing its arguments/returns is expensive, and we have no reason to do it anyway. ghstack-source-id: 062823f Pull Request resolved: #161292

…orage_aliasing" func._schema is a pybind object, so accessing its arguments/returns is expensive, and we have no reason to do it anyway. [ghstack-poisoned]

func._schema is a pybind object, so accessing its arguments/returns is expensive, and we have no reason to do it anyway. ghstack-source-id: 43af3ef Pull Request resolved: #161292

…orage_aliasing" func._schema is a pybind object, so accessing its arguments/returns is expensive, and we have no reason to do it anyway. Even though #161301 makes accessing the arguments/returns less expensive, this still seems to improve performance. [ghstack-poisoned]

…61304) `is` checks object identity and is more efficient. Google seems to confirm it is the correct way to do an exact type check. Pull Request resolved: #161304 Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/bdhirsh ghstack dependencies: #161301, #161292
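A small pure-Python illustration of the exact-type check (not the `_python_dispatch` code itself). For ordinary classes, `type(x) == cls` ends up comparing identity anyway, so both spellings give the same answer; `is` simply skips the rich-comparison dispatch and cannot be overridden by a metaclass. `MyInt` and the two helper functions are hypothetical names chosen for the example.

```python
import timeit


class MyInt(int):
    """Hypothetical int subclass, used to show that exact checks reject subclasses."""


def is_exact_int_eq(x):
    # `==` on type objects goes through rich comparison, which for
    # ordinary classes falls back to an identity comparison.
    return type(x) == int


def is_exact_int_is(x):
    # `is` is a direct identity test with no comparison dispatch.
    return type(x) is int


for value in (3, True, MyInt(3)):
    # Exact checks reject bool and MyInt; isinstance() accepts both.
    print(repr(value), is_exact_int_is(value), isinstance(value, int))

for name, fn in (("==", is_exact_int_eq), ("is", is_exact_int_is)):
    seconds = timeit.timeit("fn(3)", globals={"fn": fn}, number=200_000)
    print(f"type(x) {name} int: {seconds:.4f}s")
```

Timings vary by machine and Python version, so the printed numbers are only indicative.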

…#161308)

- Empty containers are falsy
- Hoist cheap checks first
- Microbenchmarked single-element set access method

Benchmark code:

```
import timeit

to_test = [
    ('list(x)', 'x = set([3])'),
    ('x[0]', 'x = [3]'),
    ('list(x)[0]', 'x = set([3])'),
    ('next(iter(x))', 'x = set([3])'),
]

for (stmt, setup) in to_test:
    res = timeit.timeit(stmt=stmt, setup=setup)
    print(f"Time for `{stmt}`: {res}")
```

Result with Python 3.13 on Mac (excess digits manually trimmed; directionally matches the result on Linux):

```
Time for `list(x)`: 0.03418
Time for `x[0]`: 0.00852
Time for `list(x)[0]`: 0.03561
Time for `next(iter(x))`: 0.02278
```

FWIW, I was surprised by this result, so I guess I'm glad I wrote the benchmark!

Pull Request resolved: #161308 Approved by: https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: #161301, #161292, #161304
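The three bullet points of #161308 can be sketched on a hypothetical `AliasInfo` record (an assumption; the real function reads PyTorch's schema alias annotations). `get_write_alias_slow` is an illustrative "before" shape, not the actual previous code. `get_write_alias_fast` relies on truthiness for the empty check, runs the cheap attribute checks before extracting any element, and uses `next(iter(...))`, the fastest of the set-based options in the benchmark in that commit message.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AliasInfo:
    """Hypothetical stand-in for an argument's alias annotation."""
    alias_set: frozenset = field(default_factory=frozenset)
    is_write: bool = False


def get_write_alias_slow(info):
    # Illustrative "before" shape: explicit len() check, and a temporary
    # list is built before the cheap is_write test gets a chance to bail out.
    if len(info.alias_set) == 0:
        return None
    aliases = list(info.alias_set)
    if not info.is_write:
        return None
    return aliases[0]


def get_write_alias_fast(info):
    # Cheap checks first; an empty set is falsy, so no len() call is needed.
    if not info.is_write or not info.alias_set:
        return None
    # Assumes simple aliasing (at most one alias per argument):
    # next(iter(...)) reads the single element without building a list.
    return next(iter(info.alias_set))


read_only = AliasInfo(frozenset({"a"}), is_write=False)
mutated = AliasInfo(frozenset({"a"}), is_write=True)
print(get_write_alias_fast(read_only), get_write_alias_fast(mutated))
```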

) Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation. This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish. Pull Request resolved: #161315 Approved by: https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: #161301, #161292, #161304, #161308
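The NOTE referenced in #161315 is not reproduced on this page, so the following is only a hedged illustration of the general mechanism. `x in some_list` checks identity first and then calls `__eq__` element by element, so a value that is equal to, but not the same object as, a list entry pays for a full comparison against every entry it passes. `SlowEnum` is a hypothetical stand-in for a pybind enum, and comparing hoisted integer values is one possible workaround, not necessarily what #161315 does.

```python
class SlowEnum:
    """Hypothetical stand-in for a pybind enum value; counts __eq__ calls."""

    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        SlowEnum.eq_calls += 1
        if not isinstance(other, SlowEnum):
            return NotImplemented
        return self.value == other.value


TARGETS = [SlowEnum(0), SlowEnum(1), SlowEnum(2)]
# Hoisted once: plain ints, so membership is a cheap int hash lookup.
TARGET_VALUES = frozenset(t.value for t in TARGETS)


def in_list(kind):
    # Identity check first, then __eq__ against each element in turn.
    return kind in TARGETS


def in_value_set(kind):
    # Compares ints only; SlowEnum.__eq__ is never called.
    return kind.value in TARGET_VALUES


probe = SlowEnum(2)  # equal to TARGETS[2], but a different object
SlowEnum.eq_calls = 0
print(in_list(probe), SlowEnum.eq_calls)  # True 3
SlowEnum.eq_calls = 0
print(in_value_set(probe), SlowEnum.eq_calls)  # True 0
```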

…61317) This assertion was expensive because of is_traceable_wrapper_subclass. Finding a cheap check to run first that's likely to let us skip the rest seems to improve things significantly. Pull Request resolved: #161317 Approved by: https://github.com/ezyang, https://github.com/XilunWu, https://github.com/bdhirsh ghstack dependencies: #161301, #161292, #161304, #161308, #161315
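A sketch of the "cheap check first" idea with hypothetical predicates: `cheap_check` stands in for a fast test that usually settles the question, and `expensive_check` for something costly such as `is_traceable_wrapper_subclass`. Because `or` short-circuits, the expensive predicate runs only when the cheap one fails. This reordering only preserves behavior if a passing cheap check implies that the original condition holds, which is true of the two predicates below.

```python
calls = {"expensive": 0}


def cheap_check(x):
    # Hypothetical fast path: an exact-type identity test.
    return type(x) is int


def expensive_check(x):
    # Hypothetical stand-in for a costly predicate; counts its invocations.
    calls["expensive"] += 1
    return isinstance(x, (int, float))


def validate(x):
    # `or` short-circuits, so expensive_check only runs when cheap_check fails.
    # An explicit raise is used instead of `assert` so the check survives `python -O`.
    if not (cheap_check(x) or expensive_check(x)):
        raise AssertionError(f"unexpected value: {x!r}")


for v in [1, 2, 3, 4.5]:
    validate(v)
print(calls["expensive"])  # 1: only 4.5 reaches the expensive path
```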

Not a huge cost, but free win is free. Pull Request resolved: #161328 Approved by: https://github.com/Skylion007 ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317
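This commit is #161328, listed on this page as "Avoid double hash lookup in torch._library.simple_registry". The sketch below shows the general pattern in pure Python (the registry and key class are hypothetical, not the `simple_registry` code): `key in d` followed by `d[key]` hashes the key twice on a hit, while `d.get(key)` hashes it once. `CountingKey` counts `__hash__` calls to make the difference visible.

```python
class CountingKey:
    """Hypothetical registry key that counts how often it is hashed."""

    hashes = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        CountingKey.hashes += 1
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.name == other.name


def lookup_twice(registry, key):
    # On a hit this hashes `key` twice: once for `in`, once for `[]`.
    if key in registry:
        return registry[key]
    return None


def lookup_once(registry, key):
    # A single hash lookup; returns None when the key is absent.
    # (Use a sentinel default instead if None can be a stored value.)
    return registry.get(key)


registry = {CountingKey("add"): "add_impl"}
for fn in (lookup_twice, lookup_once):
    CountingKey.hashes = 0
    result = fn(registry, CountingKey("add"))
    print(fn.__name__, result, CountingKey.hashes)
```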

`auto` (as opposed to `auto&`) forces a copy. Confirmed this had a noticeable effect on perf. Pull Request resolved: #161329 Approved by: https://github.com/zpcore, https://github.com/fduwjj, https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328

If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see numpy/numpy@b222eb6#diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163) Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small. Pull Request resolved: #161432 Approved by: https://github.com/ezyang ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328, #161329

) SymBools are never identical to Py_True or Py_False, so we can do those cheap identity checks first and at least make plain old bools fast. Pull Request resolved: #161455 Approved by: https://github.com/Skylion007 ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328, #161329, #161432

…ing (pytorch#161292) func._schema is a pybind object, so accessing its arguments/returns is expensive, and we have no reason to do it anyway. Even though pytorch#161301 makes accessing the arguments/returns less expensive, this still seems to improve performance. Pull Request resolved: pytorch#161292 Approved by: https://github.com/wconstab, https://github.com/malfet, https://github.com/bdhirsh ghstack dependencies: pytorch#161301


Stop accessing func._schema in _python_dispatch.correct_storage_aliasing

b30da6f


This was referenced Aug 22, 2025

Fix OpSchema equality check #161231

Closed

Use comparison key in OpSchema to avoid duplicate work between __hash__ and __eq__ #161234

Closed

Minor cleanup of DeviceMesh.__eq__ #161235

Closed

Update on "Stop accessing func._schema in _python_dispatch.correct_st…

eac61d3


This was referenced Aug 22, 2025

Fix non-const reference arguments in torch/csrc/jit/python/init.cpp #161300

Closed

Fix forced copying def_property_readonly for FunctionSchema & friends #161301

Closed

swolchok requested a review from bdhirsh August 22, 2025 19:49

swolchok added the `topic: not user facing` (topic category) label Aug 22, 2025

This was referenced Aug 22, 2025

Use is, not ==, to check exact type matches in _python_dispatch #161304

Closed

Optimize _python_dispatch.return_and_correct_aliasing.get_write_alias #161308

Closed

This was referenced Aug 22, 2025

Fix pybind enum efficiency issue in return_and_correct_aliasing #161315

Closed

Improve assert perf in _python_dispatch._correct_storage_aliasing #161317

Closed

swolchok requested review from XilunWu, ezyang and zpcore August 22, 2025 22:31

This was referenced Aug 23, 2025

Avoid double hash lookup in torch._library.simple_registry #161328

Closed

Fix accidental copy in pushPyOutToStack #161329

Closed

This was referenced Aug 25, 2025

Stop trying to intern arguments in PyObject_FastGetAttrString #161432

Closed

Check for Index before Tensors in is_int_or_symint #161433

Closed

Remove early torch::is_symint call in is_int_or_symint #161438

Closed

pytorchmergebot closed this in 1a64bf2 Aug 30, 2025

pytorchmergebot added the Merged label Aug 30, 2025

github-actions bot deleted the gh/swolchok/795/head branch September 30, 2025 02:08

swolchok wants to merge 7 commits into gh/swolchok/795/base from gh/swolchok/795/head

5 participants
