Skip to content

Stop accessing func._schema in _python_dispatch.correct_storage_aliasing#161292

Closed
swolchok wants to merge 7 commits intogh/swolchok/795/basefrom
gh/swolchok/795/head
Closed

Stop accessing func._schema in _python_dispatch.correct_storage_aliasing#161292
swolchok wants to merge 7 commits intogh/swolchok/795/basefrom
gh/swolchok/795/head

Conversation

func._schema is a pybind, accessing the arguments/returns is expensive, and we have no reason to do it anyway.

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161292

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit f803118 with merge base 05eeb29 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…orage_aliasing"

func._schema is a pybind, accessing the arguments/returns is expensive, and we have no reason to do it anyway.

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Aug 22, 2025
func._schema is a pybind, accessing the arguments/returns is expensive, and we have no reason to do it anyway.

ghstack-source-id: 43af3ef
Pull Request resolved: #161292
@swolchok swolchok requested a review from bdhirsh August 22, 2025 19:49
@swolchok swolchok added the topic: not user facing topic category label Aug 22, 2025
…orage_aliasing"


func._schema is a pybind, accessing the arguments/returns is expensive, we have no reason to do it anyway, and even though #161301 makes accessing the arguments/returns less expensive, this still seems to improve performance.

[ghstack-poisoned]
…orage_aliasing"


func._schema is a pybind, accessing the arguments/returns is expensive, we have no reason to do it anyway, and even though #161301 makes accessing the arguments/returns less expensive, this still seems to improve performance.

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
…61304)

`is` checks object identity and is more efficient. Google seems to confirm it is the correct way to do an exact type check.
Pull Request resolved: #161304
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/bdhirsh
ghstack dependencies: #161301, #161292
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
…#161308)

- Empty containers are Falsey
- Hoist cheap checks first
- Microbenchmarked single-element set access method

Benchmark code:
```
import timeit

to_test = [
    ('list(x)', 'x = set([3])'),
    ('x[0]', 'x = [3]'),
    ('list(x)[0]', 'x = set([3])'),
    ('next(iter(x))', 'x = set([3])'),
]

for (stmt, setup) in to_test:
    res = timeit.timeit(stmt=stmt, setup=setup)
    print(f"Time for `{stmt}`: {res}")
```

Result with Python 3.13 on Mac (with excess digits manually trimmed; directionally matches result on Linux)
```
Time for `list(x)`: 0.03418
Time for `x[0]`: 0.00852
Time for `list(x)[0]`: 0.03561
Time for `next(iter(x))`: 0.02278
```

FWIW, I was surprised by this result, so I guess I'm glad I wrote the benchmark!
Pull Request resolved: #161308
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: #161301, #161292, #161304
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
)

Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation.

This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish.
Pull Request resolved: #161315
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: #161301, #161292, #161304, #161308
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
…61317)

This assertion was expensive because of is_traceable_wrapper_subclass. Finding a cheap check to run first that's likely to let us skip the rest seems to improve things significantly.
Pull Request resolved: #161317
Approved by: https://github.com/ezyang, https://github.com/XilunWu, https://github.com/bdhirsh
ghstack dependencies: #161301, #161292, #161304, #161308, #161315
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
Not a huge cost, but free win is free.
Pull Request resolved: #161328
Approved by: https://github.com/Skylion007
ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
`auto` forces a copy. Confirmed this did something noticable with perf.
Pull Request resolved: #161329
Approved by: https://github.com/zpcore, https://github.com/fduwjj, https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see numpy/numpy@b222eb6#diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163)

Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small.
Pull Request resolved: #161432
Approved by: https://github.com/ezyang
ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328, #161329
pytorchmergebot pushed a commit that referenced this pull request Aug 31, 2025
)

symbools are not identical with Py_True or PyFalse, so we can do those cheap checks first and at least get plain old bools to go fast.
Pull Request resolved: #161455
Approved by: https://github.com/Skylion007
ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328, #161329, #161432
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…ing (pytorch#161292)

func._schema is a pybind, accessing the arguments/returns is expensive, we have no reason to do it anyway, and even though pytorch#161301 makes accessing the arguments/returns less expensive, this still seems to improve performance.
Pull Request resolved: pytorch#161292
Approved by: https://github.com/wconstab, https://github.com/malfet, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…torch#161304)

`is` checks object identity and is more efficient. Google seems to confirm it is the correct way to do an exact type check.
Pull Request resolved: pytorch#161304
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…pytorch#161308)

- Empty containers are Falsey
- Hoist cheap checks first
- Microbenchmarked single-element set access method

Benchmark code:
```
import timeit

to_test = [
    ('list(x)', 'x = set([3])'),
    ('x[0]', 'x = [3]'),
    ('list(x)[0]', 'x = set([3])'),
    ('next(iter(x))', 'x = set([3])'),
]

for (stmt, setup) in to_test:
    res = timeit.timeit(stmt=stmt, setup=setup)
    print(f"Time for `{stmt}`: {res}")
```

Result with Python 3.13 on Mac (with excess digits manually trimmed; directionally matches result on Linux)
```
Time for `list(x)`: 0.03418
Time for `x[0]`: 0.00852
Time for `list(x)[0]`: 0.03561
Time for `next(iter(x))`: 0.02278
```

FWIW, I was surprised by this result, so I guess I'm glad I wrote the benchmark!
Pull Request resolved: pytorch#161308
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…rch#161315)

Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation.

This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish.
Pull Request resolved: pytorch#161315
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…torch#161317)

This assertion was expensive because of is_traceable_wrapper_subclass. Finding a cheap check to run first that's likely to let us skip the rest seems to improve things significantly.
Pull Request resolved: pytorch#161317
Approved by: https://github.com/ezyang, https://github.com/XilunWu, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…h#161432)

If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see numpy/numpy@b222eb6#diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163)

Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small.
Pull Request resolved: pytorch#161432
Approved by: https://github.com/ezyang
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…rch#161455)

symbools are not identical with Py_True or PyFalse, so we can do those cheap checks first and at least get plain old bools to go fast.
Pull Request resolved: pytorch#161455
Approved by: https://github.com/Skylion007
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329, pytorch#161432
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…ing (pytorch#161292)

func._schema is a pybind, accessing the arguments/returns is expensive, we have no reason to do it anyway, and even though pytorch#161301 makes accessing the arguments/returns less expensive, this still seems to improve performance.
Pull Request resolved: pytorch#161292
Approved by: https://github.com/wconstab, https://github.com/malfet, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…torch#161304)

`is` checks object identity and is more efficient. Google seems to confirm it is the correct way to do an exact type check.
Pull Request resolved: pytorch#161304
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…pytorch#161308)

- Empty containers are Falsey
- Hoist cheap checks first
- Microbenchmarked single-element set access method

Benchmark code:
```
import timeit

to_test = [
    ('list(x)', 'x = set([3])'),
    ('x[0]', 'x = [3]'),
    ('list(x)[0]', 'x = set([3])'),
    ('next(iter(x))', 'x = set([3])'),
]

for (stmt, setup) in to_test:
    res = timeit.timeit(stmt=stmt, setup=setup)
    print(f"Time for `{stmt}`: {res}")
```

Result with Python 3.13 on Mac (with excess digits manually trimmed; directionally matches result on Linux)
```
Time for `list(x)`: 0.03418
Time for `x[0]`: 0.00852
Time for `list(x)[0]`: 0.03561
Time for `next(iter(x))`: 0.02278
```

FWIW, I was surprised by this result, so I guess I'm glad I wrote the benchmark!
Pull Request resolved: pytorch#161308
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…rch#161315)

Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation.

This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish.
Pull Request resolved: pytorch#161315
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…torch#161317)

This assertion was expensive because of is_traceable_wrapper_subclass. Finding a cheap check to run first that's likely to let us skip the rest seems to improve things significantly.
Pull Request resolved: pytorch#161317
Approved by: https://github.com/ezyang, https://github.com/XilunWu, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…h#161432)

If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see numpy/numpy@b222eb6#diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163)

Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small.
Pull Request resolved: pytorch#161432
Approved by: https://github.com/ezyang
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…rch#161455)

symbools are not identical with Py_True or PyFalse, so we can do those cheap checks first and at least get plain old bools to go fast.
Pull Request resolved: pytorch#161455
Approved by: https://github.com/Skylion007
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329, pytorch#161432
@github-actions github-actions bot deleted the gh/swolchok/795/head branch September 30, 2025 02:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants