Skip to content

Fix forced copying def_property_readonly for FunctionSchema & friends#161301

Closed
swolchok wants to merge 6 commits intogh/swolchok/797/basefrom
gh/swolchok/797/head
Closed

Fix forced copying def_property_readonly for FunctionSchema & friends#161301
swolchok wants to merge 6 commits intogh/swolchok/797/basefrom
gh/swolchok/797/head

Conversation

@swolchok
Copy link
Contributor

@swolchok swolchok commented Aug 22, 2025

This took me a bit to figure out and I'm pretty sure I've looked at
this code before. Pybind uses
`return_value_policy::reference_internal` for `def_property`, which
[causes the owning object to be kept alive for the lifespan of the
return
value](https://pybind11.readthedocs.io/en/stable/advanced/functions.html),
allowing the getter to safely avoid copying the property
value. However, lambdas act like they return `auto`, not
`decltype(auto)`, so our lambdas themselves were forcing copies!

Testing: observed std::vector<Argument> copying disappear in Linux
perf profile of someOpInfo._schema.arguments/returns (in
_python_dispatch.correct_storage_aliasing).

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161301

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ You can merge normally! (2 Unrelated Failures)

As of commit 6ecf852 with merge base 05eeb29 (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…a & friends"

This took me a bit to figure out and I'm pretty sure I've looked at
this code before. Pybind uses
`return_value_policy::reference_internal` for `def_property`, which
[causes the owning object to be kept alive for the lifespan of the
return
value](https://pybind11.readthedocs.io/en/stable/advanced/functions.html),
allowing the getter to safely avoid copying the property
value. However, lambdas act like they return `auto`, not
`decltype(auto)`, so our lambdas themselves were forcing copies!

Testing: observed std::vector<Argument> copying disappear in Linux
perf profile of someOpInfo._schema.arguments/returns (in
_python_dispatch.correct_storage_aliasing).

cc EikanWang jgong5 wenzhe-nrv sanchitintel

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Aug 22, 2025
…patch.correct_storage_aliasing"


func._schema is a pybind, accessing the arguments/returns is expensive, we have no reason to do it anyway, and even though #161301 makes accessing the arguments/returns less expensive, this still seems to improve performance.

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Aug 22, 2025
…orage_aliasing"


func._schema is a pybind, accessing the arguments/returns is expensive, we have no reason to do it anyway, and even though #161301 makes accessing the arguments/returns less expensive, this still seems to improve performance.

[ghstack-poisoned]
@swolchok swolchok requested review from XilunWu, ezyang and zpcore August 22, 2025 22:31
.def_property_readonly(
"name", [](const FunctionSchema& self) { return self.name(); })
"name",
[](const FunctionSchema& self) -> decltype(auto) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May want to add a comment about why the explicit rvalue is important here even if it looks like it's implicitly inferred. An overzealous refactor could easily see these as redundant and remove them. I for sure wouldn't know the difference without an explanation in the comments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also just explicitly specify the return value type as another arg to def can't you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also just explicitly specify the return value type as another arg to def can't you?

I would not be surprised if an attempt to do that ended up trying to take a const reference to a temporary. Being explicit should be simple and effective.

(next ghstack update will add a comment)

…a & friends"

This took me a bit to figure out and I'm pretty sure I've looked at
this code before. Pybind uses
`return_value_policy::reference_internal` for `def_property`, which
[causes the owning object to be kept alive for the lifespan of the
return
value](https://pybind11.readthedocs.io/en/stable/advanced/functions.html),
allowing the getter to safely avoid copying the property
value. However, lambdas act like they return `auto`, not
`decltype(auto)`, so our lambdas themselves were forcing copies!

Testing: observed std::vector<Argument> copying disappear in Linux
perf profile of someOpInfo._schema.arguments/returns (in
_python_dispatch.correct_storage_aliasing).

cc EikanWang jgong5 wenzhe-nrv sanchitintel

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
…ing (#161292)

func._schema is a pybind, accessing the arguments/returns is expensive, we have no reason to do it anyway, and even though #161301 makes accessing the arguments/returns less expensive, this still seems to improve performance.
Pull Request resolved: #161292
Approved by: https://github.com/wconstab, https://github.com/malfet, https://github.com/bdhirsh
ghstack dependencies: #161301
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
…61304)

`is` checks object identity and is more efficient. Google seems to confirm it is the correct way to do an exact type check.
Pull Request resolved: #161304
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/bdhirsh
ghstack dependencies: #161301, #161292
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
…#161308)

- Empty containers are Falsey
- Hoist cheap checks first
- Microbenchmarked single-element set access method

Benchmark code:
```
import timeit

to_test = [
    ('list(x)', 'x = set([3])'),
    ('x[0]', 'x = [3]'),
    ('list(x)[0]', 'x = set([3])'),
    ('next(iter(x))', 'x = set([3])'),
]

for (stmt, setup) in to_test:
    res = timeit.timeit(stmt=stmt, setup=setup)
    print(f"Time for `{stmt}`: {res}")
```

Result with Python 3.13 on Mac (with excess digits manually trimmed; directionally matches result on Linux)
```
Time for `list(x)`: 0.03418
Time for `x[0]`: 0.00852
Time for `list(x)[0]`: 0.03561
Time for `next(iter(x))`: 0.02278
```

FWIW, I was surprised by this result, so I guess I'm glad I wrote the benchmark!
Pull Request resolved: #161308
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: #161301, #161292, #161304
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
)

Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation.

This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish.
Pull Request resolved: #161315
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: #161301, #161292, #161304, #161308
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
…61317)

This assertion was expensive because of is_traceable_wrapper_subclass. Finding a cheap check to run first that's likely to let us skip the rest seems to improve things significantly.
Pull Request resolved: #161317
Approved by: https://github.com/ezyang, https://github.com/XilunWu, https://github.com/bdhirsh
ghstack dependencies: #161301, #161292, #161304, #161308, #161315
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
Not a huge cost, but free win is free.
Pull Request resolved: #161328
Approved by: https://github.com/Skylion007
ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
`auto` forces a copy. Confirmed this did something noticable with perf.
Pull Request resolved: #161329
Approved by: https://github.com/zpcore, https://github.com/fduwjj, https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2025
If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see numpy/numpy@b222eb6#diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163)

Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small.
Pull Request resolved: #161432
Approved by: https://github.com/ezyang
ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328, #161329
pytorchmergebot pushed a commit that referenced this pull request Aug 31, 2025
)

symbools are not identical with Py_True or PyFalse, so we can do those cheap checks first and at least get plain old bools to go fast.
Pull Request resolved: #161455
Approved by: https://github.com/Skylion007
ghstack dependencies: #161301, #161292, #161304, #161308, #161315, #161317, #161328, #161329, #161432
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…pytorch#161301)

This took me a bit to figure out and I'm pretty sure I've looked at
this code before. Pybind uses
`return_value_policy::reference_internal` for `def_property`, which
[causes the owning object to be kept alive for the lifespan of the
return
value](https://pybind11.readthedocs.io/en/stable/advanced/functions.html),
allowing the getter to safely avoid copying the property
value. However, lambdas act like they return `auto`, not
`decltype(auto)`, so our lambdas themselves were forcing copies!

Testing: observed std::vector<Argument> copying disappear in Linux
perf profile of someOpInfo._schema.arguments/returns (in
_python_dispatch.correct_storage_aliasing).

Pull Request resolved: pytorch#161301
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/wconstab
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…ing (pytorch#161292)

func._schema is a pybind, accessing the arguments/returns is expensive, we have no reason to do it anyway, and even though pytorch#161301 makes accessing the arguments/returns less expensive, this still seems to improve performance.
Pull Request resolved: pytorch#161292
Approved by: https://github.com/wconstab, https://github.com/malfet, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…torch#161304)

`is` checks object identity and is more efficient. Google seems to confirm it is the correct way to do an exact type check.
Pull Request resolved: pytorch#161304
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…pytorch#161308)

- Empty containers are Falsey
- Hoist cheap checks first
- Microbenchmarked single-element set access method

Benchmark code:
```
import timeit

to_test = [
    ('list(x)', 'x = set([3])'),
    ('x[0]', 'x = [3]'),
    ('list(x)[0]', 'x = set([3])'),
    ('next(iter(x))', 'x = set([3])'),
]

for (stmt, setup) in to_test:
    res = timeit.timeit(stmt=stmt, setup=setup)
    print(f"Time for `{stmt}`: {res}")
```

Result with Python 3.13 on Mac (with excess digits manually trimmed; directionally matches result on Linux)
```
Time for `list(x)`: 0.03418
Time for `x[0]`: 0.00852
Time for `list(x)[0]`: 0.03561
Time for `next(iter(x))`: 0.02278
```

FWIW, I was surprised by this result, so I guess I'm glad I wrote the benchmark!
Pull Request resolved: pytorch#161308
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…rch#161315)

Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation.

This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish.
Pull Request resolved: pytorch#161315
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…torch#161317)

This assertion was expensive because of is_traceable_wrapper_subclass. Finding a cheap check to run first that's likely to let us skip the rest seems to improve things significantly.
Pull Request resolved: pytorch#161317
Approved by: https://github.com/ezyang, https://github.com/XilunWu, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…h#161432)

If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see numpy/numpy@b222eb6#diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163)

Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small.
Pull Request resolved: pytorch#161432
Approved by: https://github.com/ezyang
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…rch#161455)

symbools are not identical with Py_True or PyFalse, so we can do those cheap checks first and at least get plain old bools to go fast.
Pull Request resolved: pytorch#161455
Approved by: https://github.com/Skylion007
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329, pytorch#161432
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…pytorch#161301)

This took me a bit to figure out and I'm pretty sure I've looked at
this code before. Pybind uses
`return_value_policy::reference_internal` for `def_property`, which
[causes the owning object to be kept alive for the lifespan of the
return
value](https://pybind11.readthedocs.io/en/stable/advanced/functions.html),
allowing the getter to safely avoid copying the property
value. However, lambdas act like they return `auto`, not
`decltype(auto)`, so our lambdas themselves were forcing copies!

Testing: observed std::vector<Argument> copying disappear in Linux
perf profile of someOpInfo._schema.arguments/returns (in
_python_dispatch.correct_storage_aliasing).

Pull Request resolved: pytorch#161301
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/wconstab
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…ing (pytorch#161292)

func._schema is a pybind, accessing the arguments/returns is expensive, we have no reason to do it anyway, and even though pytorch#161301 makes accessing the arguments/returns less expensive, this still seems to improve performance.
Pull Request resolved: pytorch#161292
Approved by: https://github.com/wconstab, https://github.com/malfet, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…torch#161304)

`is` checks object identity and is more efficient. Google seems to confirm it is the correct way to do an exact type check.
Pull Request resolved: pytorch#161304
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…pytorch#161308)

- Empty containers are Falsey
- Hoist cheap checks first
- Microbenchmarked single-element set access method

Benchmark code:
```
import timeit

to_test = [
    ('list(x)', 'x = set([3])'),
    ('x[0]', 'x = [3]'),
    ('list(x)[0]', 'x = set([3])'),
    ('next(iter(x))', 'x = set([3])'),
]

for (stmt, setup) in to_test:
    res = timeit.timeit(stmt=stmt, setup=setup)
    print(f"Time for `{stmt}`: {res}")
```

Result with Python 3.13 on Mac (with excess digits manually trimmed; directionally matches result on Linux)
```
Time for `list(x)`: 0.03418
Time for `x[0]`: 0.00852
Time for `list(x)[0]`: 0.03561
Time for `next(iter(x))`: 0.02278
```

FWIW, I was surprised by this result, so I guess I'm glad I wrote the benchmark!
Pull Request resolved: pytorch#161308
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…rch#161315)

Scanning a list of pybind enums with `in` is slow. See NOTE in code for full explanation.

This is a significant optimization; will be updating the torchdispatch/return_and_correct_aliasing portion of this stack with benchmark and results soonish.
Pull Request resolved: pytorch#161315
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…torch#161317)

This assertion was expensive because of is_traceable_wrapper_subclass. Finding a cheap check to run first that's likely to let us skip the rest seems to improve things significantly.
Pull Request resolved: pytorch#161317
Approved by: https://github.com/ezyang, https://github.com/XilunWu, https://github.com/bdhirsh
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…h#161432)

If we want them interned, we should intern at callsites. (The numpy reference has bit rotted; see numpy/numpy@b222eb6#diff-6bdb6105198083838f51c57b55b3a49472ed23043bb40018f1ea41138e687163)

Profiling a simple torchdispatch benchmark with perf before/after seems to show that time spent copying std::strings and interning Python strings is gone, though there is some noise and the improvement is very small.
Pull Request resolved: pytorch#161432
Approved by: https://github.com/ezyang
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…rch#161455)

symbools are not identical with Py_True or PyFalse, so we can do those cheap checks first and at least get plain old bools to go fast.
Pull Request resolved: pytorch#161455
Approved by: https://github.com/Skylion007
ghstack dependencies: pytorch#161301, pytorch#161292, pytorch#161304, pytorch#161308, pytorch#161315, pytorch#161317, pytorch#161328, pytorch#161329, pytorch#161432
@github-actions github-actions bot deleted the gh/swolchok/797/head branch September 30, 2025 02:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged oncall: jit Add this issue/PR to JIT oncall triage queue release notes: jit release notes category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants