Add C++ function for torch.distributed.tensor._op_schema.is_view_op by swolchok · Pull Request #161595 · pytorch/pytorch

swolchok · 2025-08-27T05:51:47Z

Stack from ghstack (oldest at bottom):

This seems to have been an especially slow one because of the repeated pybind access: `schema` is a pybind object, as is `arguments`, and then we access each argument in turn. It's still ~1% of total benchmark runtime because of the repeated single pybind function call, but that's a lot better.
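To make the access pattern concrete, here is a minimal Python sketch, not the code from this PR. The classes `MockSchema`, `MockArgument`, `MockAliasInfo`, and `Counter` are hypothetical stand-ins for the pybind-wrapped schema objects, and the predicate (an argument that is aliased but not written) is an assumption about what counts as a view op. Each counted property read stands in for one call across the binding layer.

```python
from typing import List, Optional, Sequence


class Counter:
    """Counts simulated calls across the Python/C++ binding layer."""

    def __init__(self) -> None:
        self.calls = 0


class MockAliasInfo:
    """Hypothetical stand-in for an argument's alias annotation."""

    def __init__(self, is_write: bool, counter: Counter) -> None:
        self._is_write = is_write
        self._counter = counter

    @property
    def is_write(self) -> bool:
        self._counter.calls += 1  # one simulated binding call
        return self._is_write


class MockArgument:
    """Hypothetical stand-in for one schema argument."""

    def __init__(self, alias_is_write: Optional[bool], counter: Counter) -> None:
        self._alias = (
            None if alias_is_write is None else MockAliasInfo(alias_is_write, counter)
        )
        self._counter = counter

    @property
    def alias_info(self) -> Optional[MockAliasInfo]:
        self._counter.calls += 1  # one simulated binding call
        return self._alias


class MockSchema:
    """Hypothetical stand-in for a function schema."""

    def __init__(self, alias_flags: Sequence[Optional[bool]]) -> None:
        self.counter = Counter()
        self._args: List[MockArgument] = [
            MockArgument(flag, self.counter) for flag in alias_flags
        ]

    @property
    def arguments(self) -> List[MockArgument]:
        self.counter.calls += 1  # one simulated binding call
        return self._args

    def is_view(self) -> bool:
        # A single simulated binding call: the loop reads private fields
        # directly, the way native code would, without further crossings.
        self.counter.calls += 1
        return any(
            a._alias is not None and not a._alias._is_write for a in self._args
        )


def is_view_op_per_argument(schema: MockSchema) -> bool:
    """Per-argument loop: every attribute read crosses the binding layer."""
    for arg in schema.arguments:
        info = arg.alias_info
        if info is not None and not info.is_write:
            return True
    return False
```

With three arguments where only the last is aliased, the per-argument loop makes five simulated binding calls, while `is_view()` makes one regardless of the number of arguments.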

Differential Revision: D81530095

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @EikanWang @jgong5 @wenzhe-nrv @sanchitintel


pytorch-bot · 2025-08-27T05:51:51Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161595

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 90190d0 with merge base dcf3853:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

inductor / unit-test / inductor-halide-build / build (gh) (trunk failure)
undefined reference to `NVPW_InitializeHost'
inductor / unit-test / inductor-triton-cpu-build / build (gh) (trunk failure)
undefined reference to `NVPW_InitializeHost'

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…orch#162218) It returns a const reference to a vector. Pull Request resolved: pytorch#162218 Approved by: https://github.com/Skylion007 ghstack dependencies: pytorch#161591, pytorch#161595, pytorch#161633, pytorch#161634, pytorch#161692, pytorch#162219, pytorch#162220

These seem to have been costing us 5-10 usec per detach (out of ~95 usec total). If they need to ship let's talk about requirements and how we can make this more efficient given that we would prefer if an entire DTensor op could finish in 10 usec. Differential Revision: [D81530106](https://our.internmc.facebook.com/intern/diff/D81530106) Pull Request resolved: pytorch#161596 Approved by: https://github.com/ezyang, https://github.com/Skylion007 ghstack dependencies: pytorch#161591, pytorch#161595, pytorch#161633, pytorch#161634, pytorch#161692, pytorch#162219, pytorch#162220, pytorch#162218

…162337) We control DTensor, so we can just guarantee there isn't a programming error with __torch_dispatch__. (The guard is already less-than-perfect; see the note that the deleted comment refers to.) Pull Request resolved: pytorch#162337 Approved by: https://github.com/Skylion007 ghstack dependencies: pytorch#161591, pytorch#161595, pytorch#161633, pytorch#161634, pytorch#161692, pytorch#162219, pytorch#162220, pytorch#162218, pytorch#161596

…ytorch#161595) This seems to have been an especially slow one because of the repeated pybind access (schema is a pybind, as is arguments, and then we hit each argument). It's still ~1% of total benchmark runtime because of the repeated single pybind function call, but that's a lot better. Differential Revision: [D81530095](https://our.internmc.facebook.com/intern/diff/D81530095) Pull Request resolved: pytorch#161595 Approved by: https://github.com/ezyang, https://github.com/bdhirsh ghstack dependencies: pytorch#161466, pytorch#161586, pytorch#161590, pytorch#161591

…ytorch#161633) py::args::size() calls PyTuple_GetSize. Compiler can't know the two calls will always return the same result, so we have to consolidate them ourselves. Differential Revision: [D81530096](https://our.internmc.facebook.com/intern/diff/D81530096) Pull Request resolved: pytorch#161633 Approved by: https://github.com/ezyang, https://github.com/Skylion007 ghstack dependencies: pytorch#161591, pytorch#161595

It is cheap to do an exact check against Tensor and much faster when it works (PyType_IsSubtype does not have this fastpath, I checked [source](https://github.com/python/cpython/blob/9ee0214b5dd982ac9fbe18dcce0e8787456e29af/Objects/typeobject.c#L2889)). Spot-checked in perf on detach-DTensor-in-a-loop benchmark; small win but clear. Differential Revision: [D81530101](https://our.internmc.facebook.com/intern/diff/D81530101) Pull Request resolved: pytorch#161634 Approved by: https://github.com/Skylion007, https://github.com/albanD ghstack dependencies: pytorch#161591, pytorch#161595, pytorch#161633
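The control flow described in this commit message can be sketched in Python as follows. This is an illustration only: `Tensor` and `DTensorLike` are hypothetical stand-in classes, `is_subtype_via_mro` is a simplified model of a general subtype check, and the performance benefit described above applies to the C-level check, not to this Python code.

```python
class Tensor:
    """Hypothetical stand-in for the base tensor type (not torch.Tensor)."""


class DTensorLike(Tensor):
    """Hypothetical subclass, standing in for a tensor subclass."""


def is_subtype_via_mro(tp: type, base: type) -> bool:
    """General path: scan the type's method resolution order for `base`."""
    return any(entry is base for entry in tp.__mro__)


def is_tensor(obj: object) -> bool:
    tp = type(obj)
    # Fast path: a single identity comparison covers the exact base type.
    if tp is Tensor:
        return True
    # Slow path: subclasses and non-tensors go through the general check.
    return is_subtype_via_mro(tp, Tensor)
```

The exact check cannot change the result, since the general path would also accept the base type; it only lets the common case return before the general check runs.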

Calling this first minimizes overhead for plain old ints, making cheap things cheap. Differential Revision: [D81530098](https://our.internmc.facebook.com/intern/diff/D81530098) Pull Request resolved: pytorch#161692 Approved by: https://github.com/ezyang, https://github.com/Skylion007 ghstack dependencies: pytorch#161591, pytorch#161595, pytorch#161633, pytorch#161634

…rayRef (pytorch#162219) Avoids requiring vector allocation to call this. Pull Request resolved: pytorch#162219 Approved by: https://github.com/Skylion007 ghstack dependencies: pytorch#161591, pytorch#161595, pytorch#161633, pytorch#161634, pytorch#161692

Optimize for common case and remove a pair of refcount operations (see new comments.) Pull Request resolved: pytorch#162220 Approved by: https://github.com/jansel, https://github.com/williamwen42 ghstack dependencies: pytorch#161591, pytorch#161595, pytorch#161633, pytorch#161634, pytorch#161692, pytorch#162219


swolchok requested a review from mikaylagawarecki as a code owner August 27, 2025 05:51

pytorch-bot Bot added the ciflow/inductor, oncall: distributed, and release notes: jit labels Aug 27, 2025

This was referenced Aug 27, 2025

Fix non-const reference arguments in torch/csrc/jit/python/init.cpp #161300

Closed

Fix forced copying def_property_readonly for FunctionSchema & friends #161301

Closed

Stop accessing func._schema in _python_dispatch.correct_storage_aliasing #161292

Closed

facebook-github-bot added the oncall: jit label Aug 27, 2025

swolchok requested review from XilunWu, ezyang, wconstab and zpcore August 27, 2025 05:52

swolchok wants to merge 6 commits into gh/swolchok/814/base from gh/swolchok/814/head


5 participants
