Skip to content

Port OpSchema.__post_init__ and OpSchema._recompute_comparison_key to C++#161695

Closed
swolchok wants to merge 14 commits intogh/swolchok/823/basefrom
gh/swolchok/823/head
Closed

Port OpSchema.__post_init__ and OpSchema._recompute_comparison_key to C++#161695
swolchok wants to merge 14 commits intogh/swolchok/823/basefrom
gh/swolchok/823/head

Conversation

@swolchok
Copy link
Contributor

@swolchok swolchok commented Aug 28, 2025

Stack from ghstack (oldest at bottom):

I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking.

Differential Revision: D81530102

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @dcci

… does make it faster

I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking.

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161695

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0f67393 with merge base a63221a (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/inductor oncall: distributed Add this issue/PR to distributed oncall triage queue labels Aug 28, 2025
This was referenced Aug 28, 2025
swolchok added a commit that referenced this pull request Sep 17, 2025
… does make it faster

I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking.

ghstack-source-id: cfd236c
Pull Request resolved: #161695
#endif
}

static bool arg_type_tensor_or_tensor_list_like(py::handle arg) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have objections to keeping the old Python implementation around in comments as a reference, or do you think it will bitrot immediately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented-out code generally bitrots. if we want it around we would need to make it modally enabled and add tests

@ezyang
Copy link
Contributor

ezyang commented Sep 18, 2025

How good do you think the preexisting testing here is? We are looking at some (orthogonal) parts of DTensor where the testing is definitely NOT enough and I'm wondering if this applies here or not.

c10::SmallVector<py::object, 8> args_to_hash;
size_t idx = 0;
for (const auto& e : args_schema) {
if (idx >= static_argnum || arg_type_tensor_or_tensor_list_like(e)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reordered cond, nice

int idx = 0;
auto kwargs_schema = py::reinterpret_borrow<py::dict>(raw_kwargs_schema);
for (const auto& k : static_kwargkey_list) {
PyObject* item = PyDict_GetItem(kwargs_schema.ptr(), k.ptr());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

safer to call PyDict_GetItemWithError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wow I didn't fully realize what was going on with PyDict_GetItem

kwargs_to_hash[idx++] = py::none();
}
}
PyObject* comparison_key = PyTuple_Pack(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should test that PyTuple_Pack succeeded, and ditto below

if (!PyList_Check(static_kwargkey.ptr())) {
PyErr_SetString(
PyExc_TypeError, "self.schema_info.static_kwargkey must be a list!");
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more uniform with the rest of the codebase if we just raised exceptions when this happens, but I guess this is not too difficult to get right.

const auto dtensor_spec_class = get_dtensor_spec_class();
bool has_symints = false;
for (const auto& a : args_schema) {
if (Py_TYPE(a.ptr()) != (PyTypeObject*)(dtensor_spec_class.ptr()) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

direct type test first, interesting that this helps!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I briefly considered upstreaming extensions to PyFloat_Check etc. to first do the exact checks because they're way cheaper. the CPython API functions can't be inlined so there's function call overhead and they also don't themselves start with the direct test.

Copy link
Contributor

@ezyang ezyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed logic equivalence and CPython API usage, I did NOT review memory management

@swolchok
Copy link
Contributor Author

How good do you think the preexisting testing here is? We are looking at some (orthogonal) parts of DTensor where the testing is definitely NOT enough and I'm wondering if this applies here or not.

I was feeling mildly guilty for not adding focused tests here yesterday. Can just go ahead and do it if I can't find OpSchema-specific tests. I have a historical tendency of incorrectly assuming that PyTorch code is automatically well-tested for some reason.

…mWithError, address other review feedback on "Port OpSchema.__post_init__ and OpSchema._recompute_comparison_key to C++"

I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking.

Differential Revision: [D81530102](https://our.internmc.facebook.com/intern/diff/D81530102)

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta ezyang msaroufim dcci

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Sep 18, 2025
… does make it faster

I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking.

ghstack-source-id: 829bad9
Pull Request resolved: #161695
@swolchok
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Sep 21, 2025
Move the entirety of `__new__` into C++, saving a layer of disable_dynamo and making progress toward all-C++.

Pull Request resolved: #162508
Approved by: https://github.com/ezyang
ghstack dependencies: #161695
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
… C++ (pytorch#161695)

I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking.

Differential Revision: [D81530102](https://our.internmc.facebook.com/intern/diff/D81530102)

Pull Request resolved: pytorch#161695
Approved by: https://github.com/ezyang
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
Move the entirety of `__new__` into C++, saving a layer of disable_dynamo and making progress toward all-C++.

Pull Request resolved: pytorch#162508
Approved by: https://github.com/ezyang
ghstack dependencies: pytorch#161695
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
… C++ (pytorch#161695)

I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking.

Differential Revision: [D81530102](https://our.internmc.facebook.com/intern/diff/D81530102)

Pull Request resolved: pytorch#161695
Approved by: https://github.com/ezyang
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
Move the entirety of `__new__` into C++, saving a layer of disable_dynamo and making progress toward all-C++.

Pull Request resolved: pytorch#162508
Approved by: https://github.com/ezyang
ghstack dependencies: pytorch#161695
pytorchmergebot pushed a commit that referenced this pull request Sep 23, 2025
…nsor_info (#162968)

Next PR writes a C++ implementation. Seems good to have tests first.

Pull Request resolved: #162968
Approved by: https://github.com/ezyang
ghstack dependencies: #161695, #162508
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
… C++ (pytorch#161695)

I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking.

Differential Revision: [D81530102](https://our.internmc.facebook.com/intern/diff/D81530102)

Pull Request resolved: pytorch#161695
Approved by: https://github.com/ezyang
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
Move the entirety of `__new__` into C++, saving a layer of disable_dynamo and making progress toward all-C++.

Pull Request resolved: pytorch#162508
Approved by: https://github.com/ezyang
ghstack dependencies: pytorch#161695
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…nsor_info (pytorch#162968)

Next PR writes a C++ implementation. Seems good to have tests first.

Pull Request resolved: pytorch#162968
Approved by: https://github.com/ezyang
ghstack dependencies: pytorch#161695, pytorch#162508
@github-actions github-actions bot deleted the gh/swolchok/823/head branch October 20, 2025 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (dtensor) release notes category topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants