Port OpSchema.__post_init__ and OpSchema._recompute_comparison_key to C++#161695
Port OpSchema.__post_init__ and OpSchema._recompute_comparison_key to C++#161695swolchok wants to merge 14 commits intogh/swolchok/823/basefrom
Conversation
… does make it faster I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking. [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161695
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 0f67393 with merge base a63221a ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
… does make it faster I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking. ghstack-source-id: cfd236c Pull Request resolved: #161695
| #endif | ||
| } | ||
|
|
||
| static bool arg_type_tensor_or_tensor_list_like(py::handle arg) { |
There was a problem hiding this comment.
Do you have objections to keeping the old Python implementation around in comments as a reference, or do you think it will bitrot immediately?
There was a problem hiding this comment.
commented-out code generally bitrots. if we want it around we would need to make it modally enabled and add tests
|
How good do you think the preexisting testing here is? We are looking at some (orthogonal) parts of DTensor where the testing is definitely NOT enough and I'm wondering if this applies here or not. |
| c10::SmallVector<py::object, 8> args_to_hash; | ||
| size_t idx = 0; | ||
| for (const auto& e : args_schema) { | ||
| if (idx >= static_argnum || arg_type_tensor_or_tensor_list_like(e)) { |
| int idx = 0; | ||
| auto kwargs_schema = py::reinterpret_borrow<py::dict>(raw_kwargs_schema); | ||
| for (const auto& k : static_kwargkey_list) { | ||
| PyObject* item = PyDict_GetItem(kwargs_schema.ptr(), k.ptr()); |
There was a problem hiding this comment.
safer to call PyDict_GetItemWithError
There was a problem hiding this comment.
oh wow I didn't fully realize what was going on with PyDict_GetItem
| kwargs_to_hash[idx++] = py::none(); | ||
| } | ||
| } | ||
| PyObject* comparison_key = PyTuple_Pack( |
There was a problem hiding this comment.
Should test that PyTuple_Pack succeeded, and ditto below
| if (!PyList_Check(static_kwargkey.ptr())) { | ||
| PyErr_SetString( | ||
| PyExc_TypeError, "self.schema_info.static_kwargkey must be a list!"); | ||
| return false; |
There was a problem hiding this comment.
It might be more uniform with the rest of the codebase if we just raised exceptions when this happens, but I guess this is not too difficult to get right.
| const auto dtensor_spec_class = get_dtensor_spec_class(); | ||
| bool has_symints = false; | ||
| for (const auto& a : args_schema) { | ||
| if (Py_TYPE(a.ptr()) != (PyTypeObject*)(dtensor_spec_class.ptr()) && |
There was a problem hiding this comment.
direct type test first, interesting that this helps!
There was a problem hiding this comment.
yeah I briefly considered upstreaming extensions to PyFloat_Check etc. to first do the exact checks because they're way cheaper. the CPython API functions can't be inlined so there's function call overhead and they also don't themselves start with the direct test.
ezyang
left a comment
There was a problem hiding this comment.
I reviewed logic equivalence and CPython API usage, I did NOT review memory management
I was feeling mildly guilty for not adding focused tests here yesterday. Can just go ahead and do it if I can't find OpSchema-specific tests. I have a historical tendency of incorrectly assuming that PyTorch code is automatically well-tested for some reason. |
…mWithError, address other review feedback on "Port OpSchema.__post_init__ and OpSchema._recompute_comparison_key to C++" I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking. Differential Revision: [D81530102](https://our.internmc.facebook.com/intern/diff/D81530102) cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
… does make it faster I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking. ghstack-source-id: 829bad9 Pull Request resolved: #161695
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Move the entirety of `__new__` into C++, saving a layer of disable_dynamo and making progress toward all-C++. Pull Request resolved: #162508 Approved by: https://github.com/ezyang ghstack dependencies: #161695
… C++ (pytorch#161695) I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking. Differential Revision: [D81530102](https://our.internmc.facebook.com/intern/diff/D81530102) Pull Request resolved: pytorch#161695 Approved by: https://github.com/ezyang
Move the entirety of `__new__` into C++, saving a layer of disable_dynamo and making progress toward all-C++. Pull Request resolved: pytorch#162508 Approved by: https://github.com/ezyang ghstack dependencies: pytorch#161695
… C++ (pytorch#161695) I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking. Differential Revision: [D81530102](https://our.internmc.facebook.com/intern/diff/D81530102) Pull Request resolved: pytorch#161695 Approved by: https://github.com/ezyang
Move the entirety of `__new__` into C++, saving a layer of disable_dynamo and making progress toward all-C++. Pull Request resolved: pytorch#162508 Approved by: https://github.com/ezyang ghstack dependencies: pytorch#161695
…nsor_info (#162968) Next PR writes a C++ implementation. Seems good to have tests first. Pull Request resolved: #162968 Approved by: https://github.com/ezyang ghstack dependencies: #161695, #162508
… C++ (pytorch#161695) I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking. Differential Revision: [D81530102](https://our.internmc.facebook.com/intern/diff/D81530102) Pull Request resolved: pytorch#161695 Approved by: https://github.com/ezyang
Move the entirety of `__new__` into C++, saving a layer of disable_dynamo and making progress toward all-C++. Pull Request resolved: pytorch#162508 Approved by: https://github.com/ezyang ghstack dependencies: pytorch#161695
…nsor_info (pytorch#162968) Next PR writes a C++ implementation. Seems good to have tests first. Pull Request resolved: pytorch#162968 Approved by: https://github.com/ezyang ghstack dependencies: pytorch#161695, pytorch#162508
Stack from ghstack (oldest at bottom):
I initially didn't see good results porting this, but it was apparently because of pybind11 function calling overhead. (pybind11's object-handling primitives seem fine enough.) I'm interested in setting up nanobind, but this demonstrates it's not blocking.
Differential Revision: D81530102
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @dcci