gh-131525: Cache the result of tuple_hash (#131529)
Conversation
Objects/tupleobject.c (outdated)

    if (np == NULL)
        return NULL;

    _PyTuple_RESET_HASH_CACHE(np);
is there any reason not to just move this into tuple_alloc() itself instead of repeating it everywhere tuple_alloc() is called?
I don't personally have an objection. Most *_alloc functions only do allocation, not initialization, so I didn't want to confuse that. But the cached hash is kind of a special case...
Objects/tupleobject.c (outdated)

    PyTupleObject *v = _PyTuple_CAST(op);

    // For the empty singleton, we don't need to dereference the pointer
    if (op == (PyObject *)&_Py_SINGLETON(tuple_empty)) {
You could set the hash of () statically in pycore_runtime_init.h, then this test would be unnecessary.
Yes, but interestingly, this is measurably faster -- it doesn't have to chase the pointer.
Seems like the difference here is in the noise, so might as well do this at build time and skip this extra branch.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Objects/tupleobject.c (outdated)

    {
        PyTupleObject *v = _PyTuple_CAST(op);

        if (v->ob_hash != -1) {
The loads of ob_hash need relaxed memory ordering, i.e., use FT_ATOMIC_LOAD_SSIZE_RELAXED.
Still learning the free-threaded conventions. Can you explain why (or point me at an explanation)? Since the write is atomic and we don't care if there is a race (worst case, the hash gets recomputed), and the thread sanitizer CI isn't complaining, I don't see why this would be necessary -- but also I'm new to this stuff so maybe I'm missing something.
The FT_... macros translate to plain loads/stores on the default build, so there is no overhead there.
It is necessary because accessing a field from multiple threads without any kind of synchronization is undefined behavior.
Even on the free-threaded build, the RELAXED part means that there is no synchronization at the hardware level on most platforms; it just convinces the compiler and sanitizers that this is OK.
When you're done making the requested changes, leave the comment:

I have made the requested changes; please review again

Thanks for making the requested changes! @picnixz, @kumaraditya303: please review the changes made to this pull request.
We should probably just remove the old comments and keep one in

This PR changes 102 files, which makes it hard to review. Most files are generated by Argument Clinic. Would it be possible to have a first PR which only adds the

GitHub seems to be doing the right thing by hiding the generated changes. Is that not happening for you? I'd be happy to rearrange the commits, but I don't think it makes sense to merge a first PR if the second one may not be approved. Let me know, as that would also invalidate the existing review comments on this PR.
Co-authored-by: Chris Eibl <138194463+chris-eibl@users.noreply.github.com>
* pythongh-131525: Cache the result of tuple_hash
* Fix debug builds
* Add blurb
* Fix formatting
* Pre-compute empty tuple singleton
* Mostly set the cache within tuple_alloc
* Fixes for TSAN
* Pre-compute empty tuple singleton
* Fix for 32-bit platforms
* Assert that op != NULL in _PyTuple_RESET_HASH_CACHE
* Use FT_ATOMIC_STORE_SSIZE_RELAXED macro
* Update Include/internal/pycore_tuple.h (Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>)
* Fix alignment
* atomic load
* Update Objects/tupleobject.c (Co-authored-by: Chris Eibl <138194463+chris-eibl@users.noreply.github.com>)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Chris Eibl <138194463+chris-eibl@users.noreply.github.com>
Summary: Use `offsetof()` to find the offset of the `PyTupleObject`'s `ob_item` member, since upstream CPython could [add more fields](python/cpython#131529) in the future.

Reviewed By: prakashgayasen, islamismailov

Differential Revision: D83114258

fbshipit-source-id: f30225b70d07638c5092e2e022d11eb278a7c3ee
Back in 2013, it was determined that caching the result of tuple_hash did not yield any significant speedup.
However, a lot has changed since then, and in a recent experiment that added a tuple hash cache back in, the mdp benchmark improved by 86%.
Admittedly, there was no measurable improvement on any other benchmark, but there also appears to be no downside, including for memory usage as measured by max_rss.