gh-133703: dict: fix calculate_log2_keysize()#133809
Conversation
|
Good PR. Since this code is a hotspot where every instruction matters, I found a way to shave off a few more instructions. This doesn't have to block the PR, but it's something we could consider in a follow-up optimization.
Look at the GCC/Clang path: cpython/Include/internal/pycore_bitutils.h Lines 146 to 157 in 9546eee We don't need to check cpython/Include/internal/pycore_bitutils.h Lines 157 to 167 in 9546eee If the tradeoff is worth it, we can create a |
|
_Py_bit_length is inline function. compiler optimize away unnecessary 0 comparison. |
|
I had suggested avoiding Py_MAX because it can not be eliminated statically. |
|
Thanks @methane for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14. |
(cherry picked from commit 92337f6) Co-authored-by: Inada Naoki <songofacandy@gmail.com>
|
Sorry, @methane, I could not cleanly backport this to |
|
GH-133861 is a backport of this pull request to the 3.14 branch. |
|
GH-133862 is a backport of this pull request to the 3.13 branch. |
|
@methane Sorry for misunderstanding, I just couldn't find the comment anymore where you said that. I think it's gone.
You're right about most modern compilers, but not all of them. I tested the no-zero-check concept on MSVC x86, a Tier 1 platform, and it does actually shave a few assembly instructions. Notice the conditional je is present after the bsr: But not here: You can see it for yourself here. Yeah, it's "only" a je + unused branch, but it's unexpectedly running on every |
We don't use |
Sorry, I was wrong. We use _Py_bit_length on x86-32. |
|
That's understandable. |
calculate_log2_keysizeindictobject.cincorrect #133703