gh-140149: use PyBytesWriter in action_helpers.c's _build_concatenated_bytes; 3x faster bytes concat in the parser#140150
Conversation
PyBytesWriter in action_helpers.c's _build_concatenated_bytes; 3x speed up for bytes concat in the parserPyBytesWriter in action_helpers.c's _build_concatenated_bytes; 3x faster bytes concat in the parser
Parser/action_helpers.c
Outdated
| PyBytes_Concat(&res, elem->v.Constant.value); | ||
| Py_ssize_t part = PyBytes_GET_SIZE(elem->v.Constant.value); | ||
| if (part > 0) { | ||
| memcpy(out, PyBytes_AS_STRING(elem->v.Constant.value), part); |
There was a problem hiding this comment.
Why not using PyBytesWriter_WriteBytes here?
There was a problem hiding this comment.
PyBytesWriter_WriteBytes() grows the buffer if needed. It's not needed since the code already computes the total size in advance.
Parser/action_helpers.c
Outdated
| PyBytes_Concat(&res, elem->v.Constant.value); | ||
| Py_ssize_t part = PyBytes_GET_SIZE(elem->v.Constant.value); | ||
| if (part > 0) { | ||
| memcpy(out, PyBytes_AS_STRING(elem->v.Constant.value), part); |
There was a problem hiding this comment.
PyBytesWriter_WriteBytes() grows the buffer if needed. It's not needed since the code already computes the total size in advance.
This optimization is good to have, but I don't think that users will notice since the parser is only run once at Python startup. I don't think that it's worth it to document this optimization. |
I concur, but on the other hand it doesn't hurt |
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
Misc/NEWS.d/next/Core_and_Builtins/2025-10-15-17-12-32.gh-issue-140149.cy1m3d.rst
Outdated
Show resolved
Hide resolved
…e-140149.cy1m3d.rst
| PyObject* kind = asdl_seq_GET(strings, 0)->v.Constant.kind; | ||
| PyObject *kind = asdl_seq_GET(strings, 0)->v.Constant.kind; | ||
|
|
||
| Py_ssize_t total = 0; |
There was a problem hiding this comment.
bit of a meta question: How much performance change is it to not pre-calculate the length? The precalculation + memcpy makes this code quite a bit more complex. If it's only a couple percentage difference (so most the 3x is kept) the simpler code for some of these would be nice
There was a problem hiding this comment.
If you use PyBytesWriter_Create(0), you have to resize the buffer multiple times, it's less efficient. I don't know how much. But here, the output size is easy to compute so I think that it's worth it to precompute the output size.
|
Merged, thanks for this nice optimization! |
…thon#140150) Use PyBytesWriter in action_helpers.c _build_concatenated_bytes(). 3x faster bytes concat in the parser. Co-authored-by: Victor Stinner <vstinner@python.org>
The issue gh-140149 provides more details.
This effectively makes
bytesconcatenation about 3x faster in the parser, syntax like:Benchmark
The script:
The results (with
--rigorous, on 9955759):The environment:
sudo ./python -m pyperf system tuneensured.