Stack allocate small strings #270
Conversation
'error' was never a valid Python string error handler. Accept it for backwards compatibility but normalize to 'strict' internally. Update error messages to only mention valid options.
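The normalization described in this commit can be sketched in plain Python (the helper name `normalize_str_errors` is illustrative, not the library's actual internal API):

```python
def normalize_str_errors(value: str) -> str:
    """Map the legacy 'error' spelling to 'strict'; reject unknown handlers."""
    if value == "error":  # accepted for backwards compatibility only
        return "strict"
    if value in ("strict", "replace"):
        return value
    raise ValueError("str_errors must be 'strict' or 'replace'")

print(normalize_str_errors("error"))    # strict
print(normalize_str_errors("replace"))  # replace
```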
The C implementation of decode_definite_short_string was not respecting the str_errors setting - it always used PyUnicode_FromStringAndSize which only supports strict mode. Now uses PyUnicode_DecodeUTF8 with the str_errors field passed directly. Store str_errors as const char* (NULL for strict, "replace" for replace) instead of PyObject*. This eliminates a conditional in the hot path since PyUnicode_DecodeUTF8 accepts NULL to mean strict mode. Fixes agronholm#255
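The behavioral difference can be seen with plain `bytes.decode`, which mirrors the two C calls: `PyUnicode_FromStringAndSize` is effectively strict-only, while `PyUnicode_DecodeUTF8` takes an errors argument (NULL meaning strict):

```python
data = b"caf\xc3"  # ends with a truncated UTF-8 sequence

# strict mode (what the old code always did) raises on invalid bytes
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

# "replace" substitutes U+FFFD, which str_errors="replace" should produce
print("replace:", data.decode("utf-8", "replace"))
```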
Skip creating an intermediate Python bytes object by using fp_read directly into a PyMem_Malloc buffer. This avoids Python allocator and reference counting overhead.
Use a stack buffer for strings <= 256 bytes to avoid heap allocation overhead. Larger strings continue to use PyMem_Malloc.
for more information, see https://pre-commit.ci
I found a test which fails, even on this PR:

```python
from binascii import unhexlify

def test_str_errors_long_string_over_65536_bytes(impl):
    """Issue #255: str_errors not respected for strings >65536 bytes."""
    # 65537 bytes: 65536 'a' + 1 invalid UTF-8 byte
    payload = unhexlify("7a00010001" + "61" * 65536 + "c3")
    result = impl.loads(payload, str_errors="replace")
    assert len(result) == 65537
    assert result[-1] == "\ufffd"
```
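For reference, the payload header in that test can be reconstructed from the CBOR framing rules (a sketch for readers, not part of the test suite):

```python
from binascii import hexlify
import struct

length = 65537
# 0x7a: CBOR major type 3 (text string) with a 4-byte big-endian length
header = b"\x7a" + struct.pack(">I", length)
print(hexlify(header).decode())  # matches the "7a00010001" prefix above

# the trailing 0xc3 byte starts a 2-byte UTF-8 sequence that never completes,
# so "replace" mode yields a single U+FFFD replacement character
print(b"\xc3".decode("utf-8", "replace"))
```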
Claude came up with this fix. I'm not going to stand by it, but it does make the test pass:
```diff
diff --git a/source/decoder.c b/source/decoder.c
index 3751e7a..8518a54 100644
--- a/source/decoder.c
+++ b/source/decoder.c
@@ -905,7 +905,7 @@ decode_definite_long_string(CBORDecoderObject *self, Py_ssize_t length)
     }
     consumed = chunk_length; // workaround for https://github.com/python/cpython/issues/99612
-    string = PyUnicode_DecodeUTF8Stateful(source_buffer, chunk_length, NULL, &consumed);
+    string = PyUnicode_DecodeUTF8Stateful(source_buffer, chunk_length, self->str_errors, &consumed);
     if (!string)
         goto error;
@@ -946,9 +946,28 @@ decode_definite_long_string(CBORDecoderObject *self, Py_ssize_t length)
         chunk = NULL;
     }
+    // Process any remaining buffered bytes (e.g., incomplete multi-byte UTF-8 sequences)
+    if (buffer_length > 0) {
+        string = PyUnicode_DecodeUTF8(buffer, buffer_length, self->str_errors);
+        if (!string)
+            goto error;
+
+        if (ret) {
+            PyObject *joined = PyUnicode_Concat(ret, string);
+            Py_DECREF(string);
+            if (!joined)
+                goto error;
+            ret = joined;
+        } else {
+            ret = string;
+        }
+    }
+
     if (ret && string_namespace_add(self, ret, length) == -1)
         goto error;
+    if (buffer)
+        PyMem_Free(buffer);
     return ret;
 error:
     Py_XDECREF(ret);
```
I'll let you choose whether you want to fix this issue. It was also not fixed in my PR; I think all the strings I decode are on the smaller side. I found the issue while checking your PR.
Thanks, I'll take a look tomorrow. Funny that we did this at the same time for something that's been there so long 😄
agronholm left a comment
I have a couple questions to start with.
I've been looking into rewriting this library in Rust, and allocating strings on the stack was one issue that I had to look into, as the Rust
Seems there's another pre-existing memory leak here too, when there is a UTF-8 sequence that spans chunk boundaries. That's not really valid CBOR, but I'll try to fix it while I'm in there.
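The chunk-boundary case mentioned here is the same problem Python's incremental UTF-8 decoder solves in pure Python; this is only an analogy to the `PyUnicode_DecodeUTF8Stateful` consumed-bytes mechanism, not the decoder's actual code:

```python
import codecs

# "é" is 0xC3 0xA9 in UTF-8; split it across two chunks
dec = codecs.getincrementaldecoder("utf-8")(errors="replace")
part1 = dec.decode(b"ab\xc3", final=False)  # incomplete byte is buffered, not an error
part2 = dec.decode(b"\xa9cd", final=False)  # buffered byte completes the sequence
print(repr(part1), repr(part2))
```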
Adding tests that already pass is not a sin in itself, but they would need to at least increase the coverage. We don't have coverage tracking for the C code, and I don't know how to even do that.
- Pass str_errors to PyUnicode_DecodeUTF8Stateful (fixes agronholm#255)
- Handle remaining bytes after loop for incomplete UTF-8 at string end
- Fix reference leak: DECREF old ret before reassigning to joined
- Fix memory leak: free buffer in success path
It seems like I made a poo-poo with the merge from master. I'm maybe too tired to fix it right now, so I'll look at this with fresh eyes tomorrow.
Thanks!
Changes
Do temporary allocations for small strings on the stack, and avoid creating an
intermediate PyBytes object. This improves performance by around 9-17%, with no
regressions in any of my testing.
Also fixes #255, and removes the nonexistent "error" string error handler, since it was in the same code.
Checklist
- Added tests (in tests/) which would fail without your patch
- Updated documentation (in docs/), in case of behavior changes or new features
- Added a changelog entry (in docs/versionhistory.rst)