
Stack allocate small strings #270

Merged

agronholm merged 14 commits into agronholm:master from andreer:stack-allocate-small-strings-v4 on Mar 22, 2026

Conversation

@andreer
Contributor

@andreer andreer commented Jan 2, 2026

Changes

Do temporary allocations for small strings on the stack, and avoid creating an
intermediate PyBytes object. This improves performance by around 9–17%, with no
regressions in any of my testing.

Also fixes #255, and removes handling for the nonexistent "error" string error handler, since it was in the same code.

Checklist

  • You've added tests (in tests/) which would fail without your patch
  • You've updated the documentation (in docs/), in case of behavior changes or new
    features
  • You've added a new changelog entry (in docs/versionhistory.rst).

'error' was never a valid Python string error handler. Accept it for
backwards compatibility but normalize to 'strict' internally. Update
error messages to only mention valid options.
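This is easy to verify from plain Python: the codec machinery has never registered an error handler named 'error', so any decode that actually hit invalid bytes under str_errors='error' would fail with a LookupError. A quick check (independent of cbor2):

```python
import codecs

# 'strict' and 'replace' are registered codec error handlers:
codecs.lookup_error("strict")
codecs.lookup_error("replace")

# ...but 'error' never was, so looking it up (which also happens lazily
# the first time a decode error occurs under that name) fails:
try:
    codecs.lookup_error("error")
except LookupError as exc:
    print(exc)  # → unknown error handler name 'error'
```

Normalizing 'error' to 'strict' up front keeps old callers working while avoiding this late failure.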
The C implementation of decode_definite_short_string was not respecting
the str_errors setting: it always used PyUnicode_FromStringAndSize, which
only supports strict mode. It now uses PyUnicode_DecodeUTF8 with the
str_errors field passed directly.

Store str_errors as const char* (NULL for strict, "replace" for replace)
instead of PyObject*. This eliminates a conditional in the hot path since
PyUnicode_DecodeUTF8 accepts NULL to mean strict mode.
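The NULL-means-strict convention mirrors what is visible from Python; a small illustration of strict vs. replace on a truncated UTF-8 sequence:

```python
data = b"a\xc3"  # 'a' followed by a truncated two-byte UTF-8 sequence

# The default (what errors=NULL selects at the C level) is strict mode,
# which raises on the bad byte:
try:
    data.decode("utf-8")
except UnicodeDecodeError:
    pass

# 'replace' substitutes U+FFFD for the undecodable byte instead:
assert data.decode("utf-8", "replace") == "a\ufffd"
```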

Fixes agronholm#255
Skip creating an intermediate Python bytes object by using fp_read
directly into a PyMem_Malloc buffer. This avoids Python allocator and
reference counting overhead.
Use a stack buffer for strings <= 256 bytes to avoid heap allocation
overhead. Larger strings continue to use PyMem_Malloc.
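The overall flow these commits describe can be sketched in Python (the function name here is hypothetical; the stack-vs-heap buffer choice only exists at the C level, since Python objects are always heap-allocated):

```python
from io import BytesIO

SMALL_STRING_LIMIT = 256  # size of the stack buffer in the C patch

def decode_definite_string(fp, length, str_errors=None):
    """Hypothetical sketch of the C flow: read the raw bytes straight
    from the stream, then decode once.  In the C code, lengths up to
    SMALL_STRING_LIMIT land in a stack buffer and larger ones go through
    PyMem_Malloc, skipping the intermediate Python bytes object."""
    raw = fp.read(length)
    # None here plays the role of NULL/strict in PyUnicode_DecodeUTF8
    return raw.decode("utf-8", str_errors or "strict")

print(decode_definite_string(BytesIO(b"hello"), 5))          # → hello
print(decode_definite_string(BytesIO(b"a\xc3"), 2, "replace"))
```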
@andreer andreer changed the title from "Stack allocate small strings v4" to "Stack allocate small strings" on Jan 2, 2026
@coveralls

coveralls commented Jan 2, 2026

Coverage Status

coverage: 94.565% (+0.02%) from 94.55%
when pulling ad6cc16 on andreer:stack-allocate-small-strings-v4
into a7ac10d on agronholm:master.


@topher200 topher200 left a comment


Overall, LGTM! It definitely solves #255 for smaller strings, which is very important to me 😍


I found a test which fails, even on this PR:

  def test_str_errors_long_string_over_65536_bytes(impl):
      """Issue #255: str_errors not respected for strings >65536 bytes."""
      # 65537 bytes: 65536 'a' + 1 invalid UTF-8 byte
      payload = unhexlify("7a00010001" + "61" * 65536 + "c3")
      result = impl.loads(payload, str_errors="replace")
      assert len(result) == 65537
      assert result[-1] == "\ufffd"

Claude came up with this fix. I'm not going to stand by it, but it does make the test pass:

diff --git a/source/decoder.c b/source/decoder.c
index 3751e7a..8518a54 100644
--- a/source/decoder.c
+++ b/source/decoder.c
@@ -905,7 +905,7 @@ decode_definite_long_string(CBORDecoderObject *self, Py_ssize_t length)
         }

         consumed = chunk_length;  // workaround for https://github.com/python/cpython/issues/99612
-        string = PyUnicode_DecodeUTF8Stateful(source_buffer, chunk_length, NULL, &consumed);
+        string = PyUnicode_DecodeUTF8Stateful(source_buffer, chunk_length, self->str_errors, &consumed);
         if (!string)
             goto error;

@@ -946,9 +946,28 @@ decode_definite_long_string(CBORDecoderObject *self, Py_ssize_t length)
         chunk = NULL;
     }

+    // Process any remaining buffered bytes (e.g., incomplete multi-byte UTF-8 sequences)
+    if (buffer_length > 0) {
+        string = PyUnicode_DecodeUTF8(buffer, buffer_length, self->str_errors);
+        if (!string)
+            goto error;
+
+        if (ret) {
+            PyObject *joined = PyUnicode_Concat(ret, string);
+            Py_DECREF(string);
+            if (!joined)
+                goto error;
+            ret = joined;
+        } else {
+            ret = string;
+        }
+    }
+
     if (ret && string_namespace_add(self, ret, length) == -1)
         goto error;

+    if (buffer)
+        PyMem_Free(buffer);
     return ret;
 error:
     Py_XDECREF(ret);

I'll let you choose if you want to fix this issue. This was also not fixed on my PR; I think all the strings I decode are on the smaller side. I found the issue while checking your PR.

Contributor Author


Thanks, I'll take a look tomorrow. Funny that we did this at the same time for something that's been there so long 😄

Owner

@agronholm agronholm left a comment


I have a couple of questions to start with.

@agronholm
Owner

I've been looking into rewriting this library in Rust, and allocating strings on the stack was one issue that I had to look into, as the Rust String::new function always allocates a string on the heap. While it's not clear to me yet if I really need a Rust String in this case, I found the heapless crate that should let me stack allocate plenty of useful types - once I get that far along.

@andreer
Contributor Author

andreer commented Jan 4, 2026

Seems there's another pre-existing memory leak here too, when there is a UTF-8 sequence that spans chunk boundaries. That's not really valid CBOR, but I'll try to fix it while I'm in there.

  import tracemalloc
  import cbor2

  # 65535 'a' + "€" (E2 82 AC) = 65538 bytes
  # Chunk 1: 65535 'a' + E2 (incomplete, buffer allocated)
  # Chunk 2: 82 AC (completes €, buffer_length=0 but buffer never freed)

  payload = b"\x7a\x00\x01\x00\x02" + b"a" * 65535 + "€".encode()

  tracemalloc.start()
  for i in range(10000):
      cbor2.loads(payload)
  current, peak = tracemalloc.get_traced_memory()
  tracemalloc.stop()

  print(f"Leaked: {current / 1024 / 1024:.0f} MB")  # ~625 MB
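For reference, the chunk-boundary split above is exactly the situation CPython's incremental UTF-8 decoder is built for: it buffers the incomplete tail of one chunk and completes the character from the next. A minimal illustration, independent of cbor2:

```python
import codecs

dec = codecs.getincrementaldecoder("utf-8")("strict")
part1 = dec.decode(b"a" * 5 + b"\xe2")       # chunk ends mid-"€": E2 is buffered
part2 = dec.decode(b"\x82\xac", final=True)  # remaining bytes complete the character
assert part1 == "aaaaa"
assert part2 == "\u20ac"
```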

@agronholm
Owner

> Seems there's another pre-existing memory leak here too, when there is an utf-8 sequence that spans chunk boundaries. That's not really valid cbor, but I'll try to fix it while I'm in there
> […]

Adding tests that already pass is not a sin in itself, but they would need to at least increase coverage. We don't have coverage tracking for the C code, and I don't even know how to do that.

- Pass str_errors to PyUnicode_DecodeUTF8Stateful (fixes agronholm#255)
- Handle remaining bytes after loop for incomplete UTF-8 at string end
- Fix reference leak: DECREF old ret before reassigning to joined
- Fix memory leak: free buffer in success path
@agronholm
Owner

It seems like I made a poo-poo with the merge from master. I'm maybe too tired to fix it right now, so I'll look at this with fresh eyes tomorrow.

@agronholm agronholm merged commit 2b53b28 into agronholm:master Mar 22, 2026
14 checks passed
@agronholm
Owner

Thanks!



Development

Successfully merging this pull request may close these issues.

loads() str_errors="replace" kwarg no longer works since 5.6.0

4 participants