
Segmentation fault in list and dict dumping when an element exactly consumes the buffer #503

@JustAnotherArchivist

Description


What did you do?

python3 -c 'import ujson; ujson.dumps(["aaaa", "\x00" * 10921])'

What did you expect to happen?

No crash

What actually happened?

SIGSEGV

What versions are you using?

  • OS: Debian Sid
  • Python: 3.10.1
  • UltraJSON: 316d384

Background

The input here is constructed to exactly hit the buffer boundary. To better see what's going on, I added some fprintf statements in the Buffer_Reserve macro:

diff --git a/lib/ultrajsonenc.c b/lib/ultrajsonenc.c
index a9f3ef1..874d332 100644
--- a/lib/ultrajsonenc.c
+++ b/lib/ultrajsonenc.c
@@ -488,8 +488,10 @@ static int Buffer_EscapeStringValidated (JSOBJ obj, JSONObjectEncoder *enc, cons
 }
 
 #define Buffer_Reserve(__enc, __len) \
+    fprintf(stderr, "reserve %zu, remaining %zu\n", (size_t) (__len), (size_t) ((__enc)->end - (__enc)->offset)); \
     if ( (size_t) ((__enc)->end - (__enc)->offset) < (size_t) (__len))  \
     {   \
+      fprintf(stderr, "realloc\n"); \
       Buffer_Realloc((__enc), (__len));\
     }   \
 

With the command above, the output is as follows, annotated with what each line corresponds to:

  • reserve 258, remaining 65536 – initial call for encoding the list; evidently, the initial buffer size is 64 KiB (coming from objToJSON)
  • reserve 258, remaining 65535 – pre-name call on first list element; as there is no name, this call is useless
  • reserve 26, remaining 65535 – "aaaa" reservation
  • reserve 258, remaining 65528 – pre-name call on second list element
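  • reserve 65528, remaining 65528 – "\x00" * 10921 reservation (10921 × 6 + 2 = 65528: each NUL is reserved as a 6-byte \u0000 escape, plus the two quotes); note that this exactly consumes the rest of the buffer. The closing ] is then written without any reservation, one byte past the end of the buffer.
  • reserve 1, remaining 18446744073709551615 – terminating NUL reservation; end - offset is now -1, which wraps around to SIZE_MAX (2^64 - 1) when cast to size_t, so the check passes and no reallocation happens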
  • Segmentation fault
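The trace can be replayed arithmetically. Below is a minimal Python model, not ujson's code: it tracks only offsets, never real memory. It treats [, , and ] as single bytes written without a reservation of their own (which is what the remaining counts in the trace imply), reserves 6 × len + 2 bytes per string (consistent with the 26 and 65528 reservations), and assumes a doubling growth policy on reallocation, since Buffer_Realloc is not shown in the issue. BufferModel, dump_list_model and the other names are hypothetical.

```python
SIZE_T_MOD = 2 ** 64  # size_t arithmetic on a 64-bit platform wraps modulo 2**64


def reserve_string(length):
    # Worst-case reservation per string, consistent with the trace:
    # 6 bytes per character (a \u00XX escape) plus the two quotes.
    return 6 * length + 2


def escaped_len(s):
    # Bytes actually written for s: NUL becomes \u0000 (6 bytes); every other
    # character is counted as 1 byte, which holds for the inputs used here.
    return 2 + sum(6 if ch == "\x00" else 1 for ch in s)


class BufferModel:
    """Hypothetical model of the encoder buffer; tracks offsets only."""

    def __init__(self, capacity=64 * 1024):
        self.capacity = capacity
        self.offset = 0
        self.log = []         # (requested, remaining) pairs, like the fprintf output
        self.overrun = False  # set once a write lands past the end of the buffer

    def remaining(self):
        # Mirrors (size_t) (end - offset), which wraps once offset passes end.
        return (self.capacity - self.offset) % SIZE_T_MOD

    def reserve(self, n):
        remaining = self.remaining()
        self.log.append((n, remaining))
        if remaining < n:
            # Assumed growth policy: double the capacity until the request fits.
            while self.capacity - self.offset < n:
                self.capacity *= 2

    def write(self, n):
        self.offset += n
        if self.offset > self.capacity:
            self.overrun = True


def dump_list_model(strings):
    buf = BufferModel()
    buf.reserve(258)               # encode() call for the list itself
    buf.write(1)                   # '[' (no reservation of its own)
    for i, s in enumerate(strings):
        if i > 0:
            buf.write(1)           # ',' (no reservation of its own)
        buf.reserve(258)           # pre-name reservation for the element
        buf.reserve(reserve_string(len(s)))
        buf.write(escaped_len(s))
    buf.write(1)                   # ']' (no reservation of its own)
    buf.reserve(1)                 # terminating NUL
    buf.write(1)
    return buf


if __name__ == "__main__":
    for requested, remaining in dump_list_model(["aaaa", "\x00" * 10921]).log:
        print(f"reserve {requested}, remaining {remaining}")
```

Running it prints the same six reserve lines as the instrumented build. Under the same assumptions, the model reports no overrun when the first string is "aaa" or "aaaaa": with "aaa" the NUL reservation triggers the reallocation, and with "aaaaa" the long string's reservation does.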

As you might expect, everything is fine when the first string in the input list is one character shorter or longer because the reallocation condition is then triggered instead of overrunning the buffer. Of course, the list doesn't have to terminate after that long NUL string, and anything following that will overrun the buffer as well, e.g. python3 -c 'import ujson; ujson.dumps(["aaaa", "\x00" * 10921, 42])'.

The exact same thing is also possible with a dict, of course. For example, python3 -c 'import ujson; ujson.dumps({"a": None, "b": "\x00" * 10920})'. Just like before, the long NUL string exactly consumes the remaining buffer, and then the trailing } and NUL writes overrun it.
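The crafted lengths follow from the same arithmetic: a NUL-only string of n characters exactly fills the buffer when its worst-case reservation, 6 × n + 2 bytes, equals the space left after everything written before it. The sketch below computes n from the bytes preceding the long string. It assumes the 64 KiB initial buffer from the trace and ujson's default compact output (no spaces after , or :); exact_fill_nul_count is a hypothetical helper, not part of ujson.

```python
BUFFER_SIZE = 64 * 1024  # initial buffer size, per the trace


def exact_fill_nul_count(prefix):
    """Number of NULs whose reservation (6 * n + 2) exactly equals the space left
    after `prefix` has been written, or None if no such count exists."""
    remaining = BUFFER_SIZE - len(prefix)
    if (remaining - 2) % 6 != 0:
        return None
    return (remaining - 2) // 6


# Bytes written before the long string (compact output assumed).
list_prefix = '["aaaa",'        # 8 bytes
dict_prefix = '{"a":null,"b":'  # 14 bytes

print(exact_fill_nul_count(list_prefix))  # 10921
print(exact_fill_nul_count(dict_prefix))  # 10920
```

Both results match the lengths used in the reproducers above. A prefix one byte shorter or longer (for example '["aaa",' or '["aaaaa",') yields None, because no NUL-only string then lands exactly on the boundary.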
