[C API] Change PyUnicode_AsUTF8() to return NULL on embedded null characters #111089

vstinner · 2023-10-19T23:28:24Z

I propose to change the PyUnicode_AsUTF8() API to raise an exception and return NULL if the string contains embedded null characters.

If the string contains an embedded null character, the UTF-8 encoded string can be truncated if used with C functions using char* since a null byte is treated as the terminator: marker of the string end. Truncating a string silently is a bad practice and can lead to different bugs including security vulnerabilities.

In practice, the minority of impacted C extensions and impacted users should benefit of such backward incompatible change, since truncating a string silently is a bad practice. Impacted users can use PyUnicode_AsUTF8AndSize(obj, NULL) and just ignore the size if they want to truncate on purpose.

It would address the following "hidden" comment on PyUnicode_AsUTF8():

Use of this API is DEPRECATED since no size information can be
extracted from the returned data.

PyUnicode_AsUTF8String() is part of the limited C API, whereas PyUnicode_AsUTF8() is not.

In the recently added PyUnicode_EqualToUTF8(obj, str), str is treated as not equal if obj contains embedded null characters.

The folllowing functions already raise an exception if the string contains embedded null characters or bytes:

PyUnicode_AsWideCharString()
PyUnicode_EncodeLocale()
PyUnicode_EncodeFSDefault()
PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize()
PyUnicode_DecodeFSDefaultAndSize()
PyUnicode_FSConverter()
PyUnicode_FSDecoder()

PyUnicode_AsUTF8String() returns a bytes object and so the length, so it doesn't raise the exception.

PyUnicode_AsUTF8AndSize() also returns the size and so don't raise on embedded null characters.

Linked PRs

The text was updated successfully, but these errors were encountered:

* PyUnicode_AsUTF8() now raises an exception if the string contains embedded null characters. * PyUnicode_AsUTF8AndSize() now sets `*size` to 0 on error to avoid undefined variable value. * Update related C API tests (test_capi.test_unicode).

* PyUnicode_AsUTF8() now raises an exception if the string contains embedded null characters. * PyUnicode_AsUTF8AndSize() now sets `*size` to 0 on error to avoid undefined variable value. * Update related C API tests (test_capi.test_unicode). * type_new_set_doc() uses PyUnicode_AsUTF8AndSize() to silently truncate doc containing null bytes.

On error, PyUnicode_AsUTF8AndSize() now sets the size argument to 0, to avoid undefined value.

* PyUnicode_AsUTF8() now raises an exception if the string contains embedded null characters. * Update related C API tests (test_capi.test_unicode). * type_new_set_doc() uses PyUnicode_AsUTF8AndSize() to silently truncate doc containing null bytes.

On error, PyUnicode_AsUTF8AndSize() now sets the size argument to -1, to avoid undefined value.

* PyUnicode_AsUTF8() now raises an exception if the string contains embedded null characters. * Update related C API tests (test_capi.test_unicode). * type_new_set_doc() uses PyUnicode_AsUTF8AndSize() to silently truncate doc containing null bytes. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

Add PyUnicode_AsUTF8() function to the limited C API. multiprocessing posixshmem now uses PyUnicode_AsUTF8() instead of PyUnicode_AsUTF8AndSize(): the extension is built with the limited C API. The function now raises an exception if the filename contains an embedded null character instead of truncating silently the filename.

PyUnicode_AsUTF8() now raises an exception if the string contains embedded null characters.

Add PyUnicode_AsUTF8() function to the limited C API. multiprocessing posixshmem now uses PyUnicode_AsUTF8() instead of PyUnicode_AsUTF8AndSize(): the extension is built with the limited C API. The function now raises an exception if the filename contains an embedded null character instead of truncating silently the filename.

On error, PyUnicode_AsUTF8AndSize() now sets the size argument to -1, to avoid undefined value.

PyUnicode_AsUTF8() now raises an exception if the string contains embedded null characters.

vstinner · 2023-10-20T18:06:03Z

I added PyUnicode_AsUTF8() to the limited C API and it now raises an exception if the string contains embedded null characters.

encukou · 2023-10-23T09:39:59Z

This makes PyUnicode_AsUTF8 O(N), as it needs to scan the entire string. I don't think that's a change to make lightly.
I also strongly believe that it's not a good fit for the limited API. The better alternative -- PyUnicode_AsUTF8AndSize is already there.

vstinner · 2023-10-27T15:19:53Z

This makes PyUnicode_AsUTF8 O(N), as it needs to scan the entire string.

If it's a performance issue, we can cache the "contains embedded null characters/bytes" information in PyASCIIObject structure.

I also strongly believe that it's not a good fit for the limited API.

Would you mind to elaborate?

The better alternative -- PyUnicode_AsUTF8AndSize is already there.

It's a different API. In C, char* with NUL terminator is used everywhere. Embedded NUL bytes is truncating strings which caused many bugs in the past.

In C, it's rare that C code uses a length with a char* string.

encukou · 2023-10-28T10:24:09Z

This makes PyUnicode_AsUTF8 O(N), as it needs to scan the entire string.
If it's a performance issue, we can cache the "contains embedded null characters/bytes" information in PyASCIIObject structure.

OK! Will you do that?
What are the performance trade-offs of that solution?

I also strongly believe that it's not a good fit for the limited API.

Would you mind to elaborate?

The API that existed with this name since before Python 3.0 is -- as you claim -- unsafe. The new semantics are not proven; usually we wait before the API stabilizes before declaring it stable for the long term.

Replace PyUnicode_AsUTF8AndSize() with PyUnicode_AsUTF8() to remove the explicit check for embedded null characters. The change avoids to have to include explicitly <string.h> to get the strlen() function when using a recent version of the limited C API.

Add PyASCIIObject.state.embed_null member to Python str objects. It is used as a cache by PyUnicode_AsUTF8() to only check once if a string contains a null character. Strings created by PyUnicode_FromString() initializes *embed_null* since the string cannot contain a null character. Global static strings now also initialize the *embed_null* member. The chr(0) singleton ("\0" string) is the only static string which contains a null character.

Fix unicode_subtype_new

gpshead · 2023-11-03T06:36:10Z

While I understand the rationale behind this and wish the API had been designed that way when PyUnicode was AsUTF8 was created... It is too late.

Changing AsUTF8 (or AsUTF8AndSize(str, NULL) as your draft PR does!) to return NULL with an exception set in situations where they did not in the past breaks our existing API contract.

Thus if we were to make this change, it should require a PEP 387 deprecation cycle - With the changed C APIs setting a warning (assuming no performance issues as with the current O(n) are introduced), and only making the behavior change to be an exception a few releases later. Setting warnings from within a C API is a bit weird, is that even going to be useful for people seeing them in tests to debug where the problem came from?

Improve the documentation to advise people to use a length and validate the length before crossing over into C-string land anywhere security might matter is sufficient. No compelling serious reason to change these APIs out from underneath existing working code has been presented. So I'm going with "lets just not do this".

There is still an API improvement we could make to make peoples safety minded lives easier without sacrificing performance: A new API could be added that simply returns a 0 or 1 if a PyUnicode object contains a null character (known at creation time via #111587 like mechanisms) for code that has reason to care on 3.13+ to use rather than calling an expensive strlen() to compare against the returned size as is required today. int PyUnicode_ContainsNul(PyObject*) would effectively be a cached precomputed version of its equivalent more complicated and costly PyUnicode_FindChar() call.

Other reasons not to do this: It breaks symmetry with PyBytes_FromString() which is what the API is trying to mimic for historical reasons. bytes can contain NUL characters. But we're never going to change that API to fail.

To be clear: This is me saying the changes made so far for this issue should be reverted.

…1585) Replace PyUnicode_AsUTF8AndSize() with PyUnicode_AsUTF8() to remove the explicit check for embedded null characters. The change avoids to have to include explicitly <string.h> to get the strlen() function when using a recent version of the limited C API.

Replace PyUnicode_AsUTF8AndSize() with PyUnicode_AsUTF8() to remove the explicit check for embedded null characters.

Revert PyUnicode_AsUTF8() change: it no longer rejects embedded null characters: the PyUnicode_AsUTF8Safe() function should be used instead.

serhiy-storchaka · 2023-11-03T10:15:35Z

Looking at precedence, PyBytes_AsStringAndSize() and PyUnicode_AsWideCharString() check for the embedded NULs when the output size pointer is NULL, but PyBytes_AsString() and PyUnicode_AsWideChar() do not.

So yet one alternative is to add the check in PyUnicode_AsUTF8AndSize() and remove it from PyUnicode_AsUTF8(). It keeps PyUnicode_AsUTF8() as a fast accessor to the internal UTF-8 cache and adds an explicit way to check for embedded NULs. PyUnicode_AsUTF8AndSize(u, NULL) is not what somebody would write intentionally if they can use PyUnicode_AsUTF8(), so it will affect less code.

The check for the embedded NULs was in PyBytes_AsStringAndSize() from the beginning (see d1ba443), and for PyUnicode_AsWideCharString() it was added in bpo-30708/#74893.

I do not insist on this way, I only show yet one alternative.

vstinner · 2023-11-03T10:32:41Z

@gpshead:

Thus if we were to make this change, it should require a PEP 387 deprecation cycle - With the changed C APIs setting a warning (assuming no performance issues as with the current O(n) are introduced), and only making the behavior change to be an exception a few releases later.

I would not be comfortable with emitting a DeprecationWarning. For me, we should change the API without deprecation period: either we make "the default" safe (reject null embedded characters), or we leave the default unchanged and we provide a new "safe" API. Here "the default" are PyUnicode_AsUTF8(unicode) and PyUnicode_AsUTF8AndSize(unicode, NULL) functions. I'm afraid that a deprecation period would make the situation worse for everybody, I'm not convinced that it will make the situation better.

Also @Yhg1s seem to like using PyUnicode_AsUTF8() to truncate strings with embedded null characters on purpose. It doesn't want the "default" to change. I'm not sure that a deprecation would help.

int PyUnicode_ContainsNul(PyObject*) would effectively be a cached precomputed version of its equivalent more complicated and costly PyUnicode_FindChar() call.

I proposed PyUnicode_AsUTF8Safe() instead in my new PR #111688. PyUnicode_ContainsNul() requires to raise an exception explicitly: it requires more lines of code, and so I'm afraid that people will not adopt such API and so write unsafe code because of lazyness. Moreover, I'm not sure about adding a specialized PyUnicode_FindChar() function just for the NUL character: the existing API can be used. If tomorrow an optimization is needed, PyUnicode_FindChar(0) should also use it.

About performance, the correctness of my proposed PR #111587 ("embed_null" cache) is under discussions. Since the C API makes it technically possible to mutate string, and people well, actually use anything which is possible by the C API, it's hard to guarantee correctness and "prefill" the cache when a string is created. Moreover, the performance gain is also being discussed: so far, I didn't manage to convince @methane and @serhiy-storchaka on that part :-)

Other reasons not to do this: It breaks symmetry with PyBytes_FromString() which is what the API is trying to mimic for historical reasons. bytes can contain NUL characters. But we're never going to change that API to fail.

A bytes object is different: it's just an array of bytes (0..255 integers). PyUnicode_AsUTF8() is a complex operation: it encodes Unicode code points (characters) to UTF-8. The UTF-8 encoded string is different and longer if the string contains non-ASCII characters. For bytes, PyBytes_Size() gives you PyBytes_AsString() size, it's an O(1) operation: there is no conversion.

Moreover, PyBytes_AsStringAndSize(bytes, NULL) already raises a ValueError if the string contains null bytes ;-) It's a deliberate feature and many functions use this API to easily reject null bytes. Well, I dislike passing NULL to check for null bytes, I would prefer a separated API for that, but it's a different discussion.

I prefer to use PyUnicode_AsWideChar() example since it also converts a Python str object to a null terminated string: it's just that the format is different, wchar_t* type instead of char* type. PyUnicode_AsWideChar() was modified a long time ago and the change went fine.

Apparently, PyUnicode_AsUTF8() is too fundamental and too specific to be modified in Python 3.13. I understood that it would have make sense to change it in Python 3.0 when Unicode became the default string type, so when PyUnicode_AsUTF8() (previously called PyUnicode_AsString()) was the most natural way to pass a Python string to a C char* function. I understand the "breaking the contract" part. The sails has sailed.

Well, as I wrote, I proposed PR #111688 to revert PyUnicode_AsUTF8() to Python 3.12 behavior and propose a new "safe" API to convert a Python str object to UTF-8: PyUnicode_AsUTF8Safe(). (Let's discuss the proposed name in the PR ;-))

vstinner · 2023-11-03T10:41:45Z

@serhiy-storchaka:

The check for the embedded NULs was in PyBytes_AsStringAndSize() from the beginning (see d1ba443), and for PyUnicode_AsWideCharString() it was added in bpo-30708/#74893.

From what I understood, @gpshead and @Yhg1s concerns are more about for how long people rely on the exact C API. Since PyUnicode_AsUTF8() is being used since Python 3.0 released in 2008: people expect the same behavior for 15 years, and so basically it's no longer possible to change the behavior.

I think that any problem which was not explained previously is that a "string" itself has no specific purpose, it's very generic. The Perl programming language has a "tained string" feature to make a string as "security sensitive". But in Python, a string can be "anything": a person name, an URL, a filename, XML, well, whatever.

The problem is to decide in the broad general case if a string is "security sensitive" or not. I proposed to make the assumption that all strings are security sensitive. @Yhg1s argues that no, in his use case, it's not the case and having to "pay a performance tax" of additional checks is not acceptable.

Python was already fixed to reject null characters when it's obvious that the string is security sensitive: for filenames. Basically all filename operations reject null characters.

Apparently, PyUnicode_AsUTF8() is "too broad" or "too generic" to be treated as "security sensitive".

I'm not sure why PyUnicode_AsWideCharString() could be treated differently. On Windows, using Windows native API requires to encode Python strings to 16-bit wchar_t* (UTF-16). Maybe rejecting null characters is fine in this case.

The fact that PyBytes_AsStringAndSize(bytes, NULL) reject null bytes means that there are functions which treat bytes strings as security sensitive and they need an API to reject null bytes. Well, as I wrote in my previous comment, I would prefer a separated API (such as PyBytes_AsStringSafe()) for that, but maybe now it's no longer needed since PyBytes_AsStringAndSize(bytes, NULL) already exists and is being used.

Yhg1s · 2023-11-03T10:52:42Z

From what I understood, @gpshead and @Yhg1s concerns are more about for how long people rely on the exact C API.

No, how long the function has existed doesn't matter. Just that it exists. It's been in a released Python. People use it. And that's only one of my concerns.

serhiy-storchaka · 2023-11-03T11:05:13Z

int PyUnicode_ContainsNul(PyObject*)

See _PyUnicode_HasNULChars(), added in 1334884 (bpo-13848/#58056), removed in 7e9d1d1 (bpo-23908/#68096). It was more convenient to embed the check in functions that return null-terminated strings.

zooba · 2023-11-03T11:27:32Z

I would not be comfortable with emitting a DeprecationWarning. For me, we should change the API without deprecation period

In this case, why is the issue tagged "feature" instead of being sent to the security response team and handled under embargo?

It's clearly not a real security issue, and so there is no excuse for breaking the function without a normal deprecation period. If you're not comfortable with deprecation, then don't break the function.

serhiy-storchaka · 2023-11-03T11:32:07Z

In a similar issue with the Py_UNICODE cache, I added new private function _PyUnicode_AsUnicode() which was similar to PyUnicode_AsUnicode(), but checked for embedded NULs. It is because PyUnicode_AsUnicode() returned a pointer to a modifiable buffer for legacy strings which could contain uninitialized data. It was similar to PyBytes_AsString().

Revert PyUnicode_AsUTF8() change: it no longer rejects embedded null characters: the PyUnicode_AsUTF8Safe() function should be used instead.

…ython#111585)" This reverts commit d9b606b.

…111620)" This reverts commit cde1071.

python#111091)" This reverts commit d731579.

…ython#111121)" This reverts commit d8f32be.

…1122)" This reverts commit 37e4e20.

* Revert "gh-111089: Use PyUnicode_AsUTF8() in Argument Clinic (#111585)" This reverts commit d9b606b. * Revert "gh-111089: Use PyUnicode_AsUTF8() in getargs.c (#111620)" This reverts commit cde1071. * Revert "gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (#111091)" This reverts commit d731579. * Revert "gh-111089: Add PyUnicode_AsUTF8() to the limited C API (#111121)" This reverts commit d8f32be. * Revert "gh-111089: Use PyUnicode_AsUTF8() in sqlite3 (#111122)" This reverts commit 37e4e20.

vstinner · 2023-11-07T22:36:38Z

There seems to be multiple disagreements on multiple things around this issue:

Disagreement if truncating a string on embedded null character is a bug, the expected behavior, or it's just ok we don't care.
Disagreement if truncating a string can lead to security issue or not -- see issue Embedded null characters can lead to bugs or even security vulnerabilities #111656 discussion.
Disagreement if PyUnicode_AsUTF8() should reject or not embedded null characters.
Disagreement if a new function rejecting embedded null characters is needed.
Disagreement if anything should be done, since C developers must be aware of security, must be aware of issues with embedded null characters, truncated is not the only security concern so why caring about this one anyway, etc.
Disagreement on who take decisions: @encukou is now asking the C API Working Group to take the decision, but the working group doesn't exist yet.

When I made these changes, they were approved by my peers and I thought that we were all on the same page on embedded null characters, since Python is already rejecting them in many places such as PyUnicode_AsWideCharString() for many years. But well, then read the discussions for details about disagreements.

I reverted all changes (except of PR #111106 which is unrelated) in PR #111833.

I close the issue for now.

gpshead · 2023-11-08T01:20:37Z

For reference I did attempt a test run of our huge "everything" style codebase at work over the weekend with AsUTF8 and AsUTF8AndSize(p, NULL) both set to return an error upon an embedded '\0' character. That test attempt didn't get far (auto-cancelled before it could run more due to the # of failures). Too many things were failing to even build because several Python tools used at build time triggered the error. https://github.com/eliben/pyelftools/ and https://github.com/fonttools/fonttools/ at least, I didn't try to construct a list and am not going to test further.

That various things depend on the current (and now restored, thank you!) behavior, regardless of what any of us think of that, was what I wanted it to show. It did. :)

vstinner · 2023-11-08T08:52:50Z

because several Python tools used at build time triggered the error. https://github.com/eliben/pyelftools/ and https://github.com/fonttools/fonttools/ at least

Any idea what is their use case for embedded null characters? How are such strings build? By decoding bytes from UTF-8?

vstinner · 2023-11-08T08:54:07Z

Thanks for that feedback/test, it's very useful 👍

* Revert "pythongh-111089: Use PyUnicode_AsUTF8() in Argument Clinic (python#111585)" This reverts commit d9b606b. * Revert "pythongh-111089: Use PyUnicode_AsUTF8() in getargs.c (python#111620)" This reverts commit cde1071. * Revert "pythongh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (python#111091)" This reverts commit d731579. * Revert "pythongh-111089: Add PyUnicode_AsUTF8() to the limited C API (python#111121)" This reverts commit d8f32be. * Revert "pythongh-111089: Use PyUnicode_AsUTF8() in sqlite3 (python#111122)" This reverts commit 37e4e20.

encukou · 2023-11-08T12:50:28Z

Thank you for the revert. Sorry that the issues were only discovered after merge, and after such heated discussion. Maybe there's some process improvement to be made.

I filed this issue for the C-API-WG-to-be: capi-workgroup/api-evolution#39
Feel free to re-frame the issue or propose other solutions.

vstinner added the topic-C-API label Oct 19, 2023

vstinner self-assigned this Oct 19, 2023

bedevere-app bot mentioned this issue Oct 20, 2023

gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL #111091

Merged

vstinner added a commit to vstinner/cpython that referenced this issue Oct 20, 2023

pythongh-111089: PyUnicode_AsUTF8AndSize() sets size on error

585158d

On error, PyUnicode_AsUTF8AndSize() now sets the size argument to 0, to avoid undefined value.

bedevere-app bot mentioned this issue Oct 20, 2023

gh-111089: PyUnicode_AsUTF8AndSize() sets size on error #111106

Merged

vstinner added a commit to vstinner/cpython that referenced this issue Oct 20, 2023

pythongh-111089: PyUnicode_AsUTF8AndSize() sets size on error

ae7734d

On error, PyUnicode_AsUTF8AndSize() now sets the size argument to -1, to avoid undefined value.

bedevere-app bot mentioned this issue Oct 20, 2023

gh-111089: Add PyUnicode_AsUTF8() to the limited C API #111121

Merged

vstinner added a commit to vstinner/cpython that referenced this issue Oct 20, 2023

pythongh-111089: Use PyUnicode_AsUTF8() in sqlite3

714c652

PyUnicode_AsUTF8() now raises an exception if the string contains embedded null characters.

bedevere-app bot mentioned this issue Oct 20, 2023

gh-111089: Use PyUnicode_AsUTF8() in sqlite3 #111122

Merged

vstinner added a commit that referenced this issue Oct 20, 2023

gh-111089: PyUnicode_AsUTF8AndSize() sets size on error (#111106)

f1e751e

On error, PyUnicode_AsUTF8AndSize() now sets the size argument to -1, to avoid undefined value.

vstinner added a commit that referenced this issue Oct 20, 2023

gh-111089: Use PyUnicode_AsUTF8() in sqlite3 (#111122)

37e4e20

PyUnicode_AsUTF8() now raises an exception if the string contains embedded null characters.

vstinner closed this as completed Oct 20, 2023

encukou reopened this Oct 23, 2023

bedevere-app bot mentioned this issue Nov 1, 2023

gh-111089: Use PyUnicode_AsUTF8() in Argument Clinic #111585

Merged

bedevere-app bot mentioned this issue Nov 1, 2023

gh-111089: Add cache to PyUnicode_AsUTF8() for embedded NUL #111587

Closed

vstinner added a commit to vstinner/cpython that referenced this issue Nov 1, 2023

fixup! pythongh-111089: Add cache to PyUnicode_AsUTF8() for embedded NUL

ea99865

Fix unicode_subtype_new

FullteaR pushed a commit to FullteaR/cpython that referenced this issue Nov 3, 2023

pythongh-111089: Use PyUnicode_AsUTF8() in getargs.c (python#111620)

4db128c

Replace PyUnicode_AsUTF8AndSize() with PyUnicode_AsUTF8() to remove the explicit check for embedded null characters.

vstinner added a commit to vstinner/cpython that referenced this issue Nov 3, 2023

pythongh-111089: Add PyUnicode_AsUTF8Safe() function

191bb3a

Revert PyUnicode_AsUTF8() change: it no longer rejects embedded null characters: the PyUnicode_AsUTF8Safe() function should be used instead.

bedevere-app bot mentioned this issue Nov 3, 2023

gh-111089: Add PyUnicode_AsUTF8NoNUL() function #111688

Closed

vstinner mentioned this issue Nov 4, 2023

gh-111389: expose _PyHASH_INF/BITS/MODULUS/IMAG macros as public #111418

Open

vstinner added a commit to vstinner/cpython that referenced this issue Nov 4, 2023

pythongh-111089: Add PyUnicode_AsUTF8Safe() function

96b2b8b

Revert PyUnicode_AsUTF8() change: it no longer rejects embedded null characters: the PyUnicode_AsUTF8Safe() function should be used instead.

vstinner added a commit to vstinner/cpython that referenced this issue Nov 7, 2023

pythongh-111089: Add PyUnicode_AsUTF8Safe() function

36973d6

Revert PyUnicode_AsUTF8() change: it no longer rejects embedded null characters: the PyUnicode_AsUTF8Safe() function should be used instead.

vstinner added a commit to vstinner/cpython that referenced this issue Nov 7, 2023

Revert "pythongh-111089: Use PyUnicode_AsUTF8() in Argument Clinic (p…

cba249d

…ython#111585)" This reverts commit d9b606b.

vstinner added a commit to vstinner/cpython that referenced this issue Nov 7, 2023

Revert "pythongh-111089: Use PyUnicode_AsUTF8() in getargs.c (python#…

938652f

…111620)" This reverts commit cde1071.

vstinner added a commit to vstinner/cpython that referenced this issue Nov 7, 2023

Revert "pythongh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (

be7e341

python#111091)" This reverts commit d731579.

vstinner added a commit to vstinner/cpython that referenced this issue Nov 7, 2023

Revert "pythongh-111089: Add PyUnicode_AsUTF8() to the limited C API (p…

0e87483

…ython#111121)" This reverts commit d8f32be.

vstinner mentioned this issue Nov 7, 2023

gh-111089: Revert PyUnicode_AsUTF8() changes #111833

Merged

vstinner added a commit to vstinner/cpython that referenced this issue Nov 7, 2023

Revert "pythongh-111089: Use PyUnicode_AsUTF8() in sqlite3 (python#11…

8b4a011

…1122)" This reverts commit 37e4e20.

vstinner closed this as completed Nov 7, 2023

encukou mentioned this issue Nov 8, 2023

Soft-deprecate PyUnicode_AsUTF8 capi-workgroup/api-evolution#39

Open

[C API] Change PyUnicode_AsUTF8() to return NULL on embedded null characters #111089

[C API] Change PyUnicode_AsUTF8() to return NULL on embedded null characters #111089

vstinner commented Oct 19, 2023 •

edited by bedevere-app bot

vstinner commented Oct 20, 2023

encukou commented Oct 23, 2023

vstinner commented Oct 27, 2023

encukou commented Oct 28, 2023

gpshead commented Nov 3, 2023 •

edited

serhiy-storchaka commented Nov 3, 2023

vstinner commented Nov 3, 2023

vstinner commented Nov 3, 2023 •

edited

Yhg1s commented Nov 3, 2023

serhiy-storchaka commented Nov 3, 2023

zooba commented Nov 3, 2023

serhiy-storchaka commented Nov 3, 2023

vstinner commented Nov 7, 2023

gpshead commented Nov 8, 2023

vstinner commented Nov 8, 2023

vstinner commented Nov 8, 2023

encukou commented Nov 8, 2023 •

edited

[C API] Change PyUnicode_AsUTF8() to return NULL on embedded null characters #111089

[C API] Change PyUnicode_AsUTF8() to return NULL on embedded null characters #111089

Comments

vstinner commented Oct 19, 2023 • edited by bedevere-app bot

Linked PRs

vstinner commented Oct 20, 2023

encukou commented Oct 23, 2023

vstinner commented Oct 27, 2023

encukou commented Oct 28, 2023

gpshead commented Nov 3, 2023 • edited

serhiy-storchaka commented Nov 3, 2023

vstinner commented Nov 3, 2023

vstinner commented Nov 3, 2023 • edited

Yhg1s commented Nov 3, 2023

serhiy-storchaka commented Nov 3, 2023

zooba commented Nov 3, 2023

serhiy-storchaka commented Nov 3, 2023

vstinner commented Nov 7, 2023

gpshead commented Nov 8, 2023

vstinner commented Nov 8, 2023

vstinner commented Nov 8, 2023

encukou commented Nov 8, 2023 • edited

vstinner commented Oct 19, 2023 •

edited by bedevere-app bot

gpshead commented Nov 3, 2023 •

edited

vstinner commented Nov 3, 2023 •

edited

encukou commented Nov 8, 2023 •

edited