Skip to content

Commit 4d0cb19

Browse files
committed
Stop assuming that memory addresses are signed
Before this change, the code for the DeduperReloaderPatchingMixin.infer_field_offset() and DeduperReloaderPatchingMixin.try_write_readonly_attr() methods talked about “words”. Unfortunately, the code was misusing the technical term “word”. The code had defined a word to be either 4 or 8 bytes depending on the value of sys.maxsize. On x86-64 processors, this definition is incorrect. On x86-64 processors, a word is always 2 bytes (see page xxvii of revision 3.24 of AMD64 Architecture Programmer’s Manual Volume 1: Application Programming [1] as well as section 4.1 of version 089 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture [2]). For that reason, the term “word” is not a good description for the kind of data that those two methods are dealing with. Those two methods try to find and alter memory addresses. A piece of data that contains a memory address is called a pointer. This change (hopefully) makes the code for those two methods slightly easier to understand by replacing the term “word” with the term “pointer”. More importantly, this change also fixes a bug with the DeduperReloaderPatchingMixin.infer_field_offset() method. Before this change, that method would use ctypes.c_void_p in order to create an int that contains a memory address. It would then search for “words” that contain that memory address. The data type that it used for “words” would either be ctypes.c_int32 or ctypes.c_int64. Ultimately, a memory address is a sequence of bytes. Those bytes can be interpreted as either a signed integer or an unsigned integer (it doesn’t really matter which one you choose as long as you are consistent). The documentation for ctypes.c_void_p says “Represents the C void* type. The value is represented as integer.” [3] The documentation does not specify whether it interprets memory addresses as being signed or unsigned. In this situation, it’s safest to assume that whether or not memory addresses are signed or unsigned is an implementation detail that should not be relied on. The documentation for ctypes.c_int32 and ctypes.c_int64 is different, though. The documentation for ctypes.c_int32 and ctypes.c_int64 clearly specifies that those two data types always interpret sequences of bytes as signed integers [4]. This can cause problems on systems where ctypes.c_void_p interprets memory addresses as unsigned integers. For example, consider this script: import ctypes assert ctypes.sizeof(ctypes.c_void_p) == 8 raw_data = 8 * b'\xFF' print(ctypes.c_void_p.from_buffer_copy(raw_data).value) print(ctypes.c_int64.from_buffer_copy(raw_data).value) On my system, that script prints the following: 18446744073709551615 -1 This means that if I compare the value of a ctypes.c_void_p object to the value of a ctypes.c_int64 object on my system, then I might get misleading results. This change avoids that potential issue by making it so that the DeduperReloaderPatchingMixin.infer_field_offset() method compares ctypes.c_void_p objects to other ctypes.c_void_p objects instead of comparing ctypes.c_void_p objects to ctypes.c_int64 or ctypes.c_int32 objects. Fixes #15107. [1]: <https://docs.amd.com/v/u/en-US/24592_3.24> [2]: <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html> [3]: <https://docs.python.org/3.14/library/ctypes.html#ctypes.c_void_p> [4]: <https://docs.python.org/3.14/library/ctypes.html#ctypes.c_int32>
1 parent 4d71bcf commit 4d0cb19

1 file changed

Lines changed: 3 additions & 8 deletions

File tree

IPython/extensions/deduperreload/deduperreload_patching.py

Lines changed: 3 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -7,12 +7,7 @@
77
NULL: object = object()
88
_MAX_FIELD_SEARCH_OFFSET = 50
99

10-
if sys.maxsize > 2**32:
11-
WORD_TYPE: type[ctypes.c_int32] | type[ctypes.c_int64] = ctypes.c_int64
12-
WORD_N_BYTES = 8
13-
else:
14-
WORD_TYPE = ctypes.c_int32
15-
WORD_N_BYTES = 4
10+
POINTER_N_BYTES = ctypes.sizeof(ctypes.c_void_p)
1611

1712

1813
class DeduperReloaderPatchingMixin:
@@ -32,7 +27,7 @@ def infer_field_offset(
3227
for offset in range(1, _MAX_FIELD_SEARCH_OFFSET):
3328
if (
3429
ctypes.cast(
35-
obj_addr + WORD_N_BYTES * offset, ctypes.POINTER(WORD_TYPE)
30+
obj_addr + POINTER_N_BYTES * offset, ctypes.POINTER(ctypes.c_void_p)
3631
).contents.value
3732
== field_addr
3833
):
@@ -69,7 +64,7 @@ def try_write_readonly_attr(
6964
if new_value not in (None, NULL):
7065
ctypes.pythonapi.Py_IncRef(ctypes.py_object(new_value))
7166
ctypes.cast(
72-
obj_addr + WORD_N_BYTES * offset, ctypes.POINTER(WORD_TYPE)
67+
obj_addr + POINTER_N_BYTES * offset, ctypes.POINTER(ctypes.c_void_p)
7368
).contents.value = new_value_addr
7469

7570
@classmethod

0 commit comments

Comments
 (0)