Stop assuming that memory addresses are signed#15111
Merged
Carreau merged 1 commit intoipython:mainfrom Jan 3, 2026
Jayman2000:fix-15107
Merged
Stop assuming that memory addresses are signed#15111Carreau merged 1 commit intoipython:mainfrom Jayman2000:fix-15107
Carreau merged 1 commit intoipython:mainfrom
Jayman2000:fix-15107
Conversation
Before this change, the code for the
DeduperReloaderPatchingMixin.infer_field_offset() and
DeduperReloaderPatchingMixin.try_write_readonly_attr() methods talked
about “words”. Unfortunately, the code was misusing the technical term
“word”. The code had defined a word to be either 4 or 8 bytes depending
on the value of sys.maxsize. On x86-64 processors, this definition is
incorrect. On x86-64 processors, a word is always 2 bytes (see page
xxvii of revision 3.24 of AMD64 Architecture Programmer’s Manual Volume
1: Application Programming [1] as well as section 4.1 of version 089 of
the Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume
1: Basic Architecture [2]).
For that reason, the term “word” is not a good description for the kind
of data that those two methods are dealing with. Those two methods try
to find and alter memory addresses. A piece of data that contains a
memory address is called a pointer. This change (hopefully) makes the
code for those two methods slightly easier to understand by replacing
the term “word” with the term “pointer”.
More importantly, this change also fixes a bug with the
DeduperReloaderPatchingMixin.infer_field_offset() method. Before this
change, that method would use ctypes.c_void_p in order to create an int
that contains a memory address. It would then search for “words” that
contain that memory address. The data type that it used for “words”
would either be ctypes.c_int32 or ctypes.c_int64.
Ultimately, a memory address is a sequence of bytes. Those bytes can be
interpreted as either a signed integer or an unsigned integer (it
doesn’t really matter which one you choose as long as you are
consistent). The documentation for ctypes.c_void_p says “Represents the
C void* type. The value is represented as integer.” [3] The
documentation does not specify whether it interprets memory addresses as
being signed or unsigned. In this situation, it’s safest to assume that
whether or not memory addresses are signed or unsigned is an
implementation detail that should not be relied on.
The documentation for ctypes.c_int32 and ctypes.c_int64 is different,
though. The documentation for ctypes.c_int32 and ctypes.c_int64 clearly
specifies that those two data types always interpret sequences of bytes
as signed integers [4]. This can cause problems on systems where
ctypes.c_void_p interprets memory addresses as unsigned integers. For
example, consider this script:
import ctypes
assert ctypes.sizeof(ctypes.c_void_p) == 8
raw_data = 8 * b'\xFF'
print(ctypes.c_void_p.from_buffer_copy(raw_data).value)
print(ctypes.c_int64.from_buffer_copy(raw_data).value)
On my system, that script prints the following:
18446744073709551615
-1
This means that if I compare the value of a ctypes.c_void_p object to
the value of a ctypes.c_int64 object on my system, then I might get
misleading results.
This change avoids that potential issue by making it so that the
DeduperReloaderPatchingMixin.infer_field_offset() method compares
ctypes.c_void_p objects to other ctypes.c_void_p objects instead of
comparing ctypes.c_void_p objects to ctypes.c_int64 or ctypes.c_int32
objects.
Fixes #15107.
[1]: <https://docs.amd.com/v/u/en-US/24592_3.24>
[2]: <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html>
[3]: <https://docs.python.org/3.14/library/ctypes.html#ctypes.c_void_p>
[4]: <https://docs.python.org/3.14/library/ctypes.html#ctypes.c_int32>
Member
|
I'm going to ping the author of the extension @jaysarva I belive; Otherwise this seem ok to me. |
Contributor
|
Looks good to me, thanks for the catch. also cc @smacke |
Contributor
|
Thanks for catching this! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See the commit message for details.
Fixes #15107.