New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-101765: Fix SystemError / segfault from undefined behavior in iter __reduce__
#101769
base: main
Are you sure you want to change the base?
Conversation
Objects/bytearrayobject.c
Outdated
|
|
||
| /* _PyEval_GetBuiltin can invoke arbitrary code. | ||
| * calls must be *before* access of `it` pointers, | ||
| * since C/C++ parameter eval order is undefined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * since C/C++ parameter eval order is undefined. | |
| * since C parameter eval order is undefined. |
This is C code, so we don't care about C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed C++ references in
45522c6
Objects/listobject.c
Outdated
| @@ -3443,19 +3443,26 @@ static PyObject * | |||
| listiter_reduce_general(void *_it, int forward) | |||
| { | |||
| PyObject *list; | |||
| PyObject *iter; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just declare these within the block where they are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the declarations to the if blocks in 049a8dd
…o potential side effects
- Added some comments for dict del usages - Switched to `__builtin__` instead of conditional `__dict__` access - Use kwargs for improved readability
…ist conditional branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor nits, but otherwise this LGTM.
I assume you did verify that the added test fails/crashes if any of the code changes are reverted? (I guess assuming you are on a system where arg eval order is such that this crash does repro.)
Misc/NEWS.d/next/Core and Builtins/2023-02-10-07-21-47.gh-issue-101765.MO5LlC.rst
Outdated
Show resolved
Hide resolved
I have an example here that uses multiprocessing.Pool to run the tests (due to potential of crash), and all cases either fails due to the wrong value, or SystemError. Not sure if it's worth doing in the tests here since it does raise the complexity a bit due to pickling requirements.
Detailsimport builtins
import functools
import multiprocessing
import unittest
class EmptyIterClass:
def __len__(self):
return 0
def __getitem__(self, i):
raise StopIteration
def run_in_subprocess(func, *args):
with multiprocessing.Pool(1) as pool:
res = pool.apply_async(func, args=args)
return res.get(timeout=2)
def get_zero():
return 0
def run_test_iter_reduce(builtin_name, item, sentinel=None):
it = iter(item) if sentinel is None else iter(item, sentinel)
# Backup builtins
builtins_dict = builtins.__dict__
orig = {"iter": iter, "reversed": reversed}
class CustomStr:
def __init__(self, name, iterator):
self.name = name
self.iterator = iterator
def __hash__(self):
return hash(self.name)
def __eq__(self, other):
# Here we exhaust our iterator, possibly changing
# its `it_seq` pointer to NULL
# The `__reduce__` call should correctly get
# the pointers after this call
list(self.iterator)
return other == self.name
# del is required here
# since only setting will result in
# both keys existing with a hash collision
del builtins_dict[builtin_name]
builtins_dict[CustomStr(builtin_name, it)] = orig[builtin_name]
return it.__reduce__()
class Tests(unittest.TestCase):
def test_reduce_mutating_builtins_iter(self):
# This is a reproducer of issue #101765
# where iter `__reduce__` calls could lead to a segfault or SystemError
# depending on the order of C argument evaluation, which is undefined
types = [
(EmptyIterClass(),),
(bytes(3),),
(bytearray(3),),
((1, 2, 3),),
(get_zero, 0),
(tuple[int],) # GenericAlias
]
run_iter = functools.partial(run_test_iter_reduce, "iter")
# The returned value of `__reduce__` should not only be valid
# but also *empty*, as `it` was exhausted during `__eq__`
# i.e "xyz" returns (iter, ("",))
with self.subTest(case="str"):
self.assertEqual(
run_in_subprocess(run_iter, "xyz"),
(iter, ("",))
)
with self.subTest(case="list"):
self.assertEqual(
run_in_subprocess(run_iter, [1, 2, 3]),
(iter, ([],))
)
# _PyEval_GetBuiltin is also called for `reversed` in a branch of
# listiter_reduce_general
with self.subTest(case="reversed list"):
self.assertEqual(
run_in_subprocess(
run_test_iter_reduce,
"reversed",
reversed([*range(8)])
),
(iter, ([],))
)
for case in types:
with self.subTest(case=case):
self.assertEqual(
run_in_subprocess(run_iter, *case),
(iter, ((),))
)
if __name__ == "__main__":
unittest.main()Essentially, currently there are 3 possibilities for the tests to go before this PR:
So there should be no possibility that any of these tests pass on builds before this PR. |
Summary
__reduce__is called oniterobjects when the key of the hash of"iter"within__builtins__.__dict__is a custom object that executes arbitrary code within__eq__, mutating theiterobject and causing illegal memory access or SystemError (based on C argument evaluation order, which is undefined behavior).Affected methods
iterobject.citer_reducecalliter_reducelistobject.clistiter_reduce_generalbytearrayobject.cbytearrayiter_reducebytesobject.cstriter_reducetupleobject.ctupleiter_reducegenericaliasobject.cga_iter_reduceThis PR also fixes a compounded issue where currently
genericaliasobject.ga_iter_reducedoes not handle empty iterators at all and has no NULL checkAlong with moving the evaluation of
_PyEval_GetBuiltinfor potential side effects, this also adds handling of the NULL case (like the otheriter_reducefunctions have:Linked Issue
__reduce__can segfault if accessing__builtins__.__dict__['iter']mutates the iter object #101765