Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-101765: Fix SystemError / segfault from undefined behavior in iter __reduce__ #101769

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

ionite34
Copy link
Contributor

@ionite34 ionite34 commented Feb 10, 2023

Summary

  • Fixes potential segmentation fault or SystemError when __reduce__ is called on iter objects when the key of the hash of "iter" within __builtins__.__dict__ is a custom object that executes arbitrary code within __eq__, mutating the iter object and causing illegal memory access or SystemError (based on C argument evaluation order, which is undefined behavior).

Affected methods

  • iterobject.c
    • iter_reduce
    • calliter_reduce
  • listobject.c
    • listiter_reduce_general
  • bytearrayobject.c
    • bytearrayiter_reduce
  • bytesobject.c
    • striter_reduce
  • tupleobject.c
    • tupleiter_reduce
  • genericaliasobject.c
    • ga_iter_reduce

This PR also fixes a compounded issue where currently genericaliasobject.ga_iter_reduce does not handle empty iterators at all and has no NULL check

Python 3.11.0 (main, Oct 26 2022, 10:14:06) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x = tuple[int]
>>>
>>> it = iter(x)
>>> _ = list(it)
>>>
>>> it.__reduce__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: NULL object passed to Py_BuildValue

Along with moving the evaluation of _PyEval_GetBuiltin for potential side effects, this also adds handling of the NULL case (like the other iter_reduce functions have:

Python 3.12.0a5+ (heads/fix-reduce-dirty:93854e172e, Feb 10 2023, 13:41:45) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x = tuple[int]
>>>
>>> it = iter(x)
>>> _ = list(it)
>>>
>>> it.__reduce__()
(<built-in function iter>, ((),))

Linked Issue

@arhadthedev arhadthedev added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Feb 10, 2023

/* _PyEval_GetBuiltin can invoke arbitrary code.
* calls must be *before* access of `it` pointers,
* since C/C++ parameter eval order is undefined.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* since C/C++ parameter eval order is undefined.
* since C parameter eval order is undefined.

This is C code, so we don't care about C++.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed C++ references in
45522c6

@@ -3443,19 +3443,26 @@ static PyObject *
listiter_reduce_general(void *_it, int forward)
{
PyObject *list;
PyObject *iter;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just declare these within the block where they are used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the declarations to the if blocks in 049a8dd

@ionite34 ionite34 marked this pull request as ready for review February 10, 2023 07:29
Lib/test/test_iter.py Outdated Show resolved Hide resolved
Lib/test/test_iter.py Outdated Show resolved Hide resolved
Objects/listobject.c Show resolved Hide resolved
Objects/unicodeobject.c Show resolved Hide resolved
- Added some comments for dict del usages
- Switched to `__builtin__` instead of conditional `__dict__` access
- Use kwargs for improved readability
carljm
carljm approved these changes Feb 10, 2023
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor nits, but otherwise this LGTM.

I assume you did verify that the added test fails/crashes if any of the code changes are reverted? (I guess assuming you are on a system where arg eval order is such that this crash does repro.)

Lib/test/test_iter.py Outdated Show resolved Hide resolved
Lib/test/test_iter.py Outdated Show resolved Hide resolved
Lib/test/test_iter.py Outdated Show resolved Hide resolved
@ionite34
Copy link
Contributor Author

ionite34 commented Feb 10, 2023

Just a few minor nits, but otherwise this LGTM.

I assume you did verify that the added test fails/crashes if any of the code changes are reverted? (I guess assuming you are on a system where arg eval order is such that this crash does repro.)

I have an example here that uses multiprocessing.Pool to run the tests (due to potential of crash), and all cases either fails due to the wrong value, or SystemError. Not sure if it's worth doing in the tests here since it does raise the complexity a bit due to pickling requirements.

  • This reproduces all failures for 3.11 / 3.12.0a4 for me:
Details
import builtins
import functools
import multiprocessing
import unittest


class EmptyIterClass:
    def __len__(self):
        return 0
    def __getitem__(self, i):
        raise StopIteration


def run_in_subprocess(func, *args):
    with multiprocessing.Pool(1) as pool:
        res = pool.apply_async(func, args=args)
        return res.get(timeout=2)

def get_zero():
    return 0

def run_test_iter_reduce(builtin_name, item, sentinel=None):
    it = iter(item) if sentinel is None else iter(item, sentinel)

    # Backup builtins
    builtins_dict = builtins.__dict__
    orig = {"iter": iter, "reversed": reversed}

    class CustomStr:
        def __init__(self, name, iterator):
            self.name = name
            self.iterator = iterator
        def __hash__(self):
            return hash(self.name)
        def __eq__(self, other):
            # Here we exhaust our iterator, possibly changing
            # its `it_seq` pointer to NULL
            # The `__reduce__` call should correctly get
            # the pointers after this call
            list(self.iterator)
            return other == self.name

    # del is required here
    # since only setting will result in
    # both keys existing with a hash collision
    del builtins_dict[builtin_name]
    builtins_dict[CustomStr(builtin_name, it)] = orig[builtin_name]

    return it.__reduce__()

class Tests(unittest.TestCase):
    
    def test_reduce_mutating_builtins_iter(self):
        # This is a reproducer of issue #101765
        # where iter `__reduce__` calls could lead to a segfault or SystemError
        # depending on the order of C argument evaluation, which is undefined

        types = [
            (EmptyIterClass(),),
            (bytes(3),),
            (bytearray(3),),
            ((1, 2, 3),),
            (get_zero, 0),
            (tuple[int],)  # GenericAlias
        ]

        run_iter = functools.partial(run_test_iter_reduce, "iter")
        # The returned value of `__reduce__` should not only be valid
        # but also *empty*, as `it` was exhausted during `__eq__`
        # i.e "xyz" returns (iter, ("",))

        with self.subTest(case="str"):
            self.assertEqual(
                run_in_subprocess(run_iter, "xyz"),
                (iter, ("",))
            )
        with self.subTest(case="list"):
            self.assertEqual(
                run_in_subprocess(run_iter, [1, 2, 3]),
                (iter, ([],))
            )

        # _PyEval_GetBuiltin is also called for `reversed` in a branch of
        # listiter_reduce_general
        with self.subTest(case="reversed list"):
            self.assertEqual(
                run_in_subprocess(
                    run_test_iter_reduce, 
                    "reversed",
                    reversed([*range(8)])
                ),
                (iter, ([],))
            )

        for case in types:
            with self.subTest(case=case):
                self.assertEqual(
                    run_in_subprocess(run_iter, *case),
                    (iter, ((),))
                )
            

if __name__ == "__main__":
    unittest.main()

Essentially, currently there are 3 possibilities for the tests to go before this PR:

  1. _PyEval_GetBuiltin is called before it->it_seq
  • This sets it->it_seq to NULL, which happens before the actual NULL check for it_seq, hence this path leads to SystemError
  1. _PyEval_GetBuiltin is called after it->it_seq
  • This does the same setting of it_seq to NULL, but the original pointer was already acquired. At this point, based on OS / compiler / memory / object references, the original pointer may have been freed. So this returns an inaccurate valued __reduce__ with the iterator value before we exhausted it.
  • Since the exhaustion of it_seq also DecRefs it, our __reduce__ return value is either incorrect in value (before we exhausted the iterator), or a segmentation fault if the object was freed after DecRef.

So there should be no possibility that any of these tests pass on builds before this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants