gh-101765: Fix SystemError / segfault from undefined behavior in iter `__reduce__` #101769

ionite34 · 2023-02-10T01:26:22Z

Summary

Fixes a potential segmentation fault or SystemError when `__reduce__` is called on an iterator object while the "iter" entry in `__builtins__.__dict__` is keyed by a custom object with the same hash as "iter". Looking up the builtin then calls that object's `__eq__`, which can run arbitrary code, including code that mutates the iterator being reduced. Whether this ends in an illegal memory access or a SystemError depends on the order in which the compiler evaluates the arguments passed to `Py_BuildValue`, which C does not specify.
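The lookup mechanism described above can be reproduced with an ordinary dict. In this sketch, `NoisyKey` and `record` are hypothetical names: `NoisyKey` plays the role of the `CustomStr` class in the test further down, and a plain dict stands in for `__builtins__.__dict__`. It shows that a normal str lookup runs the colliding key's `__eq__`, and why the original str key has to be deleted before the custom key is inserted.

```python
class NoisyKey:
    """Hypothetical stand-in for the test's CustomStr: hashes like a str, runs code in __eq__."""

    def __init__(self, name, on_eq):
        self.name = name
        self.on_eq = on_eq  # callback standing in for "arbitrary code"

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        self.on_eq()  # runs during any dict lookup that reaches this key
        return other == self.name


calls = []


def record():
    calls.append("__eq__ ran")


# Swap the plain "iter" key for the custom key (delete first, as the PR's test does).
d = {"iter": "builtin iter"}
del d["iter"]
d[NoisyKey("iter", record)] = "builtin iter"

# An ordinary str lookup now hits the colliding key and runs its __eq__.
value = d["iter"]
print(value, calls)

# Without the del, the new key compares equal to the existing str key (str.__eq__
# returns NotImplemented, so the reflected NoisyKey.__eq__ is tried), so only the
# value is replaced and the str key stays in place.
d2 = {"iter": "old"}
d2[NoisyKey("iter", record)] = "new"
print(d2, [type(key).__name__ for key in d2], len(calls))
```

After the first lookup `calls` holds one entry; the second assignment adds another, and `d2` still has a plain str key.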

Affected methods

iterobject.c
- iter_reduce
- calliter_reduce
listobject.c
- listiter_reduce_general
bytearrayobject.c
- bytearrayiter_reduce
bytesobject.c
- striter_reduce
tupleobject.c
- tupleiter_reduce
genericaliasobject.c
- ga_iter_reduce
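To connect the C functions listed above to Python-level objects, here is a sketch that builds one iterator per function and reads the first element of its `__reduce__()` tuple, which is the builtin fetched through `_PyEval_GetBuiltin`. The mapping is an assumption based on which iterator type each function belongs to; `SeqLike` is a hypothetical helper and `get_zero` is borrowed from the test further down. The GenericAlias case is guarded because GenericAlias objects only became iterable in Python 3.11.

```python
import sys


class SeqLike:
    """Hypothetical old-style sequence; iter() wraps it in a generic sequence iterator."""

    def __getitem__(self, i):
        raise IndexError(i)


def get_zero():
    return 0


# Affected C function -> a Python-level iterator whose __reduce__ it implements.
cases = {
    "iter_reduce": iter(SeqLike()),
    "calliter_reduce": iter(get_zero, 0),
    "listiter_reduce_general (forward)": iter([1, 2, 3]),
    "listiter_reduce_general (reversed)": reversed([1, 2, 3]),
    "bytearrayiter_reduce": iter(bytearray(3)),
    "striter_reduce": iter(bytes(3)),
    "tupleiter_reduce": iter((1, 2, 3)),
}
if sys.version_info >= (3, 11):
    # Iterating a GenericAlias (for PEP 646 unpacking) was added in 3.11.
    cases["ga_iter_reduce"] = iter(tuple[int])

# Element 0 of each reduce tuple is the builtin that the C code looks up by name.
builtin_names = {name: it.__reduce__()[0].__name__ for name, it in cases.items()}
for name, builtin in builtin_names.items():
    print(f"{name}: {builtin}")
```

Every case reports `iter` except the reversed list iterator, which reports `reversed`; that is why the reversed branch of `listiter_reduce_general` needs its own test case.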

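This PR also fixes a compounding issue: `ga_iter_reduce` in genericaliasobject.c does not handle exhausted iterators at all and has no NULL check, so reducing an exhausted GenericAlias iterator fails even without any tampering with builtins (Python 3.11.0):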

Python 3.11.0 (main, Oct 26 2022, 10:14:06) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x = tuple[int]
>>>
>>> it = iter(x)
>>> _ = list(it)
>>>
>>> it.__reduce__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: NULL object passed to Py_BuildValue

Along with moving the `_PyEval_GetBuiltin` call ahead of any access to the iterator's pointers (because of its potential side effects), this also adds handling of the NULL case, as the other iter_reduce functions already have:

Python 3.12.0a5+ (heads/fix-reduce-dirty:93854e172e, Feb 10 2023, 13:41:45) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x = tuple[int]
>>>
>>> it = iter(x)
>>> _ = list(it)
>>>
>>> it.__reduce__()
(<built-in function iter>, ((),))
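For context on why the returned tuple must be accurate: `pickle` (and `copy`) rebuild an iterator from what `__reduce__` returns, so the tuple has to describe the iterator's state at the time of the call. A minimal round trip, using a tuple iterator as an arbitrary example:

```python
import pickle

it = iter((1, 2, 3))
next(it)  # consume the first element

# pickle relies on the iterator's __reduce__ (through __reduce_ex__) to capture its position.
remaining = list(pickle.loads(pickle.dumps(it)))
print(remaining)

list(it)  # exhaust the original iterator
after_exhaustion = list(pickle.loads(pickle.dumps(it)))
print(after_exhaustion)
```

The restored copies yield `[2, 3]` and then `[]`, matching the original iterator at each point.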

Linked Issue

Issue: iter __reduce__ can segfault if accessing __builtins__.__dict__['iter'] mutates the iter object #101765


JelleZijlstra · 2023-02-10T05:44:00Z

Objects/bytearrayobject.c

+
+    /* _PyEval_GetBuiltin can invoke arbitrary code.
+     * calls must be *before* access of `it` pointers,
+     * since C/C++ parameter eval order is undefined.


Suggested change

- * since C/C++ parameter eval order is undefined.
+ * since C parameter eval order is undefined.

This is C code, so we don't care about C++.

Removed C++ references in 45522c6.

JelleZijlstra · 2023-02-10T05:44:53Z

Objects/listobject.c

@@ -3443,19 +3443,26 @@ static PyObject *
 listiter_reduce_general(void *_it, int forward)
 {
    PyObject *list;
+    PyObject *iter;


You can just declare these within the block where they are used.

Moved the declarations to the if blocks in 049a8dd


carljm

Just a few minor nits, but otherwise this LGTM.

I assume you did verify that the added test fails/crashes if any of the code changes are reverted? (I guess assuming you are on a system where arg eval order is such that this crash does repro.)


ionite34 · 2023-02-10T21:23:02Z


I have an example here that uses multiprocessing.Pool to run the tests (since they can crash the interpreter), and every case either fails with a wrong value or raises SystemError. I'm not sure it's worth doing in the tests here, since the pickling requirements raise the complexity a bit.

This reproduces all failures for 3.11 / 3.12.0a4 for me:


import builtins
import functools
import multiprocessing
import unittest


class EmptyIterClass:
    def __len__(self):
        return 0
    def __getitem__(self, i):
        raise StopIteration


def run_in_subprocess(func, *args):
    with multiprocessing.Pool(1) as pool:
        res = pool.apply_async(func, args=args)
        return res.get(timeout=2)

def get_zero():
    return 0

def run_test_iter_reduce(builtin_name, item, sentinel=None):
    it = iter(item) if sentinel is None else iter(item, sentinel)

    # Backup builtins
    builtins_dict = builtins.__dict__
    orig = {"iter": iter, "reversed": reversed}

    class CustomStr:
        def __init__(self, name, iterator):
            self.name = name
            self.iterator = iterator
        def __hash__(self):
            return hash(self.name)
        def __eq__(self, other):
            # Here we exhaust our iterator, possibly changing
            # its `it_seq` pointer to NULL
            # The `__reduce__` call should correctly get
            # the pointers after this call
            list(self.iterator)
            return other == self.name

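    # del is required here: if the plain "iter" str key stayed in the dict,
    # assigning the custom key would compare equal to it (via the reflected
    # __eq__), so only the value would be replaced and the str key would remain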
    del builtins_dict[builtin_name]
    builtins_dict[CustomStr(builtin_name, it)] = orig[builtin_name]

    return it.__reduce__()

class Tests(unittest.TestCase):
    
    def test_reduce_mutating_builtins_iter(self):
        # This is a reproducer of issue #101765
        # where iter `__reduce__` calls could lead to a segfault or SystemError
        # depending on the order of C argument evaluation, which is undefined

        types = [
            (EmptyIterClass(),),
            (bytes(3),),
            (bytearray(3),),
            ((1, 2, 3),),
            (get_zero, 0),
            (tuple[int],)  # GenericAlias
        ]

        run_iter = functools.partial(run_test_iter_reduce, "iter")
        # The returned value of `__reduce__` should not only be valid
        # but also *empty*, as `it` was exhausted during `__eq__`
        # i.e "xyz" returns (iter, ("",))

        with self.subTest(case="str"):
            self.assertEqual(
                run_in_subprocess(run_iter, "xyz"),
                (iter, ("",))
            )
        with self.subTest(case="list"):
            self.assertEqual(
                run_in_subprocess(run_iter, [1, 2, 3]),
                (iter, ([],))
            )

        # _PyEval_GetBuiltin is also called for `reversed` in a branch of
        # listiter_reduce_general
        with self.subTest(case="reversed list"):
            self.assertEqual(
                run_in_subprocess(
                    run_test_iter_reduce, 
                    "reversed",
                    reversed([*range(8)])
                ),
                (iter, ([],))
            )

        for case in types:
            with self.subTest(case=case):
                self.assertEqual(
                    run_in_subprocess(run_iter, *case),
                    (iter, ((),))
                )
            

if __name__ == "__main__":
    unittest.main()
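Since the test can crash the interpreter, it has to run out of process. One possible alternative to `multiprocessing.Pool` that avoids the pickling requirements mentioned earlier is to pass each case as source text to a fresh interpreter with `subprocess`. `run_isolated` below is a hypothetical helper, not part of the PR; CPython's own test suite has similar helpers in `test.support.script_helper`.

```python
import subprocess
import sys


def run_isolated(source, timeout=10):
    """Run `source` in a fresh interpreter so a crash cannot take down the caller.

    Returns (returncode, stdout). On POSIX, a negative returncode means the child
    was killed by a signal, e.g. -11 for SIGSEGV.
    """
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


code, out = run_isolated("print(iter((1, 2)).__reduce__()[0].__name__)")
print(code, out.strip())
```

A crashed child shows up as a nonzero (on POSIX, negative) return code instead of killing the test runner, and nothing has to be pickled across the process boundary.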

Essentially, before this PR there are two ways these tests can go, depending on argument evaluation order, with three possible outcomes:

_PyEval_GetBuiltin is called before it->it_seq is read

The lookup exhausts the iterator and sets it->it_seq to NULL after the NULL check on it_seq has already passed, so Py_BuildValue receives NULL and this path raises SystemError.

_PyEval_GetBuiltin is called after it->it_seq is read

The lookup still sets it_seq to NULL, but the original pointer has already been captured. Exhausting the iterator also DecRefs it_seq, so depending on the OS, compiler, memory layout, and remaining object references, the object it points to may already have been freed. The `__reduce__` return value is then either incorrect (it describes the iterator as it was before we exhausted it) or the process segfaults.

So none of these tests should be able to pass on builds before this PR.
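The two orderings can be modeled in pure Python. This is a simplified, hypothetical model, not CPython code: `ModelSeqIter` stands in for a sequence iterator, `get_builtin_iter` stands in for `_PyEval_GetBuiltin` triggering an `__eq__` that exhausts the iterator, and `None` plays the role of NULL. Because Python keeps the captured list alive, the model can show the stale-value and SystemError outcomes but not the use-after-free.

```python
class ModelSeqIter:
    """Hypothetical Python model of a C sequence iterator such as seqiterobject."""

    def __init__(self, seq):
        self.it_seq = seq  # stands in for the it_seq pointer
        self.it_index = 0

    def exhaust(self):
        # Exhausting drops the reference: it_seq becomes NULL (None here).
        if self.it_seq is not None:
            self.it_index = len(self.it_seq)
            self.it_seq = None


def get_builtin_iter(it):
    """Models _PyEval_GetBuiltin when the lookup runs a custom __eq__ that exhausts `it`."""
    it.exhaust()
    return iter


def reduce_seq_read_first(it):
    # Order 1 (pre-PR): it_seq is read before the builtin lookup runs.
    if it.it_seq is None:
        return (get_builtin_iter(it), ((),))
    seq, index = it.it_seq, it.it_index
    builtin = get_builtin_iter(it)
    return (builtin, (seq,), index)  # stale: describes the iterator before exhaustion


def reduce_builtin_first(it):
    # Order 2 (pre-PR): the lookup runs after the NULL check but before it_seq is read.
    if it.it_seq is None:
        return (get_builtin_iter(it), ((),))
    builtin = get_builtin_iter(it)
    if it.it_seq is None:
        # Py_BuildValue would be handed a NULL object here.
        raise SystemError("NULL object passed to Py_BuildValue")
    return (builtin, (it.it_seq,), it.it_index)


def reduce_fixed(it):
    # The PR's ordering: look up the builtin first, then check and read the fields.
    builtin = get_builtin_iter(it)
    if it.it_seq is not None:
        return (builtin, (it.it_seq,), it.it_index)
    return (builtin, ((),))


stale = reduce_seq_read_first(ModelSeqIter([1, 2, 3]))
print(stale)

try:
    reduce_builtin_first(ModelSeqIter([1, 2, 3]))
except SystemError as exc:
    print("SystemError:", exc)

fixed = reduce_fixed(ModelSeqIter([1, 2, 3]))
print(fixed)
```

`reduce_fixed` mirrors the approach taken in this PR: once the lookup has run, whatever state the iterator is in is what gets reported, so the exhausted iterator reduces to `(iter, ((),))`.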

Fix undefined behavior in listiter_reduce from _PyEval_GetBuiltin side effects (f256afe)

bedevere-bot added the awaiting review label Feb 10, 2023

bedevere-bot mentioned this pull request Feb 10, 2023 in issue #101765, "iter __reduce__ can segfault if accessing __builtins__.__dict__['iter'] mutates the iter object" (Open)

ionite34 added 4 commits February 9, 2023 20:34

- Update comment to not mention __eq__ (6b4faad)
- Fix undefined behavior in iter_reduce and calliter_reduce (a6d6211)
- Update listiter_reduce_general comment (4ccf427)
- Fix undefined behavior in bytearrayiter_reduce from _PyEval_GetBuiltin side effects (c2c9cfb)

arhadthedev added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Feb 10, 2023

ionite34 added 2 commits February 10, 2023 00:26

- Fix undefined behavior in striter_reduce from _PyEval_GetBuiltin side effects (e2989d9)
- Fix undefined behavior in tupleiter_reduce from _PyEval_GetBuiltin side effects (71960a8)

JelleZijlstra self-requested a review February 10, 2023 05:43

JelleZijlstra reviewed Feb 10, 2023


ionite34 and others added 9 commits February 10, 2023 01:04

- Move iter call in unicodeiter_reduce before it pointer access due to potential side effects (efa0540)
- Add iter reduce tests for issue python#101765 (c5abb14)
- Remove C++ reference in comments (45522c6)
- Remove C++ reference in comments (4f5fc19)
- Move builtin declarations inside if blocks (049a8dd)
- Move _PyEval_GetBuiltin before gi checks, add gi NULL check in ga_iter_reduce (ef4f955)
- Update iter reduce mutating tests for generic alias (7d4afb0)
- 📜🤖 Added by blurb_it. (8e4418d)
- Fix backticks format for news (d8ced8e)

ionite34 marked this pull request as ready for review February 10, 2023 07:29

carljm reviewed Feb 10, 2023

- Lib/test/test_iter.py (outdated, resolved)
- Lib/test/test_iter.py (outdated, resolved)
- Objects/listobject.c (resolved)
- Objects/unicodeobject.c (resolved)

JelleZijlstra self-requested a review February 10, 2023 17:29

ionite34 added 4 commits February 10, 2023 12:46

- Refactor iter reduce builtins mutation tests (178b8ea)
  - Added some comments for dict del usages
  - Switched to `__builtin__` instead of conditional `__dict__` access
  - Use kwargs for improved readability
- Update iter mutating builtins test to include reversed iterator for list conditional branch (49ba8c3)
- Add comment in unicodeiter_reduce for moving iter call before it pointers (93854e1)
- Change test __builtins__ to builtins import (98ec3c6)

carljm approved these changes Feb 10, 2023

- Lib/test/test_iter.py (outdated, resolved)
- Lib/test/test_iter.py (outdated, resolved)
- Lib/test/test_iter.py (outdated, resolved)
- Misc/NEWS.d/next/Core and Builtins/2023-02-10-07-21-47.gh-issue-101765.MO5LlC.rst (outdated, resolved)

bedevere-bot added awaiting core review and removed awaiting review labels Feb 10, 2023

ionite34 added 2 commits February 10, 2023 15:22

- Change NEWS blurb phrasing (e661495)
- Update iter reduce mutating builtins test comments and simplify logic (19ab9c6)
