gh-135552: Make the GC clear weakrefs later. #136189

nascheme · 2025-07-01T20:44:59Z

Clear the weakrefs to unreachable objects after finalizers are called.

Issue: Segmentation fault, possibly due to a GC issue (tp_subclasses) #135552

Issue: SIGSEV in datetime.timedelta (possibly from datetime's C delta_new) #132413

neonene · 2025-07-01T21:26:01Z

I can confirm this PR fixes the gh-132413 issue as well.

nascheme · 2025-07-01T23:02:39Z

I think this fixes (or mostly fixes) gh-91636 as well.

bedevere-bot · 2025-07-01T23:34:44Z

🤖 New build scheduled with the buildbot fleet by @nascheme for commit 12f0b5c 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136189%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

nascheme · 2025-07-02T04:24:50Z

This introduces refleaks, it seems. One of the leaking tests:
test.test_concurrent_futures.test_shutdown.ProcessPoolSpawnProcessPoolShutdownTest.test_shutdown_gh_132969_case_1
My unconfirmed suspicion is that a finalizer is now resurrecting an object via a weakref. Previously that weakref would have been cleared before the finalizer ran. The multiprocessing finalizer logic seems very complicated. :-/
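To make the ordering question concrete, here is a minimal sketch (not taken from the PR) that records whether a weakref still resolves while a finalizer runs on cyclic garbage. The names `Node`, `registry`, `observations` and `run_once` are hypothetical. The value recorded inside `__del__` depends on the interpreter: it should be False where the GC clears weakrefs before running finalizers, and could be True where clearing is deferred.

```python
import gc
import weakref

observations = []  # hypothetical log: did the weakref resolve inside __del__?
registry = {}      # holds the weakref outside the garbage cycle


class Node:
    """Hypothetical object in a reference cycle, with a finalizer."""

    def __init__(self):
        self.me = self  # self-cycle, so only the cyclic GC can free it

    def __del__(self):
        # Look through the weakref while the finalizer is running.
        # The temporary strong reference is dropped when __del__ returns,
        # so this does not resurrect the object.
        target = registry["node"]()
        observations.append(target is not None)


def run_once():
    node = Node()
    registry["node"] = weakref.ref(node)
    del node
    gc.collect()
    # After collection the weakref must be dead in either ordering.
    return registry["node"]() is None


dead_after_collect = run_once()
print("weakref alive inside finalizer:", observations[0])
print("weakref dead after collection:", dead_after_collect)
```

If a finalizer stored `target` somewhere reachable instead of just inspecting it, the object would be resurrected, which is the kind of interaction suspected above.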

nascheme · 2025-07-02T06:42:49Z

This is the smallest leaking example I could make so far. Something in ProcessPoolExecutor may be leaking.

import time
import unittest
from concurrent import futures

class LeakTest(unittest.TestCase):
    @classmethod
    def _fail_task(cls, n):
        raise ValueError("failing task")

    def test_leak_case(self):
        # this leaks references
        executor = futures.ProcessPoolExecutor(
                max_workers=1,
                max_tasks_per_child=1,
                )
        f2 = executor.submit(LeakTest._fail_task, 0)
        try:
            f2.result()
        except ValueError:
            pass

        # Ensure that the executor cleans up after shutdown is
        # called with wait=False
        executor_manager_thread = executor._executor_manager_thread
        executor.shutdown(wait=False)
        time.sleep(0.2)
        executor_manager_thread.join()

    def test_leak_case2(self):
        # this does not leak
        with futures.ProcessPoolExecutor(
                max_workers=1,
                max_tasks_per_child=1,
                ) as executor:
            f2 = executor.submit(LeakTest._fail_task, 0)
            try:
                f2.result()
            except ValueError:
                pass

neonene · 2025-07-02T12:52:14Z

Other leaking examples (on Windows):

1. test_logging:

import logging
import logging.config
import logging.handlers
import unittest
from multiprocessing import Queue, Manager

class ConfigDictTest(unittest.TestCase):
    def test_multiprocessing_queues_XXX(self):
        config = {
            'version': 1,
            'handlers' : {
                'spam' : {
                    'class': 'logging.handlers.QueueHandler',
                    'queue': Manager().Queue(),           # Leak
                    # 'queue': Manager().JoinableQueue(), # Leak
                    # 'queue': Queue(),                   # No leak
                },
            },
            'root': {'handlers': ['spam']}
        }
        logger = logging.getLogger()
        logging.config.dictConfig(config)
        while logger.handlers:
            h = logger.handlers[0]
            logger.removeHandler(h)
            h.close()

2. test_interpreters.test_api:

import contextlib
import threading
import types
import unittest
from concurrent import interpreters

def func():
    raise Exception('spam!')

@contextlib.contextmanager
def captured_thread_exception():
    ctx = types.SimpleNamespace(caught=None)
    def excepthook(args):
        ctx.caught = args
    orig_excepthook = threading.excepthook
    threading.excepthook = excepthook
    try:
        yield ctx
    finally:
        threading.excepthook = orig_excepthook

class TestInterpreterCall(unittest.TestCase):
    def test_call_in_thread_XXX(self):
        interp = interpreters.create()
        call = (interp._call, interp.call)[1]   # 0: No leak, 1: Leak
        with captured_thread_exception() as _:
            t = threading.Thread(target=call, args=(interp, func, (), {}))
            t.start()
            t.join()

nascheme · 2025-07-02T19:32:02Z

The majority (maybe all) of these leaks are caused by the WeakValueDictionary used as multiprocessing.util._afterfork_registry. That took some digging to find. I'm not yet sure of a good fix for this. Explicitly cleaning the dead weak references from the .data dict works but is not too elegant.

This avoids breaking tests while fixing the issue with tp_subclasses. In the long term, it would be better to defer clearing all weakrefs, not just the ones referring to classes. However, that is a more disruptive change and would seem to have a higher chance of breaking user code. So, it would not be something to do in a bugfix release.

nascheme · 2025-07-03T01:01:27Z

> The majority (maybe all) of these leaks are caused by the WeakValueDictionary used as multiprocessing.util._afterfork_registry. That took some digging to find. I'm not yet sure of a good fix for this. Explicitly cleaning the dead weak references from the .data dict works but is not too elegant.

Nope, that doesn't fix all the leaks. And having to explicitly clean the weakrefs from the WeakValueDictionary really shouldn't be needed, I think. The KeyedRef class uses a callback, so dead entries should be removed from the dict when the referenced value dies. So, I'm not exactly sure what's going on there.
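For reference, a small sketch of the behaviour being relied on here, under the assumption that WeakValueDictionary keeps its KeyedRef objects in a `.data` dict (a CPython implementation detail). `Payload` and `purge_dead_entries` are hypothetical names; the helper only illustrates the "explicit cleaning" workaround mentioned above and is not code from the PR.

```python
import gc
import weakref


class Payload:
    """Hypothetical value type (a bare object() cannot be weakly referenced)."""


def purge_dead_entries(wvd):
    # Assumption: WeakValueDictionary stores KeyedRef objects in `.data`.
    # Drop every entry whose referent is already gone; return how many.
    dead = [key for key, ref in list(wvd.data.items()) if ref() is None]
    for key in dead:
        wvd.data.pop(key, None)
    return len(dead)


registry = weakref.WeakValueDictionary()
value = Payload()
registry["worker"] = value
present_before = "worker" in registry

# Drop the only strong reference; the KeyedRef callback is expected to
# remove the entry without any explicit cleanup.
del value
gc.collect()
present_after = "worker" in registry

# In the normal case there should be nothing left for the helper to purge.
purged = purge_dead_entries(registry)
print(present_before, present_after, purged)
```

The leaks discussed in this thread would correspond to a case where entries outlive their values despite the callback, which this sketch does not reproduce.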

For the purposes of having a fix that we can backport (should probably be backported to all maintained Python versions), a less disruptive fix would be better. To that end, I've changed this PR to only defer clearing weakrefs to class objects. That fixes the tp_subclasses bug but should be less likely to break currently working code.
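As background on why class objects are singled out, the sketch below shows that a base class tracks its subclasses without keeping them alive, which is the bookkeeping that tp_subclasses backs at the C level. It only demonstrates the Python-visible behaviour via `__subclasses__()`; `Base`, `make_subclass` and `Temporary` are hypothetical names, and nothing here reproduces the crash itself.

```python
import gc


class Base:
    pass


def make_subclass():
    # Create the subclass dynamically so no module-level name keeps it alive.
    return type("Temporary", (Base,), {})


sub = make_subclass()
names_while_alive = [cls.__name__ for cls in Base.__subclasses__()]

# Classes participate in reference cycles, so the cyclic GC is needed
# to actually free the subclass after the last name is dropped.
del sub
gc.collect()
names_after_collect = [cls.__name__ for cls in Base.__subclasses__()]

print(names_while_alive)
print(names_after_collect)
```

Because the subclass disappears from `Base.__subclasses__()` once collected, the moment at which the GC clears those weak references relative to finalizers matters for any code that walks the subclass list during collection.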

bedevere-bot · 2025-07-03T01:27:04Z

🤖 New build scheduled with the buildbot fleet by @nascheme for commit 2f3daba 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136189%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

Make the GC clear weakrefs later.

42abb05

Clear the weakrefs to unreachable objects after finalizers are called.

bedevere-app bot mentioned this pull request Jul 1, 2025

Segmentation fault, possibly due to a GC issue (tp_subclasses) #135552

Open

Remove inaccurate comment.

17a4f9e

Run clear_weakrefs() with world stopped.

12f0b5c

nascheme mentioned this pull request Jul 1, 2025

Assertion failure when func_repr is called on an already tp_clear-ed object #91636

Open

nascheme requested a review from pablogsal July 1, 2025 23:33

nascheme added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 1, 2025

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 1, 2025

neonene mentioned this pull request Jul 2, 2025

gh-132413: Clear weakref to _datetime after modules are finalized #136152

Draft

nascheme added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 3, 2025

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 3, 2025
