Skip to content

gh-135552: Make the GC clear weakrefs later. #136189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

nascheme
Copy link
Member

@nascheme nascheme commented Jul 1, 2025

Clear the weakrefs to unreachable objects after finalizers are called.
@neonene
Copy link
Contributor

neonene commented Jul 1, 2025

I can confirm this PR fixes the gh-132413 issue as well.

@nascheme
Copy link
Member Author

nascheme commented Jul 1, 2025

I think this fixes (or mostly fixes) gh-91636 as well.

@nascheme nascheme requested a review from pablogsal July 1, 2025 23:33
@nascheme nascheme added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 1, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @nascheme for commit 12f0b5c 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136189%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 1, 2025
@nascheme
Copy link
Member Author

nascheme commented Jul 2, 2025

This introduces refleaks, it seems. One of the leaking tests:
test.test_concurrent_futures.test_shutdown.ProcessPoolSpawnProcessPoolShutdownTest.test_shutdown_gh_132969_case_1
My unconfirmed suspicion is that a finalizer is now resurrecting an object via a weakref. Previously that weakref would be cleared before the finalizer is run. The multiprocessing finalizer logic seems very complicated. :-/

@nascheme
Copy link
Member Author

nascheme commented Jul 2, 2025

This is the smallest leaking example I could make so far. Something with ProcessPoolExecutor leaking maybe.

class LeakTest(unittest.TestCase):
    @classmethod
    def _fail_task(cls, n):
        raise ValueError("failing task")

    def test_leak_case(self):
        # this leaks references
        executor = futures.ProcessPoolExecutor(
                max_workers=1,
                max_tasks_per_child=1,
                )
        f2 = executor.submit(LeakTest._fail_task, 0)
        try:
            f2.result()
        except ValueError:
            pass

        # Ensure that the executor cleans up after called
        # shutdown with wait=False
        executor_manager_thread = executor._executor_manager_thread
        executor.shutdown(wait=False)
        time.sleep(0.2)
        executor_manager_thread.join()

    def test_leak_case2(self):
        # this does not leak
        with futures.ProcessPoolExecutor(
                max_workers=1,
                max_tasks_per_child=1,
                ) as executor:
            f2 = executor.submit(LeakTest._fail_task, 0)
            try:
                f2.result()
            except ValueError:
                pass

@neonene
Copy link
Contributor

neonene commented Jul 2, 2025

Other leaking examples (on Windows):

1. test_logging:
import logging
import logging.config
import logging.handlers
from multiprocessing import Queue, Manager

class ConfigDictTest(unittest.TestCase):
    def test_multiprocessing_queues_XXX(self):
        config = {
            'version': 1,
            'handlers' : {
                'spam' : {
                    'class': 'logging.handlers.QueueHandler',
                    'queue': Manager().Queue()  ,         # Leak
                    # 'queue': Manager().JoinableQueue()  # Leak
                    # 'queue': Queue(),                   # No leak

                },
            },
            'root': {'handlers': ['spam']}
        }
        logger = logging.getLogger()
        logging.config.dictConfig(config)
        while logger.handlers:
            h = logger.handlers[0]
            logger.removeHandler(h)
            h.close()
2. test_interpreters.test_api:
import contextlib
import threading
import types
from concurrent import interpreters

def func():
    raise Exception('spam!')

@contextlib.contextmanager
def captured_thread_exception():
    ctx = types.SimpleNamespace(caught=None)
    def excepthook(args):
        ctx.caught = args
    orig_excepthook = threading.excepthook
    threading.excepthook = excepthook
    try:
        yield ctx
    finally:
        threading.excepthook = orig_excepthook

class TestInterpreterCall(unittest.TestCase):
    def test_call_in_thread_XXX(self):
        interp = interpreters.create()
        call = (interp._call, interp.call)[1]   # 0: No leak, 1: Leak
        with captured_thread_exception() as _:
            t = threading.Thread(target=call, args=(interp, func, (), {}))
            t.start()
            t.join()

@nascheme
Copy link
Member Author

nascheme commented Jul 2, 2025

The majority (maybe all) of these leaks are caused by the WeakValueDictionary used as multiprocessing.util._afterfork_registry. That took some digging to find. I'm not yet sure of a good fix for this. Explicitly cleaning the dead weak references from the .data dict works but it not too elegant.

This avoids breaking tests while fixing the issue with tp_subclasses. In
the long term, it would be better to defer the clear of all weakrefs,
not just the ones referring to classes.  However, that is a more
distruptive change and would seem to have a higher chance of breaking
user code.  So, it would not be something to do in a bugfix release.
@nascheme
Copy link
Member Author

nascheme commented Jul 3, 2025

The majority (maybe all) of these leaks are caused by the WeakValueDictionary used as multiprocessing.util._afterfork_registry. That took some digging to find. I'm not yet sure of a good fix for this. Explicitly cleaning the dead weak references from the .data dict works but it not too elegant.

Nope, that doesn't fix all the leaks. And having to explicitly clean the weakrefs from the WeakValueDictionary really shouldn't be needed, I think. The KeyedRef class uses a callback and so they should be cleaned from the dict when the referred value dies. So, I'm not exactly sure what's going on there.

For the purposes of having a fix that we can backport (should probably be backported to all maintained Python versions), a less disruptive fix would be better. To that end, I've changed this PR to only defer clearing weakrefs to class objects. That fixes the tp_subclasses bug but should be less likely to break currently working code.

@nascheme nascheme added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 3, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @nascheme for commit 2f3daba 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136189%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants