Skip to content

Conversation

@bruunotrindade
Copy link
Contributor

@bruunotrindade bruunotrindade commented Dec 31, 2025

Description

This PR fixes a critical recursion bug in DjangoWorkerFixup.__init__ that causes infinite recursion during worker instantiation when using Django with Celery, and causes #10042

Root cause

DjangoWorkerFixup.__init__ was creating a WorkController instance via the worker or WorkController(app) pattern. This created a circular dependency where:

  • WorkController.__init__ triggers the worker_init signal
  • DjangoFixup.on_worker_init creates/accesses DjangoWorkerFixup
  • DjangoWorkerFixup.__init__ tries to create another WorkController
  • Infinite recursion ensues

More details of these comments:
#10023 (comment)
#10023 (comment)

Solution

Remove the automatic WorkController instantiation from __init__ and instead set the worker instance via the on_worker_init callback, breaking the circular dependency.

Changes

django.py

  • Removed worker parameter from DjangoWorkerFixup.__init__
  • Set worker = None as a class attribute with an explanatory comment
  • Added None check before accessing worker.pool_cls to prevent AttributeError
  • Worker instance is now set via DjangoFixup.on_worker_init callback

test_django.py

  • Updated fixup_context helper to properly set worker mock for all tests
  • Added comprehensive regression test test_no_recursive_worker_instantiation that verifies no WorkController is created when it shouldn't be

Testing

  • All existing unit tests pass
  • Added regression test that validates the fix
  • Pre-commit hooks pass (flake8, mypy, isort, etc.)
  • Test coverage maintained

This test case is a way to prove that the recursion issue happens on v5.6.1:

    def test_worker_controller_recursion_bug(self, patching):
        """Test that creating WorkController doesn't cause infinite recursion.
        
        Regression test for the issue where:
        1. DjangoFixup.install() registers on_import_modules and on_worker_init signals
        2. WorkController.__init__ is called (from bin.worker)
        3. on_import_modules callback runs and accesses worker_fixup property
        4. worker_fixup property creates DjangoWorkerFixup without a worker
        5. DjangoWorkerFixup.__init__ creates a NEW WorkController(app)
        6. This new WorkController sends worker_init signal
        7. on_worker_init callback runs and accesses worker_fixup again
        8. Since worker_fixup is still None (step 4 hasn't finished), it creates another
        9. Infinite recursion -> RecursionError
        
        Additionally, this causes pool_cls to be a string instead of a class,
        leading to AttributeError when trying to access pool_cls.__module__
        """
        from celery import signals
        from celery.worker import WorkController
        
        patching('celery.fixups.django.symbol_by_name')
        patching('celery.fixups.django.import_module')
        patching.modules('django', 'django.db', 'django.core.checks')
        patching.setenv('DJANGO_SETTINGS_MODULE', 'settings')
        
        # Create and install DjangoFixup
        django_fixup = DjangoFixup(self.app)
        django_fixup.install()
        
        # Track how many times WorkController is instantiated
        original_init = WorkController.__init__
        instantiation_count = {'count': 0}
        recursion_errors = []
        attribute_errors = []
        
        def tracked_init(self_worker, *args, **kwargs):
            instantiation_count['count'] += 1
            # Prevent actual infinite recursion in test by limiting
            if instantiation_count['count'] > 5:
                raise RecursionError("Too many WorkController instantiations detected")
            return original_init(self_worker, *args, **kwargs)
        
        # Capture signal errors
        def signal_error_handler(sender, **kwargs):
            exc = kwargs.get('exception')
            if isinstance(exc, RecursionError):
                recursion_errors.append(exc)
            elif isinstance(exc, AttributeError) and "'str' object has no attribute '__module__'" in str(exc):
                attribute_errors.append(exc)
        
        signals.worker_init.connect(signal_error_handler, weak=False)
        
        try:
            with patch.object(WorkController, '__init__', tracked_init):
                # Simulate what happens in celery.bin.worker
                try:
                    worker = WorkController(
                        app=self.app,
                        hostname='testworker@localhost',
                        pool_cls='solo',
                        loglevel='INFO',
                    )
                except (RecursionError, AttributeError, TypeError):
                    # These exceptions may bubble up or be caught by signal handlers
                    pass
        finally:
            signals.worker_init.disconnect(signal_error_handler)
        
        # Verify that multiple WorkController instances were created (recursion happened)
        assert instantiation_count['count'] > 1, \
            f"Expected multiple WorkController instantiations due to recursion bug, got {instantiation_count['count']}"
        
        # The test confirms the bug exists: multiple WorkController instantiations occur
        # when only one should be created

Related Issues

#10023 #10023

- Remove worker parameter from DjangoWorkerFixup.__init__
- Set worker via on_worker_init callback instead
- Add None check for worker.pool_cls to prevent AttributeError
- Add regression test to prevent future recursion bugs

Fixes recursive instantiation issue where DjangoWorkerFixup(app)
would create WorkController(app), which in turn would create another
DjangoWorkerFixup, leading to infinite recursion.

The worker instance is now properly set via the on_worker_init signal
callback, avoiding the circular dependency.
@bruunotrindade bruunotrindade changed the title Fixes #10042 - Recursive WorkController instantiation in DjangoWorkerFixup + AttributeError when pool_cls is a string Fix recursive WorkController instantiation in DjangoWorkerFixup + AttributeError when pool_cls is a string Dec 31, 2025
@auvipy auvipy requested review from auvipy and Copilot January 1, 2026 04:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical recursion bug in DjangoWorkerFixup.__init__ that caused infinite recursion during worker instantiation when using Django with Celery. The root cause was that DjangoWorkerFixup.__init__ was creating a new WorkController instance, which triggered the worker_init signal, leading to recursive DjangoWorkerFixup instantiation.

The solution removes the automatic WorkController instantiation from DjangoWorkerFixup.__init__ and instead sets the worker instance via the DjangoFixup.on_worker_init callback, breaking the circular dependency. Additionally, the PR adds a None check before accessing worker.pool_cls to prevent AttributeError when the worker hasn't been set yet.

Key Changes

  • Removed the worker parameter from DjangoWorkerFixup.__init__ signature
  • Added a class attribute worker = None with an explanatory comment
  • Added None check before accessing worker.pool_cls.__module__ to safely handle cases where worker isn't set

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
celery/fixups/django.py Removed worker parameter from DjangoWorkerFixup.__init__, set worker = None as class attribute, and added None check in _close_database() to prevent AttributeError
t/unit/fixups/test_django.py Updated fixup_context helper to set worker mock for all tests, simplified test_install, and added regression test test_no_recursive_worker_instantiation

@codecov
Copy link

codecov bot commented Jan 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.42%. Comparing base (21dbc73) to head (daf54c7).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10045      +/-   ##
==========================================
+ Coverage   88.41%   88.42%   +0.01%     
==========================================
  Files         153      153              
  Lines       19363    19368       +5     
  Branches     2226     2227       +1     
==========================================
+ Hits        17120    17127       +7     
+ Misses       1943     1941       -2     
  Partials      300      300              
Flag Coverage Δ
unittests 88.39% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to add additional Integration tests for this?

@maycuatroi1
Copy link
Contributor

Great fix! This correctly addresses the root cause of the recursion issue.

Suggestions

1. Similar patterns in other files

Copilot identified 2 other locations with similar pool_cls.__module__ access that might benefit from defensive checks:

  • celery/contrib/testing/worker.py line 45:

    if self.pool_cls.__module__.split('.')[-1] == 'prefork':
  • celery/worker/components.py line 197:

    if w.pool_cls.__module__.endswith(('gevent', 'eventlet')):

While your fix ensures pool_cls will always be a class (not string) in normal flow, adding similar defensive checks could prevent future edge cases.

2. Integration test idea

For the integration test that @auvipy requested, here's a possible approach based on existing patterns in t/integration/test_worker.py:

import subprocess
import pytest

def test_django_fixup_no_recursion():
    """Test that Django fixup doesn't cause RecursionError on worker startup."""
    # Create a minimal Django settings module for testing
    result = subprocess.run(
        ["celery", "-A", "some_django_app", "worker", "--pool=solo", "-c", "1"],
        capture_output=True,
        timeout=10,
        env={**os.environ, "DJANGO_SETTINGS_MODULE": "test_settings"}
    )
    
    output = result.stderr.decode('utf-8')
    # Verify no RecursionError or AttributeError related to pool_cls
    assert "RecursionError" not in output
    assert "'str' object has no attribute '__module__'" not in output

This would require a minimal Django project setup in the test fixtures.


I closed my PR #10043 in favor of this one since it fixes the root cause properly. 👍

@bruunotrindade
Copy link
Contributor Author

@maycuatroi1 thanks! Actually, when I was implementing the test case, I realized that pool_cls could still be a string depending on the pool, then I kept what you had done in your PR to properly check if it's a string.

@auvipy I think I figured out why you were not able to reproduce the issue. When I tried to setup the integration test, I wasn't able to see the exceptions, even though they were happening (it was evident using debug). After hours trying to understand what was missing, I realized that a simple logging.info was making that exception visible. That's the principle of the integration test - checking the log output. That test case is supposed to break on v5.6.1.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

@maycuatroi1
Copy link
Contributor

@bruunotrindade I can help fix the failing CI (unit tests on 3.10+) and add the defensive checks for other pool_cls.__module__ locations. Just give me push access to your branch or I can open commits on a fork. Let me know!

@maycuatroi1
Copy link
Contributor

Also, I can test this fix on my staging environment for the issue #10042.

Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix the unit test failure

maycuatroi1 added a commit to maycuatroi1/celery that referenced this pull request Jan 3, 2026
Extends the fix from celery#10045 by adding defensive checks for when
pool_cls is a string instead of a class in two additional locations:

- celery/contrib/testing/worker.py: TestWorkController.__init__
- celery/worker/components.py: Beat.create

This prevents AttributeError: 'str' object has no attribute '__module__'
when pool_cls is passed as a string (e.g., 'prefork', 'gevent', 'eventlet').

Also adds unit tests to verify the defensive checks work correctly.
@bruunotrindade bruunotrindade requested a review from auvipy January 3, 2026 23:59
@bruunotrindade
Copy link
Contributor Author

@auvipy I fixed the tests. I don't know why, but the issue was on tox.ini

@auvipy auvipy added this to the 5.6.x milestone Jan 4, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

@auvipy auvipy merged commit 9d6ab11 into celery:main Jan 4, 2026
377 checks passed
auvipy added a commit that referenced this pull request Jan 5, 2026
…ule__ (#10048)

* Fix recursive WorkController instantiation in DjangoWorkerFixup

- Remove worker parameter from DjangoWorkerFixup.__init__
- Set worker via on_worker_init callback instead
- Add None check for worker.pool_cls to prevent AttributeError
- Add regression test to prevent future recursion bugs

Fixes recursive instantiation issue where DjangoWorkerFixup(app)
would create WorkController(app), which in turn would create another
DjangoWorkerFixup, leading to infinite recursion.

The worker instance is now properly set via the on_worker_init signal
callback, avoiding the circular dependency.

* fix: improve prefork detection and add integration test

* fix: Copilot recommendations

* Add defensive checks for pool_cls.__module__ in additional locations

Extends the fix from #10045 by adding defensive checks for when
pool_cls is a string instead of a class in two additional locations:

- celery/contrib/testing/worker.py: TestWorkController.__init__
- celery/worker/components.py: Beat.create

This prevents AttributeError: 'str' object has no attribute '__module__'
when pool_cls is passed as a string (e.g., 'prefork', 'gevent', 'eventlet').

Also adds unit tests to verify the defensive checks work correctly.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add unit tests for TestWorkController with string pool_cls

Add tests to verify defensive handling of string pool_cls in
TestWorkController.__init__, addressing reviewer feedback.

---------

Co-authored-by: Asif Saif Uddin {"Auvi":"অভি"} <auvipy@gmail.com>
@liammonahan
Copy link

@bruunotrindade I just wanted to say thank you so much for submitting this fix! It made a big difference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants