Skip to content

Conversation

@danlamanna
Copy link
Contributor

This reverts commit f0c9b40 (cc @whtsky @auvipy)

This reverts PR #6134 and stops using the close_if_unusable_or_obsolete API since there are edge cases where it's unable to detect if a connection if actually unusable, see the relevant Django issue which is marked as wontfix: https://code.djangoproject.com/ticket/30646 (see also https://forum.djangoproject.com/t/close-if-unusable-or-obsolete-fails-to-close-unusable-connections/41900).

A celery-specific way to introduce one of these edge cases is to run a task with a timeout that gets interrupted via SIGALRM while a query is executing. The connection will be unusable but considered usable by Django, causing the next invocation of a task on that worker to fail.

Since this is an optimization that can't be reliably used, Celery ought to close the connection after each task instead of trying to manage connections in a way similar to how the Django application does.

Description

Sorry for the verbose timeline but this optimization has a bit of a history in Celery.

Timeline

June 2017

#4116 is filed citing that Celery doesn't reuse database connections the same way Django does

Sept 2017

#4292 is added to start reusing Django connections via close_if_unusable_or_obsolete

July 2018 - Apr 2019

#4878, #5483, and celery/django-celery-results#58 crop up as a result of #4292

May 2019

The optimization gets reverted in #5515

June 2020

#6134 reintroduces this optimization to attempt to fix #4116 once and for all

@danlamanna danlamanna force-pushed the always-close-django-connections branch 2 times, most recently from d4dac7e to 189b278 Compare July 21, 2025 17:37
@auvipy auvipy requested review from auvipy and Copilot July 22, 2025 03:35

This comment was marked as outdated.

@codecov
Copy link

codecov bot commented Jul 22, 2025

Codecov Report

❌ Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 78.63%. Comparing base (6dcecbe) to head (df224a7).
⚠️ Report is 68 commits behind head on main.

Files with missing lines Patch % Lines
celery/fixups/django.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9824   +/-   ##
=======================================
  Coverage   78.62%   78.63%           
=======================================
  Files         153      153           
  Lines       19199    19197    -2     
  Branches     2547     2546    -1     
=======================================
  Hits        15095    15095           
+ Misses       3811     3809    -2     
  Partials      293      293           
Flag Coverage Δ
unittests 78.61% <66.66%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can revert this. but what will be consequences? can we add additional integration tests to make sure there will be no regression at all?

@auvipy auvipy added this to the 5.6.0 milestone Jul 22, 2025
@danlamanna
Copy link
Contributor Author

I can revert this. but what will be consequences?

Before the connection was only closed based on a heuristic, now it will always be closed at the end of the task.

can we add additional integration tests to make sure there will be no regression at all?

I can't see a way to add an integration test without adding postgres and a django application. If you're just looking for a big sign that steers people away from re-implementing something similar again, I can add a unit test that asserts close_if_unusable_or_obsolete isn't called, since using that API is the fundamental problem here. Does that work?

@auvipy
Copy link
Member

auvipy commented Jul 22, 2025

yeah sure, that would be great

@danlamanna
Copy link
Contributor Author

yeah sure, that would be great

All set.

@auvipy auvipy requested a review from Copilot July 23, 2025 03:36

This comment was marked as resolved.

@auvipy
Copy link
Member

auvipy commented Aug 7, 2025

I suggest try part of the following test cases to improve test coverage.

`

--- Edge Case: Simulate SIGALRM or OSError on DB close ---

def test__close_database_handles_oserror(self):
with self.fixup_context(self.app) as (f, _, _):
conns = [Mock()]
# Simulate an OSError (e.g., interrupted system call/SIGALRM)
conns[0].close.side_effect = OSError('interrupted system call')
f.DatabaseError = OSError
f.interface_errors = ()
f._db.connections = Mock()
f._db.connections.all.return_value = conns
# Should not raise
f._close_database(force=True)
conns[0].close.assert_called_with()

--- Multi-DB: All connections are forcibly closed ---

def test__close_database_multi_db(self):
with self.fixup_context(self.app) as (f, _, _):
conns = [Mock(), Mock(), Mock()]
f.DatabaseError = KeyError
f.interface_errors = ()
f._db.connections = Mock()
f._db.connections.all.return_value = conns
f._close_database(force=True)
for c in conns:
c.close.assert_called_with()

--- Connection already closed and other DB exceptions are handled gracefully ---

@pytest.mark.parametrize('exc', [KeyError('closed'), Exception('generic')])
def test__close_database_handles_various_db_errors(self, exc):
with self.fixup_context(self.app) as (f, _, _):
conns = [Mock()]
conns[0].close.side_effect = exc
f.DatabaseError = type(exc)
f.interface_errors = ()
f._db.connections = Mock()
f._db.connections.all.return_value = conns
# Should not propagate
try:
f._close_database(force=True)
except Exception as raised:
# Only propagate if not 'closed' or 'not connected'
if 'closed' not in str(exc) and 'not connected' not in str(exc):
assert raised is exc
else:
pytest.fail("Should not raise for 'closed' or 'not connected'")

--- DB_REUSE_MAX: Ensure close is NOT called until threshold ---

def test_close_database_reuse_max_behavior(self):
with self.fixup_context(self.app) as (f, _, _):
f.db_reuse_max = 3
f._db_recycles = 2
with patch.object(f, '_close_database') as _close:
f.close_database()
_close.assert_not_called()
f._db_recycles = 3
f.close_database()
_close.assert_called_with()

--- Forbid use of CONN_MAX_AGE ---

def test_close_database_ignores_conn_max_age(self):
with self.fixup_context(self.app) as (f, _, _):
# Simulate a Django connection with CONN_MAX_AGE
f._db.connections.settings_dict = {'CONN_MAX_AGE': 999}
with patch.object(f, '_close_database') as _close:
f.close_database()
_close.assert_called_with() # Should not check or use CONN_MAX_AGE

--- Worker/process shutdown closes all connections ---

def test_on_worker_process_init_closes_connections(self):
with self.fixup_context(self.app) as (f, _, _):
conns = [Mock(connection=object()), Mock(connection=object())]
for c in conns:
c.connection = object()
f._db.connections = Mock()
f._db.connections.all.return_value = conns
with patch.object(f, '_maybe_close_db_fd') as maybe_close:
with patch.object(f, '_close_database') as close_db,
patch.object(f, 'close_cache') as close_cache:
f.on_worker_process_init()
assert maybe_close.call_count == len(conns)
close_db.assert_called_with(force=True)
close_cache.assert_called()

--- Concurrency: Multiple tasks finish simultaneously, closing only once ---

def test_close_database_concurrent_tasks(self):
with self.fixup_context(self.app) as (f, _, _):
with patch.object(f, '_close_database') as _close:
f.db_reuse_max = 2
f._db_recycles = 1
f.close_database()
f.close_database()
# Only one call until threshold
_close.assert_not_called()
f._db_recycles = 2
f.close_database()
_close.assert_called_with()

--- Dead code assertion: 'close_if_unusable_or_obsolete' is not used ---

def test_close_database_does_not_call_close_if_unusable_or_obsolete(self):
with self.fixup_context(self.app) as (f, _, _):
conns = [Mock()]
conns[0].close_if_unusable_or_obsolete = MagicMock()
f.DatabaseError = Exception
f.interface_errors = ()
f._db.connections = Mock()
f._db.connections.all.return_value = conns
f._close_database(force=True)
conns[0].close_if_unusable_or_obsolete.assert_not_called()
`

@danlamanna danlamanna force-pushed the always-close-django-connections branch from 88ca764 to e1fd8ff Compare August 8, 2025 18:50
@danlamanna
Copy link
Contributor Author

danlamanna commented Aug 8, 2025

@auvipy I added a test case for close_cache since that was the one area of uncovered code. AFAICT this PR increases coverage and reduces cyclomatic complexity.

This reverts commit f0c9b40.

This reverts PR celery#6134 and stops using the close_if_unusable_or_obsolete API since there are edge cases where it's
unable to detect if a connection if actually unusable. This is most obvious when Celery interrupts a query in
progress via a time limit handler. Django has marked this issue as
wontfix (https://code.djangoproject.com/ticket/30646). Since this is effectively an optimization for Celery that
can't be reliably used, Celery ought to close the connection after each task instead of trying to manage connections
in a way similar to how the Django application does.
This API can fail to close unusable connections in certain scenarios,
namely database failovers and ungraceful terminations (e.g. signal
handler for time limit exceeded tasks). This makes
close_if_unusable_or_obsolete adequate for HTTP request lifecycle
management but inappropriate for use within celery workers.

See also:
https://code.djangoproject.com/ticket/30646
https://forum.djangoproject.com/t/close-if-unusable-or-obsolete-fails-to-close-unusable-connections/41900
@auvipy auvipy force-pushed the always-close-django-connections branch from e1fd8ff to df224a7 Compare August 9, 2025 06:51
@auvipy auvipy requested a review from Copilot August 10, 2025 07:14
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR reverts a previous optimization that attempted to reuse Django database connections using close_if_unusable_or_obsolete(). The change returns to the safer approach of always closing database connections after each task to avoid edge cases where Django cannot reliably detect unusable connections.

  • Removes the force parameter from _close_database() method and always calls conn.close()
  • Updates test cases to reflect the simplified connection handling behavior
  • Adds new test to verify connections are always closed without attempting optimization

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
celery/fixups/django.py Removes conditional connection closing logic and always closes connections
t/unit/fixups/test_django.py Updates tests to match simplified behavior and adds verification test

Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merging this for now as it is a revert of an old feature which is not valid anymore. the pr add some extra tests to prevent possible regression.

@auvipy auvipy merged commit da4a80d into celery:main Aug 10, 2025
112 of 124 checks passed
@danlamanna danlamanna deleted the always-close-django-connections branch August 10, 2025 08:02
@auvipy
Copy link
Member

auvipy commented Aug 11, 2025

btw, do we also need to clean or update the relevant docs as well?

@danlamanna
Copy link
Contributor Author

AFAICT, there were never any docs added for this optimization, so I don't think there need to be any additional changes.

@auvipy
Copy link
Member

auvipy commented Aug 11, 2025

ok thanks

obaltian added a commit to obaltian/celery that referenced this pull request Oct 28, 2025
@orhanhenrik
Copy link

Hi, this is a big performance regression. Celery will now always close DB connections between every task, instead of respecting the django CONN_MAX_AGE setting.

We had to revert our upgrade to 5.6.0 because it was adding a huge load on our database, since creating connections is relatively expensive.

@auvipy
Copy link
Member

auvipy commented Dec 8, 2025

OK. I tried to revert this but seems that is not possible. a manual revert is needed. can you take on this?

@auvipy
Copy link
Member

auvipy commented Dec 8, 2025

https://forum.djangoproject.com/t/close-if-unusable-or-obsolete-fails-to-close-unusable-connections/41900).

after reading the reply of a django core team member, it seems we should not revert this at all. instead we could try different improvement.

@orhanhenrik
Copy link

orhanhenrik commented Dec 8, 2025

it seems we should not revert this at all.

As in, not revert the original code, or not revert this PR?

The way I understand the comment from the django forum, it sounds like this PR is not the correct solution. So it would make sense to revert this PR (revert #9824).

@auvipy
Copy link
Member

auvipy commented Dec 8, 2025

The way I understand the comment from the django forum, it sounds like this PR is not the correct solution. So it would make sense to revert this PR (revert #9824).

we got a manual revert request here. as per the forum discussion, this PR should not be merged in the first place. so we have to revert it now. thanks for your report

@danlamanna
Copy link
Contributor Author

danlamanna commented Dec 8, 2025

The way I understand the comment from the django forum, it sounds like this PR is not the correct solution. So it would make sense to revert this PR (revert #9824).

we got a manual revert request here. as per the forum discussion, this PR should not be merged in the first place. so we have to revert it now. thanks for your report

@auvipy
I'll respond to that forum post - I don't think his first comment is correct. His comment is also about correctness, not performance. This PR strictly makes celery do the more correct thing at the cost of not reusing connections - because that was causing problems for us (repro is in the forum post and is effectively how celery timeouts function).

@auvipy
Copy link
Member

auvipy commented Dec 8, 2025

we also got this PR #10015 an you please take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Django celery fixup doesn't respect Django settings for PostgreSQL connections

3 participants