Flags broker_connection_retry_on_startup & broker_connection_retry aren’t reliable #8433
Description
Checklist
- I have verified that the issue exists against the main branch of Celery.
- This has already been asked to the discussions forum first.
- I have read the relevant section in the contribution guide on reporting bugs.
- I have checked the issues list for similar or identical bug reports.
- I have checked the pull requests list for existing proposed fixes.
- I have checked the commit log to find out if the bug was already fixed in the main branch.
- I have included all related issues and possible duplicate issues in this issue (If there are none, check this box anyway).
Mandatory Debugging Information
- I have included the output of celery -A proj report in the issue. (If you are not able to do this, then at least specify the Celery version affected.)
- I have verified that the issue exists against the main branch of Celery.
- I have included the contents of pip freeze in the issue.
- I have included all the versions of all the external dependencies required to reproduce this bug.
Optional Debugging Information
- I have tried reproducing the issue on more than one Python version and/or implementation.
- I have tried reproducing the issue on more than one message broker and/or result backend.
- I have tried reproducing the issue on more than one version of the message broker and/or result backend.
- I have tried reproducing the issue on more than one operating system.
- I have tried reproducing the issue on more than one workers pool.
- I have tried reproducing the issue with autoscaling, retries, ETA/Countdown & rate limits disabled.
- I have tried reproducing the issue after downgrading and/or upgrading Celery and its dependencies.
Related Issues and Possible Duplicates
Related Issues
- None
Possible Duplicates
- None
Environment & Settings
Celery version: 5.3.x
Steps to Reproduce
This issue covers multiple cases that originate in a central bug.
The flag broker_connection_retry_on_startup relies on the following check in celery/celery/worker/consumer/consumer.py (line 340 in 2cde29d) to decide whether a connection loss happened on startup:
is_connection_loss_on_startup = self.restart_count == 0
Because broker_connection_retry is evaluated together with this flag, it suffers from the same bug.
Required Dependencies
- Minimal Python Version: 3.8
- Minimal Celery Version: 5.3
- Minimal Kombu Version: 5.3
- Minimal Broker Version: latest rabbitmq or latest redis
- Minimal Result Backend Version: N/A or Unknown
- Minimal OS and/or Kernel Version: N/A or Unknown
- Minimal Broker Client Version: N/A or Unknown
- Minimal Result Backend Client Version: N/A or Unknown
Minimally Reproducible Test Case 1
Details
1. Turn off the broker container.
2. Set broker_connection_retry_on_startup to True.
3. Set broker_connection_retry to False.
4. Run celery worker.
4.1. Wait for the connection retry 1/100…
5. Turn on the broker container.
5.1. Wait for connection to broker.
6. Turn off the broker container.
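For reference, the flag combination used in this test case could be written as a Celery configuration module along these lines. This is only a sketch: the module name celeryconfig.py and the broker URL are assumptions, not part of the original report. The other test cases use the same two settings with different values.

```python
# celeryconfig.py (hypothetical module name), loaded with
# app.config_from_object("celeryconfig")

# Assumed local broker address; replace with your own RabbitMQ or Redis URL.
broker_url = "amqp://guest:guest@localhost:5672//"

# Intended meaning: retry while the worker starts and the broker is unreachable.
broker_connection_retry_on_startup = True

# Intended meaning: do not retry once an established connection is lost.
broker_connection_retry = False
```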
Expected Behavior
Worker should shut down.
Actual Behavior
Worker retries to connect.
Minimally Reproducible Test Case 2
Details
1. Turn off the broker container.
2. Set broker_connection_retry_on_startup to False.
3. Set broker_connection_retry to True.
4. Run celery worker.
Expected Behavior
Worker should shut down.
Actual Behavior
Worker retries to connect.
Minimally Reproducible Test Case 3
Details
1. Turn on the broker container.
2. Set broker_connection_retry_on_startup to True.
3. Set broker_connection_retry to False.
4. Run celery worker.
5. Turn off the broker container.
Expected Behavior
Worker should shut down.
Actual Behavior
Worker retries to connect.
Minimally Reproducible Test Case 4
Details
1. Turn on the broker container.
2. Set broker_connection_retry_on_startup to False.
3. Set broker_connection_retry to True.
4. Run celery worker.
5. Turn off the broker container.
Expected Behavior
Worker retries to connect.
Actual Behavior
Worker shuts down.
Potential Fix
To fix all of these cases, one option is to add a flag to the consumer that tracks this exact startup condition, and to use it wherever broker_connection_retry_on_startup is consulted. On startup the worker would then respect broker_connection_retry_on_startup, and afterward broker_connection_retry. If broker_connection_retry_on_startup is None, broker_connection_retry would also determine the startup behavior.
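The decision rule proposed above can be sketched as a small standalone function. This is not Celery's code: the function name should_retry_connection and the is_startup parameter are hypothetical, with is_startup assumed to mean "no broker connection has been established yet since the worker started".

```python
from typing import Optional


def should_retry_connection(
    is_startup: bool,
    broker_connection_retry: bool,
    broker_connection_retry_on_startup: Optional[bool],
) -> bool:
    """Return True if the worker should retry connecting to the broker.

    is_startup is a hypothetical consumer flag: True until the first
    successful broker connection, False afterward.
    """
    # On startup, the dedicated flag wins unless it is unset (None).
    if is_startup and broker_connection_retry_on_startup is not None:
        return broker_connection_retry_on_startup
    # After startup, or when the startup flag is None, fall back to
    # the general retry setting.
    return broker_connection_retry


# Expected behavior of the four test cases from this report.
case_1 = should_retry_connection(False, False, True)  # lost after connecting
case_2 = should_retry_connection(True, True, False)   # broker down at startup
case_3 = should_retry_connection(False, False, True)  # lost after connecting
case_4 = should_retry_connection(False, True, False)  # lost after connecting
```

Under this rule, cases 1 to 3 evaluate to False (the worker shuts down) and case 4 evaluates to True (the worker retries), which matches the expected behavior listed for each test case.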