Description
Checklist
- I have verified that the issue exists against the master branch of Celery.
- This has already been asked to the discussions forum first.
- I have read the relevant section in the contribution guide on reporting bugs.
- I have checked the issues list for similar or identical bug reports.
- I have checked the pull requests list for existing proposed fixes.
- I have checked the commit log to find out if the bug was already fixed in the master branch.
- I have included all related issues and possible duplicate issues in this issue (If there are none, check this box anyway).
Mandatory Debugging Information
- I have included the output of celery -A proj report in the issue (if you are not able to do this, then at least specify the Celery version affected).
- I have verified that the issue exists against the master branch of Celery.
- I have included the contents of pip freeze in the issue.
- I have included all the versions of all the external dependencies required to reproduce this bug.
Optional Debugging Information
- I have tried reproducing the issue on more than one Python version and/or implementation.
- I have tried reproducing the issue on more than one message broker and/or result backend.
- I have tried reproducing the issue on more than one version of the message broker and/or result backend.
- I have tried reproducing the issue on more than one operating system.
- I have tried reproducing the issue on more than one workers pool.
- I have tried reproducing the issue with autoscaling, retries, ETA/Countdown & rate limits disabled.
- I have tried reproducing the issue after downgrading and/or upgrading Celery and its dependencies.
Related Issues and Possible Duplicates
Related Issues
- Tasks intermittently get stuck as reserved even with -Ofair option enabled (Celery 3.1.23) #3765: describes a symptom that could be explained by this bug.
- Celery with concurrency does not use all workers while multiple tasks reserved #7277: describes a similar symptom, although I am less convinced it is related because that report contains fewer details.
There are numerous bug reports in the Airflow project which I believe may be caused by this bug. apache/airflow#19699 is the most comprehensive of these.
Possible Duplicates
- Tasks intermittently get stuck as reserved even with -Ofair option enabled (Celery 3.1.23) #3765
- Celery with concurrency does not use all workers while multiple tasks reserved #7277
Environment & Settings
Celery version:
[master] ~/external/celery-broken-connection-busy-worker-bug: celery --version
5.2.6 (dawn-chorus)
celery report Output:
[master] ~/external/celery-broken-connection-busy-worker-bug: celery -A testcase report
software -> celery:5.2.6 (dawn-chorus) kombu:5.2.4 py:3.9.12
billiard:3.6.4.0 py-amqp:5.1.1
platform -> system:Darwin arch:64bit
kernel version:20.6.0 imp:CPython
loader -> celery.loaders.app.AppLoader
settings -> transport:amqp results:rpc:///
broker_url: 'amqp://test:********@localhost:5672/test_vhost'
result_backend: 'rpc:///'
include: ['testcase.tasks']
deprecated_settings: None
Steps to Reproduce
Required Dependencies
- Minimal Python Version: N/A or Unknown
- Minimal Celery Version: 3.1.0 (I believe the issue began with 123f002)
- Minimal Kombu Version: N/A or Unknown
- Minimal Broker Version: N/A or Unknown
- Minimal Result Backend Version: N/A or Unknown
- Minimal OS and/or Kernel Version: N/A or Unknown
- Minimal Broker Client Version: N/A or Unknown
- Minimal Result Backend Client Version: N/A or Unknown
Python Packages
pip freeze Output:
[master] ~/external/celery-broken-connection-busy-worker-bug: pip freeze
amqp==5.1.1
ansicolors==1.1.8
appnope==0.1.2
argcomplete==2.0.0
asana==0.10.1
astroid==2.9.3
awacs==2.0.1
beartype==0.9.1
beautifulsoup4==4.8.2
billiard==3.6.4.0
black==22.1.0
boto3==1.18.52
botocore==1.21.65
cached-property==1.5.2
cachetools==4.2.4
celery==5.2.6
certifi==2021.10.8
cfn-flip==1.3.0
charset-normalizer==2.0.12
chkcrontab==1.7
click==8.0.4
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.2.0
colorama==0.3.9
decorator==5.1.1
Deprecated==1.2.13
distlib==0.3.1
dnspython==1.15.0
ec2-metadata==2.2.0
fluent-logger==0.10.0
future==0.17.1
google-api-core==1.31.5
google-auth==1.35.0
google-cloud-bigquery==0.31.0
google-cloud-core==0.28.1
google-crc32c==1.3.0
google-resumable-media==2.3.2
googleapis-common-protos==1.56.0
idna==3.3
ipython==5.8.0
isort==5.10.1
Jinja2==3.0.1
jmespath==0.10.0
json5==0.6.1
kazoo==2.8.0
kombu==5.2.4
lazy-object-proxy==1.7.1
MarkupSafe==2.0.1
mccabe==0.6.1
msgpack==1.0.3
mypy==0.910
mypy-extensions==0.4.3
mysqlclient==1.4.4
oauthlib==3.2.0
packaging==21.3
pathspec==0.9.0
pexpect==4.8.0
pickleshare==0.7.5
pip-tools==4.1.0
platformdirs==2.5.1
prompt-toolkit==1.0.18
protobuf==3.19.4
psutil==5.7.2
ptyprocess==0.7.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
PyGithub==1.47
Pygments==2.11.2
pyhocon==0.3.57
pyjavaproperties3==0.6
PyJWT==2.3.0
pylint==2.12.1
PyMySQL==1.0.2
pyobjc-core==7.1
pyobjc-framework-Cocoa==7.1
pyobjc-framework-FSEvents==7.1
pyparsing==3.0.7
python-dateutil==2.6.0
pytz==2022.1
PyYAML==6.0
requests==2.27.1
requests-oauthlib==1.3.1
rsa==4.8
s3transfer==0.5.2
simplegeneric==0.8.1
simplejson==3.17.0
six==1.15.0
soupsieve==2.3.1
sqlparse==0.3.0
tabulate==0.8.7
toml==0.10.2
tomli==2.0.1
tornado==4.5.3
traitlets==5.1.1
troposphere==3.2.2
typeguard==2.13.3
typing_extensions==4.0.1
urllib3==1.26.9
vine==5.0.0
watchdog==2.0.1
wcwidth==0.2.5
wrapt==1.13.3
yapf==0.21.0
zake==0.2.2
Other Dependencies
N/A
Minimally Reproducible Test Case
See https://github.com/dima-asana/celery-broken-connection-busy-worker-bug for code to reproduce.
Set up two tasks: one that sleeps for 10 minutes and one that sleeps for 5 seconds.
Run Celery with default settings (prefork pool, concurrency 16).
Submit the sleep 10 min task.
Restart the connection to the broker (tested on RabbitMQ, but I think the bug is broker independent).
Submit the sleep 5 sec task.
Expected Behavior
Celery processes the sleep 10 min task and the sleep 5 sec task in parallel.
Actual Behavior
Celery processes the sleep 10 min task and the sleep 5 sec task serially rather than in parallel. Logs at https://gist.github.com/dima-asana/9f96a8fa55400c8bf5627aa6bf96fb1a