concurrency utilization breaks on broker connection reset #7515

@dima-asana

Checklist

  • I have verified that the issue exists against the master branch of Celery.
  • This has already been asked to the discussions forum first.
  • I have read the relevant section in the
    contribution guide
    on reporting bugs.
  • I have checked the issues list
    for similar or identical bug reports.
  • I have checked the pull requests list
    for existing proposed fixes.
  • I have checked the commit log
    to find out if the bug was already fixed in the master branch.
  • I have included all related issues and possible duplicate issues
    in this issue (If there are none, check this box anyway).

Mandatory Debugging Information

  • I have included the output of celery -A proj report in the issue.
    (if you are not able to do this, then at least specify the Celery
    version affected).
  • I have verified that the issue exists against the master branch of Celery.
  • I have included the contents of pip freeze in the issue.
  • I have included all the versions of all the external dependencies required
    to reproduce this bug.

Optional Debugging Information

  • I have tried reproducing the issue on more than one Python version
    and/or implementation.
  • I have tried reproducing the issue on more than one message broker and/or
    result backend.
  • I have tried reproducing the issue on more than one version of the message
    broker and/or result backend.
  • I have tried reproducing the issue on more than one operating system.
  • I have tried reproducing the issue on more than one workers pool.
  • I have tried reproducing the issue with autoscaling, retries,
    ETA/Countdown & rate limits disabled.
  • I have tried reproducing the issue after downgrading
    and/or upgrading Celery and its dependencies.

Related Issues and Possible Duplicates

Related Issues

There are numerous bug reports in the Airflow project which I believe may be caused by this bug. apache/airflow#19699 is the most comprehensive of these.

Possible Duplicates

Environment & Settings

Celery version:
[master] ~/external/celery-broken-connection-busy-worker-bug: celery --version
5.2.6 (dawn-chorus)

celery report Output:

[master] ~/external/celery-broken-connection-busy-worker-bug: celery -A testcase report

software -> celery:5.2.6 (dawn-chorus) kombu:5.2.4 py:3.9.12
            billiard:3.6.4.0 py-amqp:5.1.1
platform -> system:Darwin arch:64bit
            kernel version:20.6.0 imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:amqp results:rpc:///

broker_url: 'amqp://test:********@localhost:5672/test_vhost'
result_backend: 'rpc:///'
include: ['testcase.tasks']
deprecated_settings: None


Steps to Reproduce

Required Dependencies

  • Minimal Python Version: N/A or Unknown
  • Minimal Celery Version: 3.1.0 (I believe the issue began with 123f002)
  • Minimal Kombu Version: N/A or Unknown
  • Minimal Broker Version: N/A or Unknown
  • Minimal Result Backend Version: N/A or Unknown
  • Minimal OS and/or Kernel Version: N/A or Unknown
  • Minimal Broker Client Version: N/A or Unknown
  • Minimal Result Backend Client Version: N/A or Unknown

Python Packages

pip freeze Output:

[master] ~/external/celery-broken-connection-busy-worker-bug: pip freeze
amqp==5.1.1
ansicolors==1.1.8
appnope==0.1.2
argcomplete==2.0.0
asana==0.10.1
astroid==2.9.3
awacs==2.0.1
beartype==0.9.1
beautifulsoup4==4.8.2
billiard==3.6.4.0
black==22.1.0
boto3==1.18.52
botocore==1.21.65
cached-property==1.5.2
cachetools==4.2.4
celery==5.2.6
certifi==2021.10.8
cfn-flip==1.3.0
charset-normalizer==2.0.12
chkcrontab==1.7
click==8.0.4
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.2.0
colorama==0.3.9
decorator==5.1.1
Deprecated==1.2.13
distlib==0.3.1
dnspython==1.15.0
ec2-metadata==2.2.0
fluent-logger==0.10.0
future==0.17.1
google-api-core==1.31.5
google-auth==1.35.0
google-cloud-bigquery==0.31.0
google-cloud-core==0.28.1
google-crc32c==1.3.0
google-resumable-media==2.3.2
googleapis-common-protos==1.56.0
idna==3.3
ipython==5.8.0
isort==5.10.1
Jinja2==3.0.1
jmespath==0.10.0
json5==0.6.1
kazoo==2.8.0
kombu==5.2.4
lazy-object-proxy==1.7.1
MarkupSafe==2.0.1
mccabe==0.6.1
msgpack==1.0.3
mypy==0.910
mypy-extensions==0.4.3
mysqlclient==1.4.4
oauthlib==3.2.0
packaging==21.3
pathspec==0.9.0
pexpect==4.8.0
pickleshare==0.7.5
pip-tools==4.1.0
platformdirs==2.5.1
prompt-toolkit==1.0.18
protobuf==3.19.4
psutil==5.7.2
ptyprocess==0.7.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
PyGithub==1.47
Pygments==2.11.2
pyhocon==0.3.57
pyjavaproperties3==0.6
PyJWT==2.3.0
pylint==2.12.1
PyMySQL==1.0.2
pyobjc-core==7.1
pyobjc-framework-Cocoa==7.1
pyobjc-framework-FSEvents==7.1
pyparsing==3.0.7
python-dateutil==2.6.0
pytz==2022.1
PyYAML==6.0
requests==2.27.1
requests-oauthlib==1.3.1
rsa==4.8
s3transfer==0.5.2
simplegeneric==0.8.1
simplejson==3.17.0
six==1.15.0
soupsieve==2.3.1
sqlparse==0.3.0
tabulate==0.8.7
toml==0.10.2
tomli==2.0.1
tornado==4.5.3
traitlets==5.1.1
troposphere==3.2.2
typeguard==2.13.3
typing_extensions==4.0.1
urllib3==1.26.9
vine==5.0.0
watchdog==2.0.1
wcwidth==0.2.5
wrapt==1.13.3
yapf==0.21.0
zake==0.2.2

Other Dependencies

N/A

Minimally Reproducible Test Case

See https://github.com/dima-asana/celery-broken-connection-busy-worker-bug for code to reproduce.

1. Set up two tasks: one that sleeps for 10 minutes and one that sleeps for 5 seconds.
2. Run Celery with default settings (prefork pool, concurrency 16).
3. Submit the 10-minute sleep task.
4. Restart the connection to the broker (tested on RabbitMQ, but I believe the issue is broker-independent).
5. Submit the 5-second sleep task.

Expected Behavior

Celery processes the 10-minute sleep task and the 5-second sleep task in parallel.

Actual Behavior

Celery processes the 10-minute sleep task and the 5-second sleep task serially instead of in parallel. Logs are at https://gist.github.com/dima-asana/9f96a8fa55400c8bf5627aa6bf96fb1a
