-
-
Notifications
You must be signed in to change notification settings - Fork 5k
[5.3.0] Task results order is not preserved anymore #8421
Description
Checklist
- I have verified that the issue exists against the
mainbranch of Celery. - This has already been asked to the discussions forum first.
- I have read the relevant section in the
contribution guide
on reporting bugs. - I have checked the issues list
for similar or identical bug reports. - I have checked the pull requests list
for existing proposed fixes. - I have checked the commit log
to find out if the bug was already fixed in the main branch. - I have included all related issues and possible duplicate issues
in this issue (If there are none, check this box anyway).
Mandatory Debugging Information
- I have included the output of
celery -A proj reportin the issue.
(if you are not able to do this, then at least specify the Celery
version affected). - I have verified that the issue exists against the
mainbranch of Celery. - I have included the contents of
pip freezein the issue. - I have included all the versions of all the external dependencies required
to reproduce this bug.
Optional Debugging Information
- I have tried reproducing the issue on more than one Python version
and/or implementation. - I have tried reproducing the issue on more than one message broker and/or
result backend. - I have tried reproducing the issue on more than one version of the message
broker and/or result backend. - I have tried reproducing the issue on more than one operating system.
- I have tried reproducing the issue on more than one workers pool.
- I have tried reproducing the issue with autoscaling, retries,
ETA/Countdown & rate limits disabled. - I have tried reproducing the issue after downgrading
and/or upgrading Celery and its dependencies.
Related Issues and Possible Duplicates
Related Issues
- Preserve order of group results with Redis result backend #6218 (original implementation of order preservation in Redis)
- Integration test fix #7460 (introduction of the regression, in 563269a#diff-3a80ff45da16a11b96e26a63973d7d490187a68ddc1949e2dfd7fd090b208841R1254)
Possible Duplicates
- None
Environment & Settings
Celery version: 5.3.0 (also reproducing on master, commit 2cde29d9fb6a8f8f805bec5d97b36bc930bcb52f)
celery report Output:
software -> celery:5.3.1 (emerald-rush) kombu:5.3.1 py:3.10.12
billiard:4.1.0 py-amqp:5.1.1
platform -> system:Linux arch:64bit, ELF
kernel version:6.1.30-0-virt imp:CPython
loader -> celery.loaders.app.AppLoader
settings -> transport:amqp results:redis://redis:6379/
Steps to Reproduce
Required Dependencies
- Minimal Python Version: 3.7.0
- Minimal Celery Version: 5.3.0 (or commit 9e324caaa6b175d8e51d3582378b78757e66a12d in
masterbrancb) - Minimal Kombu Version: tested with kombu 5.3.0
- Minimal Broker Version: tested with RabbitMQ 3.9.25
- Minimal Result Backend Version: tested with Redis 5.0.14
- Minimal OS and/or Kernel Version: tested on Docker aarch64
- Minimal Broker Client Version: tested with
amqp==5.1.1 - Minimal Result Backend Client Version: tested with
redis-cli==4.6.0
Python Packages
(ran pip freeze in the integration test Celery container)
pip freeze Output:
alabaster==0.7.13
amqp==5.1.1
async-timeout==4.0.2
attrs==23.1.0
azure-core==1.28.0
azure-storage-blob==12.17.0
Babel==2.12.1
backports.zoneinfo==0.2.1
billiard==4.1.0
boto3==1.28.18
botocore==1.31.18
bump2version==1.0.1
bumpversion==0.6.0
cachetools==5.3.1
cassandra-driver==3.28.0
celery==5.3.1
certifi==2023.7.22
cffi==1.15.1
cfgv==3.3.1
chardet==5.2.0
charset-normalizer==3.2.0
click==8.1.6
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.3.0
colorama==0.4.6
couchbase==4.1.6
coverage==7.2.7
cryptography==41.0.2
DateTime==5.2
distlib==0.3.7
dnspython==2.4.1
docutils==0.19
elasticsearch==7.17.9
ephem==4.1.4
eventlet==0.33.3
exceptiongroup==1.1.2
filelock==3.12.2
flake8==6.1.0
flake8-docstrings==1.7.0
flakeplus==1.1.0
future==0.18.3
geomet==0.2.1.post1
gevent==23.7.0
greenlet==2.0.2
identify==2.5.26
idna==3.4
imagesize==1.4.1
importlib-metadata==6.8.0
iniconfig==2.0.0
isodate==0.6.1
isort==5.12.0
Jinja2==3.1.2
jmespath==1.0.1
kombu==5.3.1
livereload==2.6.3
MarkupSafe==2.1.3
mccabe==0.7.0
mock==5.1.0
moto==4.1.14
msgpack==1.0.5
mypy==1.4.1
mypy-extensions==1.0.0
nodeenv==1.8.0
packaging==23.1
platformdirs==3.10.0
pluggy==1.2.0
pre-commit==3.3.3
prompt-toolkit==3.0.39
pyArango==2.0.2
pycodestyle==2.11.0
pycouchdb==1.14.2
pycparser==2.21
pycurl==7.45.2
pydocstyle==6.3.0
pydocumentdb==2.3.5
pyflakes==3.1.0
Pygments==2.15.1
pylibmc==1.6.3
pymongo==4.4.1
pyproject-api==1.5.3
pytest==7.4.0
pytest-celery==0.0.0
pytest-click==1.1.0
pytest-cov==4.1.0
pytest-github-actions-annotate-failures==0.2.0
pytest-order==1.1.0
pytest-rerunfailures==12.0
pytest-subtests==0.11.0
pytest-timeout==2.1.0
python-consul2==0.1.5
python-dateutil==2.8.2
python-memcached==1.59
pytz==2023.3
PyYAML==6.0.1
redis==4.6.0
requests==2.31.0
responses==0.23.3
s3transfer==0.6.1
six==1.16.0
snowballstemmer==2.2.0
softlayer-messaging==1.0.3
Sphinx==5.3.0
sphinx-autobuild==2021.3.14
sphinx-celery==2.0.0
sphinx-click==4.4.0
sphinx-testing==1.0.1
sphinx2rst==1.1.0
sphinxcontrib-applehelp==1.0.4
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
SQLAlchemy==2.0.19
tblib==2.0.0
tomli==2.0.1
tornado==6.3.2
tox==4.6.4
types-PyYAML==6.0.12.11
typing_extensions==4.7.1
tzdata==2023.3
Unipath==1.1
urllib3==1.26.16
vine==5.0.0
virtualenv==20.24.2
wcwidth==0.2.6
Werkzeug==2.3.6
xmltodict==0.13.0
zipp==3.16.2
zope.event==5.0
zope.interface==6.0
Other Dependencies
Details
N/A
Minimally Reproducible Test Case
- [backport 5.3.1] fix(canvas): add group index when unrolling tasks backmarket-oss/celery#1 adds an integration test that validate that the results are in the same order as the tasks when using
chord, ending up failing (based onmasterbranch) - https://github.com/backmarket-oss/celery/pull/2 includes the same test, but forces the consumption of
resultsgenerator, effectively settinggroup_indexand incidentally making the integration test pass (although it breaks another test)
Expected Behavior
Results are received in the callback in the same order as the created tasks when using chord.
So if we have tasks [1, 2, 3] finishing in order [3, 1, 2], we receive [1, 2, 3] in the callback.
Actual Behavior
Results are received in the callback in the order the tasks finish.
So if we have tasks [1, 2, 3] finishing in order [3, 1, 2], we receive [3, 1, 2] in the callback.
Investigation
As shown in https://github.com/backmarket-oss/celery/pull/2, it is possible for the callback to receive tasks in the correct order by forcing results generator to be executed.
While this code's purpose is only to demonstrate the issue, digging a bit more shows that the generator does not seem to be consumed when it should, leading up to not setting up group_index, which is used by the backend to order the results based on the order the tasks where created.
Checking the history, it looks like the regression was introduced in #7460, and more exactly here.
While the zip itself was unnecessary, it actually forced the generator to be executed, so even if it looked innocent to remove this code, it does mean that we do not set the group_index by executing the generator anymore.
Note that I would be happy to provide a pull request to properly fix the regression, but I'm not sure about the best way to fix the issue.
I believe that we do not want to force the execution of the generator in this part of the code (which could explain why the commit breaks another test), but if you think this is, or know where would be the best place to fix this, I'd be happy to provide a pull request.