Tasks silently fail with result None if Redis result backend goes away #8588

@riccardomassullo

Description

Checklist

  • I have verified that the issue exists against the main branch of Celery.
  • This has already been asked to the discussions forum first.
  • I have read the relevant section in the
    contribution guide
    on reporting bugs.
  • I have checked the issues list
    for similar or identical bug reports.
  • I have checked the pull requests list
    for existing proposed fixes.
  • I have checked the commit log
    to find out if the bug was already fixed in the main branch.
  • I have included all related issues and possible duplicate issues
    in this issue (If there are none, check this box anyway).

Mandatory Debugging Information

  • I have included the output of celery -A proj report in the issue.
    (if you are not able to do this, then at least specify the Celery
    version affected).
  • I have verified that the issue exists against the main branch of Celery.
  • I have included the contents of pip freeze in the issue.
  • I have included all the versions of all the external dependencies required
    to reproduce this bug.

Optional Debugging Information

  • I have tried reproducing the issue on more than one Python version
    and/or implementation.
  • I have tried reproducing the issue on more than one message broker and/or
    result backend.
  • I have tried reproducing the issue on more than one version of the message
    broker and/or result backend.
  • I have tried reproducing the issue on more than one operating system.
  • I have tried reproducing the issue on more than one workers pool.
  • I have tried reproducing the issue with autoscaling, retries,
    ETA/Countdown & rate limits disabled.
  • I have tried reproducing the issue after downgrading
    and/or upgrading Celery and its dependencies.

Related Issues and Possible Duplicates

Related Issues

  • None

Possible Duplicates

  • None

Environment & Settings

Celery version: 5.2.2

Report

software -> celery:5.2.2 (dawn-chorus) kombu:5.2.4 py:3.8.10
billiard:3.6.4.0 py-amqp:5.1.1
platform -> system:Linux arch:64bit, ELF
kernel version:5.15.0-1044-aws imp:CPython
loader -> celery.loaders.app.AppLoader
settings -> transport:amqp results:redis://***.***.cache.amazonaws.com:6379/

broker_url: 'amqp://:@..compute.internal:5672//'
result_backend: 'redis://..amazonaws.com:6379/'
deprecated_settings: None
task_compression: 'gzip'
result_compression: 'gzip'
broker_transport_options: {'visibility_timeout': 300}
result_backend_transport_options: {}
task_track_started: True
task_acks_late: True
task_reject_on_worker_lost: True
worker_prefetch_multiplier: 4
task_queues:
(<unbound Queue ***_security -> <unbound Exchange ***_security(direct)> -> ***_security>,)

Closing the transport connection timed out.

Steps to Reproduce

  • Execute a long-running task
  • Kill or otherwise stop the Redis result backend instance while the long task is running
  • Celery fails to save the result in the Redis result backend
  • An error is thrown:
[2023-10-23 13:16:28,971: INFO/MainProcess] Task sherpa.celery_worker.main.handle[4ca96ebf-e76b-46fc-8b30-030f715b1b8b] received
[2023-10-23 13:16:28,971: ERROR/MainProcess] Pool callback raised exception: ConnectionError('Error 111 connecting to 172.24.0.1:6380. Connection refused.')
Traceback (most recent call last):
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/redis/connection.py", line 707, in connect
    sock = self.retry.call_with_retry(
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/redis/retry.py", line 46, in call_with_retry
    return do()
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/redis/connection.py", line 708, in <lambda>
    lambda: self._connect(), lambda error: self.disconnect(error)
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/redis/connection.py", line 1006, in _connect
    raise err
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/redis/connection.py", line 994, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/billiard/pool.py", line 1796, in safe_apply_callback
    fun(*args, **kwargs)
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/celery/worker/request.py", line 577, in on_failure
    self.task.backend.mark_as_failure(
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/celery/backends/base.py", line 172, in mark_as_failure
    self.store_result(task_id, exc, state,
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/celery/backends/base.py", line 528, in store_result
    self._store_result(task_id, result, state, traceback,
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/celery/backends/base.py", line 956, in _store_result
    current_meta = self._get_task_meta_for(task_id)
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/celery/backends/base.py", line 978, in _get_task_meta_for
    meta = self.get(self.get_key_for_task(task_id))
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/celery/backends/redis.py", line 368, in get
    return self.client.get(key)
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/redis/commands/core.py", line 1816, in get
    return self.execute_command("GET", name)
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/elasticapm/instrumentation/packages/base.py", line 208, in call_if_sampling
    return wrapped(*args, **kwargs)
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/redis/client.py", line 1266, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/redis/connection.py", line 1461, in get_connection
    connection.connect()
  File "/api/.local/pipx/venvs/poetry/lib/python3.8/site-packages/redis/connection.py", line 713, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 172.24.0.1:6380. Connection refused.
[2023-10-23 13:16:28,972: INFO/MainProcess] Task sherpa.celery_worker.main.handle[5068b4bd-be27-49e0-bd55-d556e4826a2c] received
  • All queued jobs are executed and complete, but Celery fails to save their results to the Redis result store
  • Jobs are not requeued; instead they are all instantly processed with result None (a caller-side sketch follows this list)
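
From the caller's side, the symptom might look like the following. This is a sketch, where long_task is the hypothetical task defined under Minimally Reproducible Test Case below:

from repro import long_task  # hypothetical module, see the sketch under Minimally Reproducible Test Case

result = long_task.delay(300)
# Stop the Redis result backend while the task sleeps, then restart it.
print(result.get(timeout=600))  # per the report: prints None instead of raising or retrying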

Required Dependencies

  • Minimal Python Version: 3.8.10
  • Minimal Celery Version: 5.2.2
  • Minimal Kombu Version: 5.2.4
  • Minimal Broker Version: RabbitMQ 3.9.11
  • Minimal Result Backend Version: Redis 6.x
  • Minimal OS and/or Kernel Version: Docker Ubuntu 20.04 LTS, Kernel 5.15.0
  • Minimal Broker Client Version: amqp 5.1.1
  • Minimal Result Backend Client Version: redis 4.6.0

Python Packages

pip freeze Output:

amqp==5.1.1
anyio==3.7.1
asgiref==3.7.2
async-timeout==4.0.3
attrs==23.1.0
billiard==3.6.4.0
build==0.10.0
CacheControl==0.12.14
celery==5.2.2
certifi==2023.7.22
cffi==1.16.0
charset-normalizer==3.3.0
cleo==2.0.1
click==8.1.7
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.3.0
crashtest==0.4.1
cryptography==41.0.4
distlib==0.3.7
dulwich==0.21.6
ecs-logging==2.1.0
elastic-apm==6.18.0
elasticsearch==7.17.1
exceptiongroup==1.1.3
fastapi==0.78.0
fastapi-versioning==0.10.0
filelock==3.12.4
flower==2.0.0
h11==0.14.0
html5lib==1.1
humanize==4.6.0
idna==3.4
importlib-metadata==6.8.0
importlib-resources==6.1.0
installer==0.7.0
jaraco.classes==3.3.0
jeepney==0.8.0
jsonschema==4.19.1
jsonschema-specifications==2023.7.1
keyring==23.13.1
kombu==5.2.4
lockfile==0.12.2
more-itertools==10.1.0
msgpack==1.0.7
packaging==23.2
pexpect==4.8.0
pkginfo==1.9.6
pkgutil-resolve-name==1.3.10
platformdirs==2.6.2
poetry==1.4.2
poetry-core==1.5.2
poetry-plugin-export==1.5.0
prometheus-client==0.17.1
prompt-toolkit==3.0.39
ptyprocess==0.7.0
pycparser==2.21
pycurl==7.45.2
pydantic==1.10.13
pyproject-hooks==1.0.0
pytz==2023.3.post1
rapidfuzz==2.15.1
redis==4.6.0
referencing==0.30.2
requests==2.31.0
requests-toolbelt==0.10.1
rpds-py==0.10.3
SecretStorage==3.3.3
shellingham==1.5.3
six==1.16.0
sniffio==1.3.0
starlette==0.19.1
tomli==2.0.1
tomlkit==0.12.1
tornado==6.2
trove-classifiers==2023.9.19
typing-extensions==4.7.1
urllib3==1.26.17
uvicorn==0.17.6
vine==5.0.0
virtualenv==20.24.5
wcwidth==0.2.8
webencodings==0.5.1
wrapt==1.15.0
zipp==3.17.0

Other Dependencies

Details

N/A

Minimally Reproducible Test Case

Details
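
A minimal sketch based on the Steps to Reproduce above; the module name, task body, broker URL, and Redis URL are placeholders, not the actual project:

# repro.py: minimal sketch of the scenario above; names and URLs are assumptions.
import time

from celery import Celery

app = Celery(
    'repro',
    broker='amqp://guest:guest@localhost:5672//',  # RabbitMQ broker, assumed local
    backend='redis://localhost:6379/0',            # Redis result backend, assumed local
)
app.conf.task_acks_late = True               # matches the reported settings
app.conf.task_reject_on_worker_lost = True   # matches the reported settings
app.conf.task_track_started = True           # matches the reported settings

@app.task
def long_task(seconds=120):
    # Sleep long enough to stop the Redis instance mid-run,
    # e.g. docker stop <redis-container>.
    time.sleep(seconds)
    return 'done'

Start a worker with celery -A repro worker --loglevel=INFO, call long_task.delay(120), and stop Redis while the task sleeps. With the versions above, the worker logs the Pool callback ConnectionError shown earlier and the task result is lost.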

Expected Behavior

Jobs should fail if the result backend is no longer available and, if necessary, be requeued/retried or sent to a DLQ.
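
As an aside, Celery documents backend retry settings that might narrow the window in which results are lost; whether they cover this exact code path in 5.2.2 is unverified. A hedged sketch, where app is the Celery application from the reproduction sketch above:

# Hedged mitigation sketch, not a fix for the silent None results.
# These are documented Celery settings; whether they apply to this
# exact failure path in 5.2.2 is unverified.
app.conf.result_backend_always_retry = True   # retry recoverable backend errors instead of raising once
app.conf.result_backend_max_retries = 10      # bound the retries while Redis is down
app.conf.redis_retry_on_timeout = True        # redis-py retry on timeouts (not connection refusals)
app.conf.result_backend_transport_options = {
    'retry_policy': {'timeout': 5.0},         # retry window for result backend operations
}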

Actual Behavior

  • Celery fails to save the result in the Redis result backend
  • An error is thrown (the same ConnectionError traceback as shown under Steps to Reproduce above)
  • All queued jobs are executed and complete, but Celery fails to save their results to the Redis result store
  • Jobs are not requeued; instead they are all instantly processed with result None
