Tasks completed during a cold shutdown incorrectly time out with 5.5 #9505

@daveisfera

Description

Checklist

  • I have verified that the issue exists against the main branch of Celery.
  • This has already been asked to the discussions forum first.
  • I have read the relevant section in the
    contribution guide
    on reporting bugs.
  • I have checked the issues list
    for similar or identical bug reports.
  • I have checked the pull requests list
    for existing proposed fixes.
  • I have checked the commit log
    to find out if the bug was already fixed in the main branch.
  • I have included all related issues and possible duplicate issues
    in this issue (If there are none, check this box anyway).
  • I have tried to reproduce the issue with pytest-celery and added the reproduction script below.

Mandatory Debugging Information

  • I have included the output of celery -A proj report in the issue.
  • I have verified that the issue exists against the main branch of Celery.
  • I have included the contents of pip freeze in the issue.
  • I have included all the versions of all the external dependencies required
    to reproduce this bug.

Optional Debugging Information

  • I have tried reproducing the issue on more than one Python version
    and/or implementation.
  • I have tried reproducing the issue on more than one message broker and/or
    result backend.
  • I have tried reproducing the issue on more than one version of the message
    broker and/or result backend.
  • I have tried reproducing the issue on more than one operating system.
  • I have tried reproducing the issue on more than one workers pool.
  • I have tried reproducing the issue with autoscaling, retries,
    ETA/Countdown & rate limits disabled.
  • I have tried reproducing the issue after downgrading
    and/or upgrading Celery and its dependencies.

Related Issues and Possible Duplicates

N/A

Environment & Settings

software -> celery:5.5.0rc4 (immunity) kombu:5.5.0rc2 py:3.9.20
            billiard:4.2.1 py-amqp:5.3.1   
platform -> system:Linux arch:64bit
            kernel version:6.10.14-linuxkit imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:amqp results:django-db

Steps to Reproduce

Required Dependencies

  • Minimal Python Version: 3.9.20
  • Minimal Celery Version: 5.5.0rc4
  • Minimal Kombu Version: 5.5.0rc2
  • Minimal Broker Version: RabbitMQ 3.13.7
  • Minimal Result Backend Version: PG 17.0
  • Minimal OS and/or Kernel Version: Debian Bookworm
  • Minimal Broker Client Version: py-amqp 5.3.1
  • Minimal Result Backend Client Version: django-celery-results 2.5.1

Python Packages

celery==5.5.0rc4
django==3.2.25
django-celery-results==2.5.1

Other Dependencies

N/A

Minimally Reproducible Test Case

  1. Start the Celery worker with the environment variable REMAP_SIGTERM=SIGQUIT set
  2. Start a long-running task
  3. Stop the worker with kill -s TERM
  4. Observe that the task completes but the worker does not exit
  5. Verify that the task's status is FAILURE even though it should be SUCCESS

Example in this repo:
https://github.com/daveisfera/celery_cold_shutdown
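
Steps 1 and 3 can be sketched with a small driver script. This is only a rough sketch, not the code from the example repo: the app name mysite is inferred from the task name in the logs below, and the helper names worker_env, worker_command, start_worker and request_shutdown are made up for illustration.

```python
import os
import signal
import subprocess


def worker_env(base=None):
    """Copy the environment and add REMAP_SIGTERM=SIGQUIT (step 1)."""
    env = dict(os.environ if base is None else base)
    env["REMAP_SIGTERM"] = "SIGQUIT"
    return env


def worker_command(app="mysite"):
    """Command line for the worker; "mysite" is an assumed app name."""
    return ["celery", "-A", app, "worker", "--loglevel=INFO"]


def start_worker(app="mysite"):
    """Start the worker as a child process with the remapped signal."""
    return subprocess.Popen(worker_command(app), env=worker_env())


def request_shutdown(proc):
    """Send SIGTERM to the worker, like `kill -s TERM` (step 3)."""
    proc.send_signal(signal.SIGTERM)
```

Usage would be: call start_worker(), queue the long-running task from another shell, call request_shutdown() on the returned process while the task is still running, then check the task's status in the result backend.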

Logs:

[2025-01-21 02:29:20,211: WARNING/ForkPoolWorker-1] Value: 8

worker: Hitting Ctrl+C again will terminate all running tasks!
[2025-01-21 02:29:21,132: WARNING/MainProcess] Initiating Soft Shutdown, terminating in 16 seconds
[2025-01-21 02:29:28,220: WARNING/ForkPoolWorker-1] Done: 8
[2025-01-21 02:29:28,228: INFO/ForkPoolWorker-1] Task mysite.celery.long_task[97636c13-9c46-4880-8f31-76b277a37bf5] succeeded in 8.017255892998946s: None
[2025-01-21 02:29:37,167: ERROR/MainProcess] Task handler raised error: TimeLimitExceeded(15)
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/billiard/pool.py", line 684, in on_hard_timeout
    raise TimeLimitExceeded(job._timeout)
billiard.einfo.ExceptionWithTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/billiard/pool.py", line 684, in on_hard_timeout
    raise TimeLimitExceeded(job._timeout)
billiard.exceptions.TimeLimitExceeded: TimeLimitExceeded(15,)
"""
[2025-01-21 02:29:37,167: ERROR/MainProcess] Hard time limit (15s) exceeded for mysite.celery.long_task[97636c13-9c46-4880-8f31-76b277a37bf5]

worker: Cold shutdown (MainProcess)
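
To make the timing in these logs easier to see, here is a small sketch that computes the gaps between the relevant log lines. The timestamps are copied from the logs above; the event names are my own labels, and treating the "Value: 8" line as the task start is an assumption (it is consistent with the reported 8.017s runtime).

```python
from datetime import datetime

# Celery's default log timestamp format, e.g. "2025-01-21 02:29:20,211"
FMT = "%Y-%m-%d %H:%M:%S,%f"

# Timestamps copied from the log lines above; the keys are illustrative labels.
events = {
    "task_started": "2025-01-21 02:29:20,211",        # "Value: 8" (assumed start)
    "shutdown_initiated": "2025-01-21 02:29:21,132",  # "Initiating Soft Shutdown"
    "task_succeeded": "2025-01-21 02:29:28,228",      # "succeeded in 8.017...s"
    "time_limit_error": "2025-01-21 02:29:37,167",    # "TimeLimitExceeded(15)"
}
parsed = {name: datetime.strptime(ts, FMT) for name, ts in events.items()}


def seconds_between(earlier, later):
    """Seconds elapsed between two named log events."""
    return (parsed[later] - parsed[earlier]).total_seconds()


print(f"{seconds_between('task_started', 'task_succeeded'):.3f}")          # 8.017
print(f"{seconds_between('task_succeeded', 'time_limit_error'):.3f}")      # 8.939
print(f"{seconds_between('shutdown_initiated', 'time_limit_error'):.3f}")  # 16.035
```

So the hard time limit error is raised roughly 9 seconds after the task already logged success, and about 16 seconds after the shutdown was initiated.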

NOTE: I'm not familiar enough with the testing setup to add this as a test, but it's straightforward to reproduce, and the example repo linked above demonstrates it.

Expected Behavior

Tasks that complete while a cold shutdown is in progress should be reported as SUCCESS.

Actual Behavior

Tasks that complete while a cold shutdown is in progress hit the hard time limit and are reported as FAILURE.
