Flags broker_connection_retry_on_startup & broker_connection_retry aren’t reliable  #8433

@Nusnus

Checklist

  • I have verified that the issue exists against the main branch of Celery.
  • This has already been asked to the discussions forum first.
  • I have read the relevant section in the
    contribution guide
    on reporting bugs.
  • I have checked the issues list
    for similar or identical bug reports.
  • I have checked the pull requests list
    for existing proposed fixes.
  • I have checked the commit log
    to find out if the bug was already fixed in the main branch.
  • I have included all related issues and possible duplicate issues
    in this issue (If there are none, check this box anyway).

Mandatory Debugging Information

  • I have included the output of celery -A proj report in the issue.
    (if you are not able to do this, then at least specify the Celery
    version affected).
  • I have verified that the issue exists against the main branch of Celery.
  • I have included the contents of pip freeze in the issue.
  • I have included all the versions of all the external dependencies required
    to reproduce this bug.

Optional Debugging Information

  • I have tried reproducing the issue on more than one Python version
    and/or implementation.
  • I have tried reproducing the issue on more than one message broker and/or
    result backend.
  • I have tried reproducing the issue on more than one version of the message
    broker and/or result backend.
  • I have tried reproducing the issue on more than one operating system.
  • I have tried reproducing the issue on more than one workers pool.
  • I have tried reproducing the issue with autoscaling, retries,
    ETA/Countdown & rate limits disabled.
  • I have tried reproducing the issue after downgrading
    and/or upgrading Celery and its dependencies.

Related Issues and Possible Duplicates

Related Issues

  • None

Possible Duplicates

  • None

Environment & Settings

Celery version: 5.3.x

Steps to Reproduce

This issue covers several cases that all stem from one central bug.
The broker_connection_retry_on_startup flag relies on

is_connection_loss_on_startup = self.restart_count == 0
to determine whether a connection loss is happening at startup. This check is incorrect: it does not reliably detect the startup condition, which causes multiple use cases to fail.

Because broker_connection_retry is handled together with this flag, it is affected by the same bug.
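
To illustrate how such a check can misfire, here is a minimal sketch. It is a toy model, not Celery's consumer code: the ConsumerModel class and its methods are hypothetical, and it assumes that restart_count only grows when the consumer restarts, so it says nothing about whether a connection was ever established.

```python
from dataclasses import dataclass


@dataclass
class ConsumerModel:
    """Toy stand-in for the worker consumer (hypothetical, not Celery's class)."""

    restart_count: int = 0       # assumed: only grows when the consumer restarts
    has_connected: bool = False  # ground truth the check is trying to infer

    def connect_ok(self) -> None:
        # The first connection to the broker succeeded.
        self.has_connected = True

    def restart(self) -> None:
        # The consumer restarted after a connection error.
        self.restart_count += 1

    def loss_looks_like_startup(self) -> bool:
        # The check quoted in the issue.
        return self.restart_count == 0

    def loss_is_really_startup(self) -> bool:
        # A loss is a startup loss only if no connection was ever made.
        return not self.has_connected


# Broker is up, the worker connects, then the broker goes away
# before any restart has happened.
consumer = ConsumerModel()
consumer.connect_ok()
misclassified = (
    consumer.loss_looks_like_startup() != consumer.loss_is_really_startup()
)
print(misclassified)
```

In this model the first loss after a successful connection is still treated as a startup loss, which matches the symptoms in the test cases where the broker is running when the worker starts.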

Required Dependencies

  • Minimal Python Version: 3.8
  • Minimal Celery Version: 5.3
  • Minimal Kombu Version: 5.3
  • Minimal Broker Version: latest rabbitmq or latest redis
  • Minimal Result Backend Version: N/A or Unknown
  • Minimal OS and/or Kernel Version: N/A or Unknown
  • Minimal Broker Client Version: N/A or Unknown
  • Minimal Result Backend Client Version: N/A or Unknown

Minimally Reproducible Test Case 1

Details

  1. Turn off the broker container.
  2. Set broker_connection_retry_on_startup to True.
  3. Set broker_connection_retry to False.
  4. Run the Celery worker.
    4.1. Wait for the connection retry 1/100…
  5. Turn on the broker container.
    5.1. Wait for the connection to the broker.
  6. Turn off the broker container.
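
For reference, the two settings from steps 2 and 3 could be expressed in a configuration module like the one below. This is a sketch: the module name celeryconfig.py and the broker URL are assumptions (the report only says RabbitMQ or Redis), while the two setting names are the ones under test. The other test cases use the same two settings with different values.

```python
# celeryconfig.py (assumed module name), loaded with
# app.config_from_object("celeryconfig")

# Assumed local RabbitMQ container; substitute the URL of your own broker.
broker_url = "amqp://guest:guest@localhost:5672//"

# Step 2: retry while the worker is starting up.
broker_connection_retry_on_startup = True

# Step 3: do not retry after an established connection is lost.
broker_connection_retry = False
```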

Expected Behavior

Worker should shut down.

Actual Behavior

Worker retries to connect.

Minimally Reproducible Test Case 2

Details

  1. Turn off the broker container.
  2. Set broker_connection_retry_on_startup to False.
  3. Set broker_connection_retry to True.
  4. Run the Celery worker.

Expected Behavior

Worker should shut down.

Actual Behavior

Worker retries to connect.

Minimally Reproducible Test Case 3

Details

  1. Turn on the broker container.
  2. Set broker_connection_retry_on_startup to True.
  3. Set broker_connection_retry to False.
  4. Run the Celery worker.
  5. Turn off the broker container.

Expected Behavior

Worker should shut down.

Actual Behavior

Worker retries to connect.

Minimally Reproducible Test Case 4

Details

  1. Turn on the broker container.
  2. Set broker_connection_retry_on_startup to False.
  3. Set broker_connection_retry to True.
  4. Run the Celery worker.
  5. Turn off the broker container.

Expected Behavior

Worker retries to connect.

Actual Behavior

Worker shuts down.

Potential Fix

A potential fix for all of these cases is to add a flag to the consumer that tracks this exact startup condition, and to use it wherever broker_connection_retry_on_startup is consulted. On startup the worker would then respect broker_connection_retry_on_startup, and afterward broker_connection_retry. If broker_connection_retry_on_startup is None, broker_connection_retry would also determine the startup behavior.
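
The decision rule described above can be sketched as a small function. This is only an illustration of the proposal, not an actual patch: the function name should_retry and the first_connection_attempt parameter are hypothetical stand-ins for the new consumer flag.

```python
from typing import Optional


def should_retry(
    first_connection_attempt: bool,
    retry_on_startup: Optional[bool],
    retry: bool,
) -> bool:
    """Return True if the worker should retry the broker connection.

    first_connection_attempt is a hypothetical flag that stays True until
    the worker has connected to the broker for the first time.
    """
    if retry_on_startup is None:
        # Unset: broker_connection_retry also governs the startup case.
        return retry
    if first_connection_attempt:
        # Startup: respect broker_connection_retry_on_startup.
        return retry_on_startup
    # After the first successful connection: respect broker_connection_retry.
    return retry


# Test case 2: broker down at startup, on_startup=False, retry=True.
case_2 = should_retry(True, retry_on_startup=False, retry=True)

# Test case 3: connection lost after startup, on_startup=True, retry=False.
case_3 = should_retry(False, retry_on_startup=True, retry=False)

# Test case 4: connection lost after startup, on_startup=False, retry=True.
case_4 = should_retry(False, retry_on_startup=False, retry=True)

print(case_2, case_3, case_4)
```

Under this rule the worker shuts down in cases 2 and 3 and retries in case 4, which matches the expected behavior listed in the report. In case 1 it retries during startup and shuts down once the established connection is lost.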
