Worker stops consuming tasks after RabbitMQ reconnection on Celery 5 #9095
Replies: 9 comments 17 replies
-
Hi @bdoublet91,
Can you provide your package versions and configuration so I can try to reproduce your situation?
-
Hi, I don't know if this explains anything, but for now I'm going to split Celery so that there is one worker per queue (separating debug and perf).
-
Just adding a data point here: we're experiencing the same issue, where the worker stalls after a reconnect. We thought this was a RabbitMQ issue at first, with nodes going down, so we switched to classic mirrored queues (since quorum queues aren't supported by stable Celery yet). It seems we're still hitting the issue intermittently. Our setup is Celery 5.2.7, kombu 5.3.0, RabbitMQ 3.12. The worker is a single instance, prefork, with autoscale=0,5. Generally it goes like this:
One thing I've been meaning to try next time this happens is to check whether the connection is still there from the RabbitMQ side. I'm guessing it's not, or RabbitMQ would complain about trying to deliver messages, but maybe they are getting delivered/prefetched and not acked?
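The prefetched-but-not-acked check suggested above can be done from the broker side. A hypothetical diagnostic sketch, assuming shell access to the RabbitMQ node (queue and connection names will be specific to your deployment):

```shell
# Is the worker's consumer still registered on the broker?
rabbitmqctl list_consumers

# Are messages sitting unacknowledged (i.e. prefetched but never acked)?
rabbitmqctl list_queues name messages_ready messages_unacknowledged

# Is the worker's connection still open from RabbitMQ's point of view?
rabbitmqctl list_connections name state channels
```

If `messages_unacknowledged` is non-zero while the worker is idle, that would support the delivered-but-never-acked hypothesis.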
-
Any updates on this?
-
We are also experiencing this issue every week or two. Are there any workarounds?
-
Actually, it's this stack trace that causes Celery to restart and reconnect: the worker received a task and never executed it. It has been hanging for two days in production. @auvipy, do you have any idea what's causing this, given the stack trace? :) Thanks
-
Any update, please? Could I provide more debug information? I can also confirm that it depends on the code running on Celery: we made some modifications to our function and the consumer started to fail last week. But the main problem here is that Celery reconnects to RabbitMQ and receives a task but never executes it. The RabbitMQ management interface also shows 0 consumers until I restart the Celery container.
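Until the root cause is fixed, a stalled worker like the one described above can at least be detected and bounced automatically. A hypothetical liveness-check sketch using Celery's real `inspect ping` broadcast command; the app name `proj`, worker name `worker1@host`, and the restart command are placeholders for your setup:

```shell
# If the worker no longer answers a broadcast ping within 10 seconds,
# restart its container (restart command is deployment-specific).
celery -A proj inspect ping -d worker1@host --timeout 10 \
  || docker restart celery-worker
```

Run from cron or a sidecar, this papers over the hang without addressing why the consumer disappears.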
-
Same issue here. One task can run very long, longer than the consumer_timeout (and we're using acks_late), so we get a PRECONDITION_FAILED message and a broker disconnection. We also have KEDA autoscaling in our Kubernetes cluster, which spawns more workers based on CPU usage and the count of messages in the consumed queue. I've seen some people suggest using py-amqp instead of amqp in their broker URL; could it be more stable? Thank you
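The acks_late/consumer_timeout interaction described above is a known sharp edge: with late acks, RabbitMQ closes the channel if a delivery stays unacked longer than its consumer timeout. A minimal configuration sketch; the setting names are real (Celery 5 and rabbitmq.conf), but the values are illustrative assumptions, not recommendations:

```python
# celeryconfig.py -- sketch for long-running tasks with late acks.

task_acks_late = True             # ack only after the task finishes
task_reject_on_worker_lost = True # requeue if the worker process dies

# Keep the hard time limit (seconds) below RabbitMQ's consumer_timeout,
# or raise consumer_timeout in rabbitmq.conf instead, e.g.:
#   consumer_timeout = 7200000    # milliseconds (2 hours)
task_time_limit = 3600
```

Whichever side you adjust, the invariant is the same: the longest possible task runtime must stay under the broker's consumer timeout, or the PRECONDITION_FAILED disconnect will recur.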




-
Hi,
I started this discussion related to #7276 on Redis, but the issue also occurs with RabbitMQ.
I am experiencing an issue with Celery 5.3 after migrating from Celery 4.4 in order to use the new broker lost-channel retry setting.
I am using RabbitMQ (3.11) as the message broker. Sometimes the worker stops consuming tasks indefinitely after it restarts, for whatever reason. Once I force a restart of the worker, it starts consuming tasks again.
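For context on the reconnection behaviour involved here: Celery's broker reconnects go through kombu's `ensure_connection`, which retries with a linear backoff controlled by `interval_start`, `interval_step`, and `interval_max`. A minimal sketch of that schedule; the defaults shown (2s start/step, 30s cap, 100 retries) are my assumptions about the library's defaults, not taken from this thread:

```python
def retry_intervals(max_retries=100, interval_start=2.0,
                    interval_step=2.0, interval_max=30.0):
    """Yield the wait (in seconds) before each reconnection attempt.

    Mirrors linear backoff: start at interval_start, grow by
    interval_step per attempt, and cap at interval_max.
    """
    for n in range(max_retries):
        yield min(interval_start + n * interval_step, interval_max)

# First five waits between reconnection attempts:
print(list(retry_intervals(max_retries=5)))  # [2.0, 4.0, 6.0, 8.0, 10.0]
```

Note that this schedule only governs re-establishing the *connection*; the bug discussed here is that the worker reconnects successfully yet never resumes consuming, which the backoff settings cannot fix.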
This only stops happening when I run the Celery 5 worker without heartbeat/gossip/mingle; in that case I can restart RabbitMQ without the worker ceasing to consume tasks after it reconnects.
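The workaround above corresponds to three real flags on the Celery 5 worker CLI. A sketch of the invocation, assuming a hypothetical app module named `proj`:

```shell
# Start the worker with mingle, gossip, and remote heartbeats disabled,
# which (per the report above) avoids the stall after reconnection.
celery -A proj worker \
  --without-heartbeat \
  --without-gossip \
  --without-mingle \
  --loglevel=INFO
```

The trade-off is losing worker-to-worker synchronization (mingle), cluster event exchange (gossip), and AMQP heartbeat failure detection.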
I am running the worker with the following options:
Thanks