Description
Hello
We are seeing consumers get stuck under very unusual conditions. The problem is still hard to reproduce. It always happens with KEY_SHARED subscriptions that have multiple consumers.
What we have figured out is that the trigger is adding consumers. This can happen as part of an upgrade (for example, new upgraded consumers are added and then the old consumers are removed from the topic).
Sometimes, because of a network glitch, a consumer disconnects and reconnects a few seconds later. We have 12 consumers on this particular topic, but it is enough for a single consumer to disconnect and reconnect for a whole partition to become stuck for all consumers, while the other partitions keep working fine.
Again, the common trigger is a change in the set of consumers, most often the addition of one or more new consumers.
It doesn’t always happen. My estimate is that it happens in roughly half of the cases.
Of course, there are no errors on the broker side, except for the standard messages about new consumers connecting.
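Since the broker logs show nothing useful, the per-partition stats may be the quickest way to confirm which partition is stuck. The following is only a sketch: the topic name and the `PULSAR_ADMIN` override variable are assumptions made for illustration, while `topics stats` and `topics partitioned-stats --per-partition` are standard `pulsar-admin` subcommands.

```shell
#!/bin/sh
# Hypothetical override: set PULSAR_ADMIN="echo pulsar-admin" to print the
# commands instead of running them against a real cluster.
PULSAR_ADMIN="${PULSAR_ADMIN:-pulsar-admin}"

# Show stats for every partition of a partitioned topic.
# $1 = topic name, e.g. persistent://public/default/my-topic (assumed name)
all_partition_stats() {
  $PULSAR_ADMIN topics partitioned-stats --per-partition "$1"
}

# Show stats for one partition; partitions are addressed as <topic>-partition-<N>.
# $1 = topic name, $2 = partition index
single_partition_stats() {
  $PULSAR_ADMIN topics stats "$1-partition-$2"
}
```

For example, `single_partition_stats persistent://public/default/my-topic 3` would show the subscription backlog and per-consumer counters for partition 3, which can be compared with a healthy partition.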
The problem is that, to recover from this situation, we need to do one of the following:
pulsar-admin topics unload topicname
Or we need to disconnect all consumers completely and reconnect them (it does not recover if even one consumer remains connected; we need to disconnect all of them at once).
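A minimal sketch of the first workaround, wrapped so it can be dry-run before touching a live cluster. The topic name and the `PULSAR_ADMIN` override variable are assumptions for illustration; `topics unload` is the command mentioned above.

```shell
#!/bin/sh
# Hypothetical override: set PULSAR_ADMIN="echo pulsar-admin" to print the
# command instead of executing it.
PULSAR_ADMIN="${PULSAR_ADMIN:-pulsar-admin}"

# Unload the topic so the broker closes it and reloads it on the next
# client lookup; connected consumers are expected to reconnect on their own.
# $1 = topic name, e.g. persistent://public/default/my-topic (assumed name)
unload_topic() {
  $PULSAR_ADMIN topics unload "$1"
}
```

Usage: `unload_topic persistent://public/default/my-topic`. This only works around the stuck state; it does not address whatever causes it.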
Any ideas?