Describe the bug

Hi,

We are running into an issue with expiring quorum queues rejecting messages. We believe this happens because the queue's bindings are deleted after the queue itself is stopped, which results in the channel rejecting messages.

We have a use case with a small “fanout”, where messages may go into multiple queues. In our case, this causes confusion on the client side and triggers a republish of messages. It happens even when traffic is only 2-3 messages per second, so it is not related to load.

The main reason it happens is because when

rabbitmq-server/deps/rabbit/src/rabbit_fifo_client.erl Lines 137 to 145 in b9610cd

The reason for the new enqueuer is that the

As soon as the queue is deleted from the DB, RabbitMQ starts to confirm the messages again. For some reason, this does not affect classic queues; they confirm all messages in the same scenario.

https://github.com/rabbitmq/rabbitmq-server/blob/v4.2.x/deps/rabbit/src/rabbit_channel.erl#L2815

Reproduction steps

Easiest way to reproduce:
Example code: https://gist.github.com/luos/00f41bea6a7abd97d80e21f0b1abef5b

Expected behavior

No messages are rejected, the same way when the channel receives

Additional context

As an experiment, I tested two solutions to this.

First

Delete the queue from the DB before stopping the

Second

Introduce a new queue state, deleting, which will reject new consumers from the queue.

I included both changes in this code, just to give you an idea of what a fix for this could be:

Let me know what you think, whether you think it works as expected, and if you have any questions.

Thanks
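To make the ordering problem concrete, here is a toy model in Python. It is a minimal sketch and not RabbitMQ's implementation: `ToyBroker`, its fields, and the rule that a message routed to a stopped queue is nacked are all assumptions made for illustration, and the queue names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBroker:
    """Toy model of fanout routing; every name here is hypothetical."""
    bindings: set = field(default_factory=set)  # queues bound to the fanout exchange
    running: set = field(default_factory=set)   # queues whose process is still alive

    def declare(self, name):
        self.bindings.add(name)
        self.running.add(name)

    def publish(self):
        """Return 'ack' or 'nack' for one message sent to the fanout exchange."""
        # Assumption: routing to a queue that is no longer running is a reject.
        if any(q not in self.running for q in self.bindings):
            return "nack"
        # Routed only to live queues, or to no queue at all: confirmed.
        return "ack"


def delete_stop_first(broker, name):
    """Stop the queue, then remove its binding; publish in between and after."""
    broker.running.discard(name)
    during = broker.publish()
    broker.bindings.discard(name)
    return during, broker.publish()


def delete_unbind_first(broker, name):
    """Remove the binding, then stop the queue; publish in between and after."""
    broker.bindings.discard(name)
    during = broker.publish()
    broker.running.discard(name)
    return during, broker.publish()


for delete in (delete_stop_first, delete_unbind_first):
    broker = ToyBroker()
    broker.declare("user-1")
    broker.declare("user-2")
    print(delete.__name__, delete(broker, "user-1"))
# delete_stop_first ('nack', 'ack')
# delete_unbind_first ('ack', 'ack')
```

In this model the only difference between the two orderings is the publish that lands inside the deletion window, which matches the observation that confirms resume once the queue is gone from the DB.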
Replies: 4 comments 14 replies
-
That sounds like reasonable behavior. A queue in the process of being deleted may or may not acknowledge its remaining messages, accept consumers, and so on. I am quite sure some team members would object to deleting queue records before stopping QQ members. A new queue state sounds reasonable, but there's nothing to fix fundamentally if you ask me. A queue that's in the process of being deleted won't offer much. Publisher confirms will do their job and the client can re-publish the messages as needed.
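For readers following along, the client-side handling referred to here (re-publishing after a negative confirm) could look roughly like the sketch below. It is a minimal Python sketch: `publish_with_retry` and the `publish` callable are hypothetical stand-ins for a real client's confirm-mode publish, with `True` meaning ack and `False` meaning nack.

```python
import time
from typing import Callable


def publish_with_retry(publish: Callable[[bytes], bool], body: bytes,
                       attempts: int = 3, delay: float = 0.0) -> int:
    """Publish `body`, re-publishing on nack; return the number of attempts used.

    `publish` is a hypothetical callable that sends one message in confirm
    mode and returns True on ack, False on nack.
    """
    for attempt in range(1, attempts + 1):
        if publish(body):
            return attempt
        time.sleep(delay)  # leave the broker some time to finish the deletion
    raise RuntimeError(f"message still nacked after {attempts} attempts")
```

Note that with a fanout, each re-publish is routed to every bound queue again, so consumers on the surviving queues may receive the same notification more than once.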
-
This is something we've discussed internally a few times in the past, and I think such a two-phase deletion process would be a good change. However, I agree it could potentially be quite substantial and carry some risk.
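A two-phase deletion of this kind can be pictured as a small state machine. The following is a minimal Python sketch under stated assumptions, not RabbitMQ's data model: `QueueState`, `ToyQueue`, and in particular the `is_routable` rule are hypothetical.

```python
from enum import Enum


class QueueState(Enum):
    LIVE = "live"
    DELETING = "deleting"  # hypothetical intermediate state
    DELETED = "deleted"


class ToyQueue:
    """Toy queue record with a two-phase delete."""

    def __init__(self, name):
        self.name = name
        self.state = QueueState.LIVE
        self.consumers = []

    def begin_delete(self):
        # Phase 1: mark the record so that new work is turned away.
        self.state = QueueState.DELETING

    def finish_delete(self):
        # Phase 2: the members are stopped and the record goes away.
        if self.state is not QueueState.DELETING:
            raise RuntimeError("finish_delete called before begin_delete")
        self.consumers.clear()
        self.state = QueueState.DELETED

    def add_consumer(self, tag):
        if self.state is not QueueState.LIVE:
            return False  # rejected: the queue is going away
        self.consumers.append(tag)
        return True

    def is_routable(self):
        # Assumption: routing skips queues that are not live, so a publish
        # during deletion is treated like a publish with no matching queue.
        return self.state is QueueState.LIVE
```

The point of the intermediate state is that everything between `begin_delete` and `finish_delete` sees a consistent answer, instead of depending on which part of the teardown has already run.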
-
@lukebakken Regarding your question in a subthread: No, either messages are confirmed OR rejected, there is no double confirm or anything like that in my tests.
I realised the code placed into my first comment does not display the issue clearly, so here is an updated version that explicitly prints nacks: https://gist.github.com/luos/00f41bea6a7abd97d80e21f0b1abef5b

Just to give you some idea of the use case: sometimes there are notification messages which go to multiple users (queues), and if these notifications happen at exactly the right time (when a queue is being deleted), they are rejected by RabbitMQ. Transient exclusive queues could be a solution, but logging in a user is a very heavy state transfer, so that's why QQ is used.
-
One lower-impact option would perhaps be to delete all bindings for the queue before deleting the Ra cluster. Of course, the problem would remain for the default exchange, but for the described fan-out scenario it may well help.
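The default-exchange caveat can be seen in a toy routing function. This is a minimal Python sketch, not RabbitMQ's router: `route`, the exchange name `notifications`, and the queue names are invented for illustration. It relies only on the fact that the default exchange (the empty name) routes by queue name and has no explicit binding that could be deleted first.

```python
def route(exchange, routing_key, bindings, queue_records):
    """Return the queue names a message would be routed to (toy model)."""
    if exchange == "":
        # Default exchange: routes by queue name, no explicit binding to delete.
        return [routing_key] if routing_key in queue_records else []
    return sorted(q for (ex, q) in bindings
                  if ex == exchange and q in queue_records)


queue_records = {"user-1", "user-2"}
bindings = {("notifications", "user-1"), ("notifications", "user-2")}

# Suggested first step: drop the bindings of the queue that is being deleted,
# while its record (and its Ra cluster) still exists.
bindings = {(ex, q) for (ex, q) in bindings if q != "user-1"}

print(route("notifications", "", bindings, queue_records))  # ['user-2']
print(route("", "user-1", bindings, queue_records))         # ['user-1']
```

After the bindings are gone, the fanout publish no longer reaches the queue being deleted, while a publish through the default exchange still does until the record itself is removed.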