
Check exclusive queue owner before deleting a queue (backport #15276) (backport #15286)#15287

Merged
michaelklishin merged 5 commits into v4.1.x from mergify/bp/v4.1.x/pr-15286
Jan 17, 2026

Conversation


@mergify mergify bot commented Jan 16, 2026

[Why]
For a long time, there has been a race condition when deleting exclusive queues: if a connection was re-established and a queue with the same name was declared, we could delete the new queue.

For example, with many MQTT consumers, if we performed a rolling restart of the cluster and the clients reconnected without any delay, we sometimes ended up with the expected number of connections but fewer queues than expected, even though there should be one queue per consumer.

[How]
Check that the exclusive_owner has the value we expect when requesting deletion. If the value differs, this is effectively a different queue (same name, but a different connection), so we should not delete it.
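The guard above is a compare-and-delete. Here is a minimal Python sketch of the idea, using an in-memory store; `QueueStore` and its methods are illustrative names, not RabbitMQ's actual API:

```python
# Toy compare-and-delete guard: only delete the queue if it is still
# owned by the connection that requested the deletion. QueueStore is
# an illustrative stand-in, not RabbitMQ's actual API.
class QueueStore:
    def __init__(self):
        self.queues = {}  # queue name -> exclusive_owner (connection id)

    def declare(self, name, owner):
        # A re-declare with the same name overwrites the owner, which is
        # exactly what happens when a client reconnects.
        self.queues[name] = owner

    def delete_exclusive(self, name, expected_owner):
        owner = self.queues.get(name)
        if owner is None:
            return "absent"
        if owner != expected_owner:
            # Same name, different connection: effectively a different
            # queue, so leave it alone.
            return "owner_changed"
        del self.queues[name]
        return "deleted"
```

For example, if `conn-1` declares `q1`, then a reconnected client re-declares it as `conn-2`, the cleanup for `conn-1` returns `"owner_changed"` and the new queue survives.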

[Testing]
Here's an example of how to test before/after:

  1. With MQTT QoS0 queue type:
make start-cluster RABBITMQ_ENABLED_PLUGINS="rabbitmq_management,rabbitmq_mqtt"
omq mqtt --uri mqtt://localhost:1883,mqtt://localhost:1884,mqtt://localhost:1885 -x 100 -y 100 -r 1
make restart-cluster RABBITMQ_ENABLED_PLUGINS="rabbitmq_management,rabbitmq_mqtt"
rabbitmqctl -n rabbit-1 list_queues | rg -c mqtt
  2. With classic queues:
make start-cluster RABBITMQ_ENABLED_PLUGINS="rabbitmq_management,rabbitmq_mqtt"
omq mqtt --uri mqtt://localhost:1883,mqtt://localhost:1884,mqtt://localhost:1885 -x 100 -y 100 -r 1 --mqtt-publisher-qos 1 --mqtt-consumer-qos 1
make restart-cluster RABBITMQ_ENABLED_PLUGINS="rabbitmq_management,rabbitmq_mqtt"
rabbitmqctl -n rabbit-1 list_queues | rg -c mqtt

In both cases, you will almost certainly see that once the nodes are restarted, the number of published messages no longer matches the number of consumed messages, and list_queues will almost certainly return fewer than 100 queues before this PR. With this PR, the queue count and the message flow should match expectations.


This is an automatic backport of pull request #15276 done by Mergify.
This is an automatic backport of pull request #15286 done by Mergify.

mkuratczyk and others added 3 commits January 16, 2026 20:51
[Why]
For a long time, there has been a race condition when deleting
exclusive queues - if a connection was re-established and a queue
with the same name was declared, we could delete the new queue.

For example, with many MQTT consumers, if we performed a rolling restart
of the cluster and the clients reconnected without any delay, after the
restart, we sometimes had the expected number of connections but a lower
number of queues, even though there should be a queue for each consumer.

[How]
Check that the exclusive_owner has the value we expect when requesting
deletion. If the value is different, this means this is effectively a
different queue (same name, but a different connection), so we should
not delete it.

(cherry picked from commit 31ba23a)
(cherry picked from commit 8588cee)

# Conflicts:
#	deps/rabbit/src/rabbit_db_queue.erl
(cherry picked from commit 8418f61)
(cherry picked from commit 5f5b358)
(cherry picked from commit 49ab811)
(cherry picked from commit 403c31d)
@mergify mergify bot added the conflicts label Jan 16, 2026

mergify bot commented Jan 16, 2026

Cherry-pick of 8588cee has failed:

On branch mergify/bp/v4.1.x/pr-15286
Your branch is up to date with 'origin/v4.1.x'.

You are currently cherry-picking commit 8588ceef8.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   deps/rabbit/src/amqqueue.erl
	modified:   deps/rabbit/src/rabbit_amqqueue.erl
	modified:   deps/rabbit/test/rabbit_db_queue_SUITE.erl

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   deps/rabbit/src/rabbit_db_queue.erl

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@michaelklishin michaelklishin added this to the 4.1.8 milestone Jan 16, 2026
v4.1.x uses Khepri 0.16.0 where khepri_tx_adv:delete returns the old
single_result format ({ok, #{data := _}}), not the new many_results
format ({ok, #{Path := #{data := _}}}) introduced in Khepri 0.17.0.

The original main/v4.2.x version uses khepri_tx:does_api_comply_with/1
to handle both formats, but this function does not exist in Khepri
0.16.0.

Additionally, khepri_path:combine_with_conditions/2 is not in the
Horus allowed function list for transaction functions in Khepri 0.16.0.
Move the path computation outside the transaction function to avoid
the Horus extraction error.

Simplify the pattern matching to use only the Khepri 0.16.0 format
while preserving the fix: conditional deletion using
khepri_path:combine_with_conditions to check exclusive_owner before
deleting.
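The structure of the backport can be illustrated with a toy model (this is not Khepri's API; `combine_with_conditions`, `transaction`, and the return shapes below are simplified stand-ins): the conditional path is computed outside the transaction function, so the closure handed to the transaction runner stays minimal.

```python
# Toy model of the backport approach (not Khepri's API): precompute the
# condition (queue name plus expected owner) outside the transaction
# function, keeping the transaction closure small.
def combine_with_conditions(name, expected_owner):
    # Stand-in for building a path with conditions attached.
    return (name, expected_owner)

def transaction(store, tx_fun):
    # Toy transaction runner: just invokes the function.
    return tx_fun(store)

def delete_queue(store, cond_path):
    name, expected_owner = cond_path

    def tx(s):
        rec = s.get(name)
        if rec is not None and rec["exclusive_owner"] == expected_owner:
            del s[name]
            return {"ok": {"data": rec}}  # old single_result-style shape
        return {"ok": {}}                 # conditions did not match

    return transaction(store, tx)
```

Here the delete only happens when the stored owner matches, mirroring the conditional deletion described above, while the path computation stays outside the transaction body.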
@michaelklishin michaelklishin force-pushed the mergify/bp/v4.1.x/pr-15286 branch from 668cb7d to 8e4c549 on January 16, 2026, 23:39
@michaelklishin michaelklishin merged commit 7ad356e into v4.1.x Jan 17, 2026
544 of 547 checks passed
@michaelklishin michaelklishin deleted the mergify/bp/v4.1.x/pr-15286 branch January 17, 2026 01:17