rabbitmq_*_federation: Stop links during plugin stop #14054
Merged
[Why]
Links are started by the plugins but put under the `rabbit` supervision
tree. The federation plugins' own supervision trees are unfortunately empty.
Links are stopped by a boot step executed by `rabbit`, as a consequence
of unregistering the plugins' parameters.
However, links can also be terminated when their channel, and implicitly
its connection, stops. This happens when the `amqp_client` application
stops.
We end up with a race here:
* Because the federation plugins' supervision trees are empty and the
application stop functions merely stop the pg scope (which doesn't
terminate the group members), nothing waits for the links to stop.
Therefore, `rabbit` can stop `amqp_client`, which is a dependency of
the federation plugins, and the links' underlying channels and
connections are stopped with it.
* `rabbit` unregisters the federation parameters, terminating the links.
The exchange link's `terminate/2` function needs the channel to delete
the remote queue, but the channel and the underlying connection might
already be gone.
This simply logs a `badmatch` exception:
[error] <0.884.0> Federation link could not create a disposable (one-off) channel due to an error error: {badmatch,
[error] <0.884.0> {error,
[error] <0.884.0> {noproc,
[error] <0.884.0> {gen_server,
[error] <0.884.0> call,
[error] <0.884.0> [<0.911.0>,
[error] <0.884.0> {command,
[error] <0.884.0> {open_channel,
[error] <0.884.0> none,
[error] <0.884.0> {amqp_selective_consumer,
[error] <0.884.0> []}}},
[error] <0.884.0> 130000]}}}}
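For illustration only, the shape of this error can be reproduced without RabbitMQ. The sketch below is not the plugin's code: the module `noproc_demo` and the function `open/1` are hypothetical stand-ins for opening a channel on a connection process. It assumes the caller converts the exit raised by `gen_server:call/3` into an `{error, Reason}` tuple and then pattern-matches strictly on `{ok, Ch}`.

```erlang
-module(noproc_demo).
-export([open/1, strict/1, tolerant/1, dead_pid/0]).

%% Hypothetical stand-in for opening a channel: calls the connection
%% process and turns an exit from gen_server:call/3 into an error tuple.
open(Conn) ->
    try gen_server:call(Conn, open_channel, 5000) of
        Reply -> {ok, Reply}
    catch
        exit:Reason -> {error, Reason}
    end.

%% Strict match: raises {badmatch, {error, {noproc, ...}}} when the
%% connection process no longer exists.
strict(Conn) ->
    {ok, Ch} = open(Conn),
    Ch.

%% Tolerant variant: reports a missing connection instead of crashing.
tolerant(Conn) ->
    case open(Conn) of
        {ok, Ch}             -> {ok, Ch};
        {error, {noproc, _}} -> {error, connection_gone};
        {error, Other}       -> {error, Other}
    end.

%% Returns the pid of a process that is guaranteed to have exited.
dead_pid() ->
    {Pid, Ref} = spawn_monitor(fun() -> ok end),
    receive {'DOWN', Ref, process, Pid, _} -> Pid end.
```

Calling `noproc_demo:strict(noproc_demo:dead_pid())` fails with a `badmatch` wrapping `noproc`, which mirrors the structure of the log entry above.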
[How]
The solution is to make sure links are stopped as part of the stop of
the plugins.
`rabbit_federation_pg:stop_scope/1` is expanded to stop all members of
all groups in this scope, before terminating the pg scope itself. The
new code waits for the stopped processes to exit.
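A minimal sketch of this idea follows. It is not the code from this pull request: the module name `pg_stop_sketch`, the 5-second timeout, the escalation to `kill`, and the use of `gen_server:stop/1` to terminate the scope are all assumptions made for illustration. Only the `pg`, `erlang:monitor/2` and `exit/2` calls are standard OTP.

```erlang
-module(pg_stop_sketch).
-export([stop_scope/1]).

%% Stop every local member of every group in Scope, wait for each of
%% them to exit, then stop the pg scope process itself.
stop_scope(Scope) ->
    case whereis(Scope) of
        undefined ->
            ok;
        ScopePid ->
            %% A pid may belong to several groups, so deduplicate.
            Members = lists:usort(
                        [M || G <- pg:which_groups(Scope),
                              M <- pg:get_local_members(Scope, G)]),
            %% Monitor before signalling so that no exit is missed.
            Refs = [{erlang:monitor(process, M), M} || M <- Members],
            lists:foreach(fun({_Ref, M}) -> exit(M, shutdown) end, Refs),
            wait(Refs),
            %% Assumption: the scope is a plain gen_server we may stop.
            gen_server:stop(ScopePid)
    end.

wait([]) ->
    ok;
wait([{Ref, Pid} | Rest]) ->
    receive
        {'DOWN', Ref, process, Pid, _Reason} ->
            wait(Rest)
    after 5000 ->
        %% Hypothetical escalation if a member does not exit in time.
        exit(Pid, kill),
        receive {'DOWN', Ref, process, Pid, _} -> wait(Rest) end
    end.
```

Monitoring each member before sending the exit signal is what lets the caller block until the links are really gone, rather than only until the signal has been sent.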
We have to handle the `EXIT` signal in the link processes and change
their restart type in their parent supervisor from `permanent` to
`transient`. This ensures they are restarted only if they crash. This also
avoids an error log message for each stopped link.
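The combination can be sketched as follows, assuming the links are `gen_server` processes. The module `link_sketch`, its empty state and the 5-second shutdown timeout are hypothetical; the actual link modules differ. With the `transient` restart type, OTP supervisors restart a child only when it exits with a reason other than `normal`, `shutdown` or `{shutdown, Term}`.

```erlang
-module(link_sketch).
-behaviour(gen_server).

-export([start_link/0, child_spec/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

%% Child spec for the parent supervisor: transient instead of permanent,
%% so a clean shutdown does not trigger a restart.
child_spec() ->
    #{id       => ?MODULE,
      start    => {?MODULE, start_link, []},
      restart  => transient,
      shutdown => 5000,
      type     => worker}.

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% Trap exits so that an exit signal becomes a message (or, when it
    %% comes from the parent, a call to terminate/2) instead of killing
    %% the process outright.
    process_flag(trap_exit, true),
    {ok, #{}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Exit signals from processes other than the parent arrive here.
%% Stopping with the same reason lets terminate/2 run its cleanup.
handle_info({'EXIT', _From, Reason}, State) ->
    {stop, Reason, State};
handle_info(_Other, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    %% A real link would release its remote resources here.
    ok.
```

Stopping with reason `shutdown` is treated as a normal termination by both `gen_server` and a supervisor using `transient`, so no crash report is produced and no restart happens.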
bdf095c to
033ab45
Compare
dcorbacho
approved these changes
Jun 11, 2025
ansd
added a commit
that referenced
this pull request
Jan 15, 2026
## What?
Federation links started in the federation plugins are put under the `rabbit` app supervision tree (unfortunately). This commit ensures that the entire federation supervision hierarchies (including all federation links) are stopped **before** stopping app `rabbit` when stopping RabbitMQ.
## Why?
Previously, we've seen cases where hundreds of federation links are stopped during the shutdown procedure in app `rabbit`, leading to federation link restarts happening in parallel to vhosts being stopped. In one case, the shutdown of app `rabbit` even got stuck (although there is no evidence that federation was the problem). Either way, the cleaner approach is to gracefully stop all federation links, i.e. the entire supervision hierarchy under `rabbit_exchange_federation_sup` and `rabbit_queue_federation_sup`, when stopping the federation apps, i.e. **before** proceeding to stop app `rabbit`.
## How?
The boot step cleanup steps for the federation plugins are skipped when stopping RabbitMQ. Hence, this commit ensures that the supervisors are stopped in the `stop/1` application callback. This commit does something similar to #14054 but uses a simpler approach.
michaelklishin
pushed a commit
that referenced
this pull request
Jan 16, 2026
(cherry picked from commit 8bffa58)
mergify bot
pushed a commit
that referenced
this pull request
Jan 16, 2026
(cherry picked from commit 8bffa58)
(cherry picked from commit 512553e)
# Conflicts:
# deps/rabbitmq_federation_common/src/rabbit_federation_pg.erl
michaelklishin
pushed a commit
that referenced
this pull request
Feb 24, 2026
(cherry picked from commit 8bffa58)