Prevent federation links from restarting during node shutdown #15258
Merged
michaelklishin merged 4 commits into main on Jan 14, 2026
Prevent federation links from restarting during node shutdown, or plugin shutdown, for that matter. With this guardrail in place, nodes with hundreds or thousands of federation links avoid potentially significant shutdown delays caused by links being restarted while the node as a whole is preparing to shut down. Per discussion with @dcorbacho @ansd.
ansd
reviewed
Jan 14, 2026
deps/rabbitmq_exchange_federation/src/rabbit_federation_exchange_link.erl
ansd
requested changes
Jan 14, 2026
Member
I tried this PR out using a single node as follows:
make run-broker
./sbin/rabbitmq-plugins enable rabbitmq_exchange_federation
./sbin/rabbitmqctl set_parameter federation-upstream origin '{"uri":"amqp://localhost:5672"}'
./sbin/rabbitmqctl set_policy exchange-federation "^amq.direct" '{"federation-upstream-set":"all"}' --priority 10 --apply-to exchanges
./sbin/rabbitmqctl stop
Stopping RabbitMQ as above errors:
2026-01-14 09:14:38.488679+01:00 [info] <0.860.0> RabbitMQ is asked to stop...
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> Stopping RabbitMQ applications and their dependencies in the following order:
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> rabbitmq_exchange_federation
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> rabbitmq_management
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> rabbitmq_management_agent
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> rabbitmq_web_dispatch
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> rabbitmq_federation_common
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> rabbit
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> khepri
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> ra
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> cowboy
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> oauth2_client
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> sysmon_handler
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> rabbitmq_prelaunch
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> osiris
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> amqp_client
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> rabbit_common
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> jose
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> os_mon
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0> mnesia
2026-01-14 09:14:38.514449+01:00 [info] <0.860.0>
2026-01-14 09:14:38.514536+01:00 [info] <0.860.0> Stopping application 'rabbitmq_exchange_federation'
2026-01-14 09:14:43.515924+01:00 [error] <0.672.0> application_master: shutdown_error
2026-01-14 09:14:43.515924+01:00 [error] <0.672.0> rabbit_exchange_federation_app: {prep_stop,[[]]}
2026-01-14 09:14:43.515924+01:00 [error] <0.672.0> error_info: {timeout,
2026-01-14 09:14:43.515924+01:00 [error] <0.672.0> {gen_server,call,
2026-01-14 09:14:43.515924+01:00 [error] <0.672.0> [application_controller,
2026-01-14 09:14:43.515924+01:00 [error] <0.672.0> {set_env,rabbitmq_federation_common,shutting_down,
2026-01-14 09:14:43.515924+01:00 [error] <0.672.0> true,[]}]}}
2026-01-14 09:14:43.516567+01:00 [debug] <0.672.0> Stopping pg scope rabbitmq_exchange_federation_pg_scope
2026-01-14 09:14:43.519106+01:00 [alert] <0.672.0> Member <0.697.0> stopped: normal
2026-01-14 09:14:43.519291+01:00 [info] <0.724.0> closing AMQP connection (127.0.0.1:61862 -> 127.0.0.1:5672 - Federation link (upstream: origin, policy: exchange-federation), vhost: '/', user: 'guest', duration: '1M, 2s')
2026-01-14 09:14:43.520981+01:00 [notice] <0.45.0> Application rabbitmq_exchange_federation exited with reason: stopped
2026-01-14 09:14:43.521082+01:00 [info] <0.860.0> Stopping application 'rabbitmq_management'
2026-01-14 09:14:43.523838+01:00 [warning] <0.474.0> HTTP listener registry could not find context rabbitmq_management_tls
2026-01-14 09:14:43.525296+01:00 [notice] <0.45.0> Application rabbitmq_management exited with reason: stopped
2026-01-14 09:14:43.525471+01:00 [info] <0.860.0> Stopping application 'rabbitmq_management_agent'
2026-01-14 09:14:43.527990+01:00 [notice] <0.45.0> Application rabbitmq_management_agent exited with reason: stopped
2026-01-14 09:14:43.528065+01:00 [info] <0.860.0> Stopping application 'rabbitmq_web_dispatch'
2026-01-14 09:14:43.529537+01:00 [notice] <0.45.0> Application rabbitmq_web_dispatch exited with reason: stopped
2026-01-14 09:14:43.529602+01:00 [info] <0.860.0> Stopping application 'rabbitmq_federation_common'
2026-01-14 09:14:43.530772+01:00 [notice] <0.45.0> Application rabbitmq_federation_common exited with reason: stopped
2026-01-14 09:14:43.530803+01:00 [info] <0.860.0> Stopping application 'rabbit'
2026-01-14 09:14:43.530847+01:00 [debug] <0.217.0> Change boot state to `stopping`
This avoids a classic deadlock in Erlang: when the application_controller (AC) invokes a callback, such as prep_stop/1, the function invoked cannot use any OTP functions that would ultimately require an AC response. application:set_env/3 is one such function, so with this commit we switch to a persistent term.
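For readers following along, here is a minimal sketch of the idea, not the code from this PR. The module name, key, and function names are hypothetical. It relies only on persistent_term:put/2, persistent_term:get/2, and persistent_term:erase/1, none of which go through the application_controller, so they should be safe to call from prep_stop/1.

```erlang
%% Hypothetical module; names are illustrative and not taken from the PR.
-module(federation_shutdown_flag_sketch).
-export([mark_shutting_down/0, is_shutting_down/0, reset/0]).

%% Node-local key under which the flag is stored.
-define(KEY, {?MODULE, shutting_down}).

%% Intended to be called from prep_stop/1. persistent_term:put/2 writes to
%% a node-local store and does not call into application_controller.
-spec mark_shutting_down() -> ok.
mark_shutting_down() ->
    persistent_term:put(?KEY, true).

%% A link would consult this before attempting a restart.
%% Returns false when the flag was never set.
-spec is_shutting_down() -> boolean().
is_shutting_down() ->
    persistent_term:get(?KEY, false).

%% Clear the flag, for example when the plugin is started again.
-spec reset() -> ok.
reset() ->
    _ = persistent_term:erase(?KEY),
    ok.
```

The 5 second gap in the log above (09:14:38 to 09:14:43) matches the default gen_server:call timeout, which is what one would expect when the callback waits on an AC that is itself busy stopping the application.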
Collaborator
Author
I have re-created the branch to make GitHub pick up on commit a8347b2.
ansd
approved these changes
Jan 14, 2026
mergify bot
pushed a commit
that referenced
this pull request
Jan 14, 2026
(cherry picked from commit 80c8d7b)
michaelklishin
added a commit
that referenced
this pull request
Jan 14, 2026
This change guards against significant shutdown delays in nodes managing hundreds or thousands of federation links that would otherwise restart while the node prepares to shut down. Uses a persistent term in rabbit_federation_app_state to avoid a classic Erlang deadlock scenario where an application_controller invokes callbacks like prep_stop/1. Also makes forget_binding/2 more defensive by handling the case where a binding key is not found in the map. Backport of #15258 to v4.1.x.
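As a rough illustration of what "more defensive" can mean for a map lookup, the sketch below tolerates a missing key instead of crashing. It assumes a hypothetical shape for the state (a plain map from binding key to value) and a hypothetical return value; the real forget_binding/2 in the federation plugin may differ on both.

```erlang
%% Hypothetical sketch; the state shape and return value are assumptions.
-module(forget_binding_sketch).
-export([forget_binding/2]).

%% Remove Key from Bindings. A missing key is reported as not_found and
%% the map is returned unchanged, rather than raising badkey or badmatch.
-spec forget_binding(term(), map()) -> {{ok, term()} | not_found, map()}.
forget_binding(Key, Bindings) when is_map(Bindings) ->
    case maps:take(Key, Bindings) of
        {Value, Rest} -> {{ok, Value}, Rest};
        error -> {not_found, Bindings}
    end.
```

During shutdown, events for bindings that were already cleaned up can plausibly arrive late, which is one situation where tolerating a missing key helps.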
michaelklishin
added a commit
that referenced
this pull request
Jan 14, 2026
Prevent federation links from restarting during node shutdown (backport #15258)
michaelklishin
added a commit
that referenced
this pull request
Feb 24, 2026
The guardrail's state is node-local, as is the shutdown state, so it will not prevent links migrating between nodes (under mirrored_supervisor) from starting.
Note that this PR cannot be backported exactly to v4.1.x and earlier branches. The federation plugin split in main first shipped in 4.2.0.