Fix federation supervisor crash during upgrade to 4.2.x on multi-node cluster #15252
In a multi-node cluster, after a rolling upgrade from a version below 4.2 to 4.2, the supervisor `rabbit_federation_exchange_link_sup_sup` crashed because `rabbit_federation_link_sup:start_link` had arity 1 until 4.1.x. PR mirrored supervisor preserves the child definitions, which still include a call with arity 1 (without the link module).

To keep old child specs valid, add back a `start_link/1` function in `rabbit_federation_link_sup`.

Fixes rabbitmq#15239

Without the patch the test case rolling_upgrade:child_id_format fails with:
```
=== Location: [{erpc,call,1366},
{exchange_SUITE,'-child_id_format/1-fun-5-',675},
{lists,foreach_1,2310},
{exchange_SUITE,child_id_format,670},
{test_server,ts_tc,1794},
{test_server,run_test_case_eval1,1303},
{test_server,run_test_case_eval,1235}]
=== === Reason: {exception,
{noproc,
{gen_server,call,
[rabbit_federation_exchange_link_sup_sup,
which_children,infinity]}}}
```
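
The compatibility idea described above can be sketched in isolation. This is a minimal illustration, not the code from this PR: the module name `legacy_arity_demo`, the atoms `default_link_mod`, `some_link_mod` and `my_exchange`, and the tagged-tuple return value are all hypothetical, and no real process is started. It only shows that a child spec stored with a one-element argument list keeps resolving once a `start_link/1` wrapper exists next to `start_link/2`.

```erlang
-module(legacy_arity_demo).
-export([start_link/1, start_link/2, run/0]).

%% Newer entry point (hypothetical shape): the link module is passed explicitly.
%% A real supervisor would start a process here; the sketch returns a tagged
%% tuple so the example stays self-contained.
start_link(LinkMod, Resource) ->
    {ok, {LinkMod, Resource}}.

%% Compatibility wrapper for child specs recorded before the arity changed.
%% `default_link_mod` is a placeholder for whatever module the old code implied.
start_link(Resource) ->
    start_link(default_link_mod, Resource).

%% Applies an old-style and a new-style start MFA, as a supervisor would when
%% (re)starting children from stored child specs.
run() ->
    OldSpecMFA = {?MODULE, start_link, [my_exchange]},
    NewSpecMFA = {?MODULE, start_link, [some_link_mod, my_exchange]},
    [apply(M, F, A) || {M, F, A} <- [OldSpecMFA, NewSpecMFA]].
```

Calling `legacy_arity_demo:run()` applies both start MFAs. Without the one-argument wrapper, the first `apply/3` call would fail with `undef`, which is the general kind of failure a stale child spec can cause after an arity change.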
532ae82 to 4eca000
The fix commit makes sense on main as well (as the legacy type spec can be preserved forever during rolling upgrades to future RabbitMQ versions). But the part in the test case that enables should I manually create two PRs? One for v4.2.x with the test enabling

@gomoripeti sure, that works for me.

@gomoripeti note that I have updated a comment in 04c39b3. Please submit a new PR for Thank you.

Ah, your comment change is enlightening.

In short: the current PR can be automatically backported as is to 4.2.

In long: at the start of the test case on the new nodes, this is how plugins look like

On the old nodes, if secondary is 4.1.x, plugins look like

(plugins enable/disable commands don't work on the old nodes because of the missing plugin

That is why I had to use

OTOH on main, where secondary is 4.2.x, plugins look like this on old nodes in the beginning of the test case

So it is not necessary to enable

Noting that it is not possible to enable

@Mergifyio backport v4.2.x

✅ Backports have been created

Fix federation supervisor crash during upgrade to 4.2.x on multi-node cluster (backport #15252)
Proposed Changes
In a multi-node cluster, after a rolling upgrade from a version below 4.2 to 4.2, the supervisor `rabbit_federation_exchange_link_sup_sup` crashed because `rabbit_federation_link_sup:start_link` had arity 1 until 4.1.x. PR mirrored supervisor preserves the child definitions, which still include a call with arity 1 (without the link module).

To keep old child specs valid, add back a `start_link/1` function in `rabbit_federation_link_sup`.

Fixes #15239

Run the test with
Without the patch the test case rolling_upgrade:child_id_format fails with: