Forbid peer to join cluster with URI that is already used#7375
Merged
agourlay
approved these changes
Oct 13, 2025
Member
agourlay
left a comment
Thanks for the test 👍
Not completely happy about the failure through panic but I guess it is acceptable in this rare case.
ffuugoo
approved these changes
Oct 13, 2025
timvisee
added a commit
that referenced
this pull request
Nov 14, 2025
* Disallow peer to join with URI that is already used
* Add test for rejecting peer join with duplicate URI
* Improve peer rejection logic
* Try to rejoin twice, we expect a consistent result
Merged
Multiple peers joining with the same cluster URL causes a whole range of problems. We should therefore forbid it as a safety measure.
This changes our consensus logic to reject adding a peer with a URI that is already used by another peer. The rejected peer fails to join and crashes.
This is easy to trigger in practice: wipe a peer's storage directory, and on restart the peer will try to join a second time with the same URL.
If this ever happens in our cloud, the peer will enter a crash loop. This is desired, because it brings the issue to the attention of our support team. They can resolve it either by recovering the broken node, or by first removing the broken node from consensus, after which the peer can rejoin just fine.
Rejoining on an existing URI with the same peer ID is allowed.
A test is included to assert the behavior.
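The rule described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `PeerRegistry`, `JoinError`, `try_add_peer`, and the example URIs are hypothetical names chosen for illustration, and the real consensus state is considerably more involved. The sketch only shows the decision itself: a join is rejected when the URI is already registered under a different peer ID, and accepted when the same peer ID rejoins on its own URI.

```rust
use std::collections::HashMap;

/// Hypothetical peer identifier type for this sketch.
type PeerId = u64;

/// Hypothetical error returned when a join request is rejected.
#[derive(Debug, PartialEq)]
enum JoinError {
    UriAlreadyUsed { uri: String, existing_peer: PeerId },
}

/// Hypothetical in-memory view of the known peers (peer ID -> URI).
#[derive(Default)]
struct PeerRegistry {
    peers: HashMap<PeerId, String>,
}

impl PeerRegistry {
    /// Register a peer, rejecting it if another peer already uses the URI.
    /// Rejoining with the same peer ID and URI is allowed.
    fn try_add_peer(&mut self, peer_id: PeerId, uri: &str) -> Result<(), JoinError> {
        // Look for a *different* peer that is registered with this URI.
        let conflict = self
            .peers
            .iter()
            .find(|(id, known)| **id != peer_id && known.as_str() == uri)
            .map(|(id, _)| *id);

        if let Some(existing_peer) = conflict {
            return Err(JoinError::UriAlreadyUsed {
                uri: uri.to_string(),
                existing_peer,
            });
        }

        self.peers.insert(peer_id, uri.to_string());
        Ok(())
    }
}

fn main() {
    let mut registry = PeerRegistry::default();

    // First join succeeds.
    assert!(registry.try_add_peer(1, "http://node-1:6335").is_ok());

    // Same peer ID rejoining on the same URI is allowed.
    assert!(registry.try_add_peer(1, "http://node-1:6335").is_ok());

    // A different peer ID claiming the same URI is rejected.
    let rejected = registry.try_add_peer(2, "http://node-1:6335");
    assert!(rejected.is_err());
    println!("join rejected: {:?}", rejected);
}
```

In this sketch the rejection is returned as an error value. In the PR as described, the rejected peer crashes instead, which is what makes the problem visible to operators.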
When a join is rejected, the leader reports the following:
The peer that tries (and fails) to rejoin crashes and reports the following. The message is not very nice, but I couldn't find a good way to report a proper one here, and I don't think it's worth a lot of effort.
All Submissions:
dev branch. Did you create your branch from dev?
New Feature Submissions:
cargo +nightly fmt --all command prior to submission?
cargo clippy --all --all-features command?