Do not reply to pings from another cluster#21894
Merged
jasontedor merged 1 commit intoelastic:masterfrom Nov 30, 2016
Merged
Conversation
c89e6c3 to
8ecd370
Compare
abeyad
approved these changes
Nov 30, 2016
Today when sending responses to discovery pings, we unconditionally reply. Instead, this commit modifies the response handler to not reply when the cluster names do not match. This addresses a race condition identified after reducing the timeout in UnicastZenPingTests#testSimplePings. In particular, we send pings in the following way: - if not connected to the node, connect to the node and after successful handshake, send a ping - if connected to the node, send a ping When the ping timeout is set low, a subsequent batch of pings can race against a connect/disconnect cycle from a prior batch of pings. In particular, consider the following scenario: - node A from cluster X - node B from cluster Y - pings are initiated from node A with node B in the hosts list - node A will try to connect and handshake with B - the connection will succeed, and the handshake will eventually fail due to mismatched cluster names - on a short timeout, a second batch of pings will fire, and on this batch node A will see that it is still connected to node B; thus, it will immediately fire a ping to node B and node B will dutifully respond
8ecd370 to
9738584
Compare
jasontedor
added a commit
that referenced
this pull request
Nov 30, 2016
Today when sending responses to discovery pings, we unconditionally reply. Instead, this commit modifies the response handler to not reply when the cluster names do not match. This addresses a race condition identified after reducing the timeout in UnicastZenPingTests#testSimplePings. In particular, we send pings in the following way: - if not connected to the node, connect to the node and after successful handshake, send a ping - if connected to the node, send a ping When the ping timeout is set low, a subsequent batch of pings can race against a connect/disconnect cycle from a prior batch of pings. In particular, consider the following scenario: - node A from cluster X - node B from cluster Y - pings are initiated from node A with node B in the hosts list - node A will try to connect and handshake with B - the connection will succeed, and the handshake will eventually fail due to mismatched cluster names - on a short timeout, a second batch of pings will fire, and on this batch node A will see that it is still connected to node B; thus, it will immediately fire a ping to node B and node B will dutifully respond Relates #21894
s1monw
reviewed
Nov 30, 2016
| if (node.equals(localNode)) { | ||
| return localNode; | ||
| } | ||
| logger.trace("connecting with node [{}] to perform handshake", node); |
s1monw
reviewed
Nov 30, 2016
| channel.sendResponse(handlePingRequest(request)); | ||
| if (request.pingResponse.clusterName().equals(clusterName)) { | ||
| channel.sendResponse(handlePingRequest(request)); | ||
| } |
Contributor
There was a problem hiding this comment.
should we throw an exception instead? The channel won't be closed otherwise?
Contributor
|
@jasontedor I added a bunch of comments I don't think we should not reply |
jasontedor
added a commit
that referenced
this pull request
Nov 30, 2016
Today when sending responses to discovery pings, we unconditionally reply. Instead, this commit modifies the response handler to not reply when the cluster names do not match. This addresses a race condition identified after reducing the timeout in UnicastZenPingTests#testSimplePings. In particular, we send pings in the following way: - if not connected to the node, connect to the node and after successful handshake, send a ping - if connected to the node, send a ping When the ping timeout is set low, a subsequent batch of pings can race against a connect/disconnect cycle from a prior batch of pings. In particular, consider the following scenario: - node A from cluster X - node B from cluster Y - pings are initiated from node A with node B in the hosts list - node A will try to connect and handshake with B - the connection will succeed, and the handshake will eventually fail due to mismatched cluster names - on a short timeout, a second batch of pings will fire, and on this batch node A will see that it is still connected to node B; thus, it will immediately fire a ping to node B and node B will dutifully respond Relates #21894
Member
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Today when sending responses to discovery pings, we unconditionally
reply. Instead, this commit modifies the response handler to not reply
when the cluster names do not match.
This addresses a race condition identified after reducing the timeout in
UnicastZenPingTests#testSimplePings. In particular, we send pings in the
following way:
successful handshake, send a ping
When the ping timeout is set low, a subsequent batch of pings can race
against a connect/disconnect cycle from a prior batch of pings. In
particular, consider the following scenario:
batch node A will see that it is still connected to node B; thus, it
will immediately fire a ping to node B and node B will dutifully
respond
Relates #21874