Only connect to new nodes on new cluster state#31547
DaveCTurner wants to merge 6 commits into elastic:master
Conversation
Today, when a new cluster state is committed we attempt to connect to all of its nodes as part of the application process. This is the right thing to do with new nodes, and is a no-op on any already-connected nodes, but is questionable on known nodes from which we are currently disconnected: there is a risk that we are partitioned from these nodes so that any attempt to connect to them will hang until it times out. This can dramatically slow down the application of new cluster states which hinders the recovery of the cluster during certain kinds of partition. If nodes are disconnected from the master then it is likely that they are to be removed as part of a subsequent cluster state update, so there's no need to try and reconnect to them like this. Moreover there is no need to attempt to reconnect to disconnected nodes as part of the cluster state application process, because we periodically try and reconnect to any disconnected nodes, and handle their disconnectedness gracefully in the meantime. This commit alters this behaviour to avoid reconnecting to known nodes during cluster state application. Resolves elastic#29025.
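The core idea — connect only to nodes that are new in the incoming cluster state, and leave already-known-but-disconnected nodes to the periodic reconnector — amounts to taking the set difference between the new and previous node sets. A minimal standalone sketch of that idea (not the actual `NodeConnectionsService` code; node names and the `nodesToConnect` helper are illustrative):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ConnectOnlyNewNodesSketch {
    // Hypothetical sketch: given the node sets of the previous and new cluster
    // states, return only the nodes we have not seen before. Known nodes that
    // happen to be disconnected are deliberately skipped here, so a partitioned
    // node cannot stall cluster state application with a hanging connect.
    static Set<String> nodesToConnect(Set<String> previousNodes, Set<String> newNodes) {
        Set<String> added = new HashSet<>(newNodes);
        added.removeAll(previousNodes);
        return added;
    }

    public static void main(String[] args) {
        Set<String> prev = new HashSet<>(Arrays.asList("node-1", "node-2"));
        Set<String> next = new HashSet<>(Arrays.asList("node-1", "node-2", "node-3"));
        // Only node-3 is new, so only node-3 would get a blocking connect attempt.
        System.out.println(nodesToConnect(prev, next)); // [node-3]
    }
}
```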
Pinging @elastic/es-core-infra
Pinging @elastic/es-distributed
I think this is a worthwhile improvement on the current situation, and it's a simple change, but I think it's worth revisiting in future since there's still room for improvement IMO: if the ConnectionChecker happens to be running when a partition is detected then it can block the cluster state update thread from getting hold of the nodeLocks that it needs for an extended period of time. Reducing the connection timeout (#29022) would help a bit here.
```diff
     try (Releasable ignored = nodeLocks.acquire(node)) {
-        nodes.putIfAbsent(node, 0);
-        connected = transportService.nodeConnected(node);
+        connected = (nodes.putIfAbsent(node, 0) != null);
```
Can we avoid going on the management thread if we are truly connected? This will be relevant on the master, where the node is connected to during join validation. Also, I think the name `connected` for the variable is a bit misleading now. Maybe flip it and call it `shouldConnect`?
It would also be good to have a comment explaining the rationale for this check.
PS - as discussed, it would be good to explore using this component on non-elected-master nodes, which would make my condition tweak obsolete.
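The check under review relies on `ConcurrentMap.putIfAbsent` returning `null` only when the key was newly inserted, so its return value doubles as a "first time we've seen this node" signal. A minimal standalone illustration of that semantics (the node name is a placeholder, not from the PR):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class PutIfAbsentDemo {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> nodes = new ConcurrentHashMap<>();

        // First call: the key is absent, so it is inserted and null is returned.
        boolean firstTime = nodes.putIfAbsent("node-1", 0) == null;

        // Second call: the key is present, so the existing value (0) is
        // returned unchanged, which is non-null.
        boolean secondTime = nodes.putIfAbsent("node-1", 0) == null;

        System.out.println(firstTime + " " + secondTime); // true false
    }
}
```

This is why flipping the variable to something like `shouldConnect = nodes.putIfAbsent(node, 0) == null` reads more naturally: it is true exactly for previously unknown nodes.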
ywelsch left a comment
I'm a bit on the fence about this change. Yes, it will help in the situation described, but I wonder what adversarial effects it will have after a network partition (we work around these adversarial effects in our tests by explicitly reconnecting the nodes). I wonder if we should force reconnectToKnownNodes whenever the node applies a cluster state from a fresh master. Yes, this will slow down publishing a little, but only for the first cluster state that's received from the new master, not subsequent ones.
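The suggestion above boils down to a predicate on the master identity: reconnect to known nodes only when the incoming state names a different master than the previous one. A hypothetical sketch of that condition (method and parameter names are illustrative, not from the PR):

```java
import java.util.Objects;

class FreshMasterCheck {
    // Hypothetical sketch of the suggestion: force a reconnect to all known
    // nodes only for the first state from a new master; later states from the
    // same master skip the extra reconnection work.
    static boolean shouldReconnectKnownNodes(String previousMasterId, String newMasterId) {
        // Objects.equals handles a null previous master (e.g. no master elected yet).
        return !Objects.equals(previousMasterId, newMasterId);
    }

    public static void main(String[] args) {
        System.out.println(shouldReconnectKnownNodes("master-1", "master-2")); // true
        System.out.println(shouldReconnectKnownNodes("master-2", "master-2")); // false
    }
}
```

This keeps the cost confined to master changes, which matches the trade-off described: a one-time slowdown per election rather than on every publication.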
```diff
                 client(node).prepareGet("test", "type", id).setPreference("_local").get().isExists());
             }
-        } catch (AssertionError | NoShardAvailableActionException e) {
+        } catch (AssertionError | NoShardAvailableActionException | NodeNotConnectedException e) {
```
Is this change still required?
I've decided to close this. It mostly helps, but with this approach there's still a chance of blocking a cluster state update on a connection to a blackhole because of the node locks, and I think it'd be better for future debugging/analysis if this wasn't based on luck.
Today, when applying new cluster state we attempt to connect to all of its nodes as a blocking part of the application process. This is the right thing to do with new nodes, and is a no-op on any already-connected nodes, but is questionable on known nodes from which we are currently disconnected: there is a risk that we are partitioned from these nodes so that any attempt to connect to them will hang until it times out. This can dramatically slow down the application of new cluster states which hinders the recovery of the cluster during certain kinds of partition. If nodes are disconnected from the master then it is likely that they are to be removed as part of a subsequent cluster state update, so there's no need to try and reconnect to them like this. Moreover there is no need to attempt to reconnect to disconnected nodes as part of the cluster state application process, because we periodically try and reconnect to any disconnected nodes, and handle their disconnectedness reasonably gracefully in the meantime. This commit alters this behaviour to avoid reconnecting to known nodes during cluster state application. Resolves elastic#29025. Supersedes elastic#31547.