Connect to new nodes concurrently#22984
Conversation
s1monw
left a comment
There was a problem hiding this comment.
I left one question... lgtm otherwise
|
|
||
| @Override | ||
| public void disconnectFromNodesExcept(Iterable<DiscoveryNode> nodesToKeep) { | ||
| public void disconnectFromNodesExcept(DiscoveryNodes nodesToKeep) { |
There was a problem hiding this comment.
why do we change these? Isn't Iterable more generic and would still accept DiscoveryNodes? also disconnectFromNodesExcept sounds like it would only be executed on a subset of and actual DiscoveryNodes instance. I think we should stick with what we had?
There was a problem hiding this comment.
why do we change these? Isn't Iterable more generic and would still accept DiscoveryNodes?
I can change it back, but I did it to be consistent with connectToNodes which needs the total size of the collection. Since this is only used with DiscoNodes, I felt consistency is clearer. As said - I'll roll it back if you prefer.
disconnectFromNodesExcept sounds like it would only be executed on a subset of and actual DiscoveryNodes instance
Naming are hard. What it does is disconnecting from all nodes that are not part of the supplied disconodes parameter. We pass the disco nodes of the current cluster state directly, without any modification.
|
thx @s1monw |
When a node receives a new cluster state from the master, it opens up connections to any new node in the cluster state. That has always been done serially on the cluster state thread but it has been a long standing TODO to do this concurrently, which is done by this PR. This is spin off of #22828, where an extra handshake is done whenever connecting to a node, which may slow down connecting. Also, the handshake is done in a blocking fashion which triggers assertions w.r.t blocking requests on the cluster state thread. Instead of adding an exception, I opted to implement concurrent connections which both side steps the assertion and compensates for the extra handshake.
When a node receives a new cluster state from the master, it opens up connections to any new node in the cluster state. That has always been done serially on the cluster state thread but it has been a long standing TODO to do this concurrently, which is done by this PR.
This is spin off of #22828, where an extra handshake is done whenever connecting to a node, which may slow down connecting. Also, the handshake is done in a blocking fashion which triggers assertions w.r.t blocking requests on the cluster state thread. Instead of adding an exception, I opted to implement concurrent connections which both side steps the assertion and compensates for the extra handshake.
The change is well tested under current tests. Sadly, I could find an easy way to simulate rejections as we have locked down the thread pool settings (good!) and the management still has an unbound queue.