
Make shard store fetch less dependent on the current cluster state, both on master and non data nodes#19044

Merged
bleskes merged 11 commits into elastic:master from bleskes:async_fetch_use_nodes
Jun 27, 2016

Conversation

@bleskes
Contributor

@bleskes bleskes commented Jun 23, 2016

#18938 changed the timing with which we send requests to nodes to fetch their shard stores. Instead of doing this after the cluster state resulting from the node's join was published, #18938 made the requests be sent concurrently with the publishing process. This revealed a couple of points where shard store fetching depends on the current state of the cluster, both on the master and on the data nodes. The problems discovered were already present without #18938, but required failures or extreme situations to trigger them. This PR removes as many of these dependencies as possible, making shard store fetching simpler and paving the way to re-introduce #18938, which was reverted.

These are the notable changes:

  1. Allow TransportNodesAction (from which shard store fetching is derived) callers to supply concrete DiscoveryNode instances, so it won't need the cluster state to resolve them. This was a problem because the cluster state containing the needed nodes was not yet available through ClusterService. Note that in the long term we can expect the REST layer to resolve node ids to concrete nodes, making this mode the only one needed.
  2. The data node relied on the cluster state to have the relevant index metadata so it could locate data when custom paths are used. We now fall back to reading the metadata from disk if needed.
  3. The data node relied on its own IndexService state to indicate whether the data it has corresponds to an existing allocation. This is of course something it cannot know until it has received (and processed) the new cluster state from the master. This flag in the response is now removed. This is not a problem, because we used the flag to protect against double-assigning a shard to the same node, and we are already protected from that by the allocation deciders.
  4. I removed the redundant filterNodeIds method in TransportNodesAction - if callers want to filter, they can override resolveRequest.
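The resolution pattern in point 1 can be sketched in isolation. This is a hypothetical, heavily simplified illustration, not the actual Elasticsearch classes: the names NodesRequestSketch, DiscoveryNode, and resolve are assumptions made for the example. It shows the two construction modes (concrete nodes vs. node ids) and how the ids array is nulled once resolved, as the javadoc snippet below describes.

```java
import java.util.Arrays;

// Simplified sketch of "caller may supply concrete nodes" resolution.
// All class and method names here are illustrative, not Elasticsearch APIs.
public class NodesRequestSketch {

    static class DiscoveryNode {
        final String id;
        DiscoveryNode(String id) { this.id = id; }
    }

    static class NodesRequest {
        String[] nodesIds;             // ids still to resolve; null once resolved
        DiscoveryNode[] concreteNodes; // resolved nodes; null until resolution

        // Callers that already hold concrete nodes skip id resolution entirely,
        // and therefore never need the (possibly not-yet-published) cluster state.
        NodesRequest(DiscoveryNode... concreteNodes) {
            this.concreteNodes = concreteNodes;
        }

        NodesRequest(String... nodesIds) {
            this.nodesIds = nodesIds;
        }
    }

    // Only consults the local cluster state's node list when the caller
    // did not supply concrete nodes.
    static DiscoveryNode[] resolve(NodesRequest request, DiscoveryNode[] clusterStateNodes) {
        if (request.concreteNodes != null) {
            return request.concreteNodes;
        }
        DiscoveryNode[] resolved = Arrays.stream(clusterStateNodes)
            .filter(n -> Arrays.asList(request.nodesIds).contains(n.id))
            .toArray(DiscoveryNode[]::new);
        request.concreteNodes = resolved;
        request.nodesIds = null; // nulled once resolved
        return resolved;
    }

    public static void main(String[] args) {
        DiscoveryNode n1 = new DiscoveryNode("n1");
        DiscoveryNode n2 = new DiscoveryNode("n2");

        // Concrete nodes supplied: no cluster-state lookup needed.
        NodesRequest direct = new NodesRequest(n1);
        System.out.println(resolve(direct, new DiscoveryNode[0]).length); // 1

        // Ids supplied: resolved against the cluster state's nodes.
        NodesRequest byId = new NodesRequest("n2");
        System.out.println(resolve(byId, new DiscoveryNode[]{n1, n2})[0].id); // n2
    }
}
```

The key design point the PR relies on is the first branch of resolve: when concrete nodes were handed in by the caller, the cluster state is never touched, so fetching can safely run concurrently with cluster state publication.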

public static String[] ALL_NODES = Strings.EMPTY_ARRAY;

/**
* the list of nodesIds that will be used to resolve this request, once resolved this array will by nulled and {@link #concreteNodes}
Contributor


s/will by/will be/

@ywelsch
Contributor

ywelsch commented Jun 24, 2016

Left minor comments. Also there is a word missing in the title ;-)

@bleskes bleskes changed the title Make shard store fetch less dependent on the current cluster state, both on master and non nodes Make shard store fetch less dependent on the current cluster state, both on master and non data nodes Jun 24, 2016
@bleskes
Contributor Author

bleskes commented Jun 25, 2016

@ywelsch thx. I pushed another commit

@ywelsch
Contributor

ywelsch commented Jun 27, 2016

Left one more comment (to get rid of a line of code). Feel free to push once addressed. LGTM

@bleskes bleskes merged commit cb0824e into elastic:master Jun 27, 2016
@bleskes bleskes deleted the async_fetch_use_nodes branch June 27, 2016 13:05
@bleskes
Contributor Author

bleskes commented Jun 27, 2016

thanks @ywelsch ! I added the assertion and pushed.


Labels

>bug, :Distributed/Distributed (a catch-all label for anything in the Distributed area), resiliency, v5.0.0-alpha5


3 participants