Ensure cluster is stable in ShrinkIndexIT.testShrinkThenSplitWithFailedNode #44860
Merged
tlrx merged 2 commits into elastic:master on Jul 26, 2019
Collaborator
Pinging @elastic/es-distributed
original-brownbear approved these changes on Jul 25, 2019
original-brownbear (Contributor) left a comment:
LGTM, this makes perfect sense. When I investigated this, I think the failure situation was the only time the shutdown of the node didn't fully go through before the next CS (cluster state) update.
```java
final int nodeCount = cluster().size();
internalCluster().stopRandomNode(InternalTestCluster.nameFilter(shrinkNode));
ensureStableCluster(nodeCount -1);
```
Contributor
NIT: formatting of the -1 is missing a space: - 1 :)
DaveCTurner approved these changes on Jul 25, 2019
DaveCTurner (Member) left a comment:
Yep, seems reasonable to me too. Good catch.
Member (Author)
@elasticmachine run elasticsearch-ci/2
Member (Author)
Thanks @original-brownbear and @DaveCTurner
tlrx added a commit that referenced this pull request on Jul 26, 2019
…edNode (#44860)

The test ShrinkIndexIT.testShrinkThenSplitWithFailedNode sometimes fails because the resize operation is not acknowledged (see #44736). This resize operation creates a new index "splitagain", which results in a cluster state update (TransportResizeAction uses MetaDataCreateIndexService.createIndex() to create the resized index). This cluster state update is expected to be acknowledged by all nodes (see IndexCreationTask.onAllNodesAcked()), but this is not always the case: the data node that was stopped in the test just before the resize operation might still be considered a "faulty" node by the FollowersChecker and not yet removed from the cluster nodes. The cluster state is then acked by all nodes but one, which results in an unacknowledged resize operation.

This commit adds an ensureStableCluster() check after stopping the node in the test. The goal is to ensure that the data node has been correctly removed from the cluster and that all nodes are fully connected to each other before moving forward with the resize operation.

Closes #44736
jkakavas pushed a commit that referenced this pull request on Jul 31, 2019
The test ShrinkIndexIT.testShrinkThenSplitWithFailedNode sometimes fails because the resize operation is not acknowledged (see #44736). This resize operation creates a new index "splitagain", which results in a cluster state update (TransportResizeAction uses MetaDataCreateIndexService.createIndex() to create the resized index).

This cluster state update is expected to be acknowledged by all nodes (see IndexCreationTask.onAllNodesAcked()), but this is not always the case: the data node that was stopped in the test just before the resize operation might still be considered a "faulty" node by the FollowersChecker and not yet removed from the cluster nodes. The cluster state is then acked by all nodes but one, which results in an unacknowledged resize operation.

This pull request adds an ensureStableCluster() check after stopping the node in the test. The goal is to ensure that the data node has been correctly removed from the cluster and that all nodes are fully connected to each other before moving forward with the resize operation.
Closes #44736
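The race described above can be sketched as a small, self-contained Java model. This is a hypothetical illustration, not Elasticsearch code: the class AckRaceSketch and its methods stopNode, detectFaultyNodes, waitForNodeCount and publishAndWaitForAllAcks are invented stand-ins for stopping the node, the FollowersChecker removing it, ensureStableCluster(), and the all-nodes acknowledgement the resize relies on. For simplicity it assumes that an update counts as acknowledged only when every node listed in the cluster state acks it, and it runs fault detection synchronously rather than on a timer.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical toy model of the ack race from #44736; not Elasticsearch code.
final class AckRaceSketch {

    // Nodes the elected master still lists in the cluster state.
    private final Set<String> clusterStateNodes = new LinkedHashSet<>();
    // Nodes whose processes are actually running and able to ack.
    private final Set<String> runningNodes = new LinkedHashSet<>();

    AckRaceSketch(String... nodes) {
        for (String node : nodes) {
            clusterStateNodes.add(node);
            runningNodes.add(node);
        }
    }

    // Stopping a node kills its process; the master has not noticed yet.
    void stopNode(String node) {
        runningNodes.remove(node);
    }

    // Stand-in for the FollowersChecker: drop listed nodes that no longer respond.
    void detectFaultyNodes() {
        clusterStateNodes.retainAll(runningNodes);
    }

    // Stand-in for ensureStableCluster(expected): poll until the cluster state lists
    // exactly `expected` nodes. In this single-threaded model each poll gives the
    // fault detector a chance to run; in a real cluster it runs on its own schedule.
    boolean waitForNodeCount(int expected, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            if (clusterStateNodes.size() == expected) {
                return true;
            }
            detectFaultyNodes();
        }
        return clusterStateNodes.size() == expected;
    }

    // Stand-in for publishing the "create index" cluster state update: it counts as
    // acknowledged only if every node listed in the cluster state acks it.
    boolean publishAndWaitForAllAcks() {
        int acks = 0;
        for (String node : clusterStateNodes) {
            if (runningNodes.contains(node)) {
                acks++; // a stopped node never acks
            }
        }
        return acks == clusterStateNodes.size();
    }

    public static void main(String[] args) {
        // Flaky version: resize right after stopping the node.
        AckRaceSketch racy = new AckRaceSketch("master", "data-1", "data-2");
        racy.stopNode("data-2");
        System.out.println("without wait: acknowledged=" + racy.publishAndWaitForAllAcks());

        // Fixed version: wait for a stable 2-node cluster first.
        AckRaceSketch fixed = new AckRaceSketch("master", "data-1", "data-2");
        fixed.stopNode("data-2");
        fixed.waitForNodeCount(2, 10);
        System.out.println("with wait:    acknowledged=" + fixed.publishAndWaitForAllAcks());
    }
}
```

Running main prints acknowledged=false for the first scenario and acknowledged=true for the second, mirroring the intermittent failure and the effect of waiting for a stable cluster before the resize.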