Fix ShrinkIndexIT#44214
Fix ShrinkIndexIT#44214original-brownbear merged 3 commits intoelastic:masterfrom original-brownbear:44164
Conversation
* Move this test suit to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node which randomly turns out to be the only shared master node so the cluster reset fails on account of the fact that no shared master node survived. * Closes #44164
|
Pinging @elastic/es-distributed |
DaveCTurner
left a comment
There was a problem hiding this comment.
Good catch.
I note that the CI issue was complaining about multiple tests failing - is this because the reset() is associated with the test after the one that broke the cluster? If so, can we catch this in the right test in future by asserting in stopRandomDataNode that it isn't stopping the last-remaining shared master-eligible node (if autoManageMasterNodes == true at least)?
Also can we fix this while keeping a suite-wide cluster by choosing a node other than the unique shared master-eligible node in the offending test?
Yes.
Sweet idea, done in 4a59fc0, that does indeed fail a lot nicer :)
Sure, also done in 4a59fc0. Just fired up a new data only node for this. As far as I can see that doesn't change the test's behavior and makes things super safe without complicating things? |
DaveCTurner
left a comment
There was a problem hiding this comment.
LGTM thanks for the second round @original-brownbear.
|
thanks @DaveCTurner ! |
* Fix ShrinkIndexIT * Move this test suit to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node which randomly turns out to be the only shared master node so the cluster reset fails on account of the fact that no shared master node survived. * Closes #44164
* The assertion added in elastic#44214 is tripped by tests running dedicated test clusters per test needlessly.This breaks existing tests like the one in elastic#44245. * Closes elastic#44245
testShrinkThenSplitWithFailedNodestops a random node which randomly turns out to be the only shared master node so the cluster reset fails on account of the fact that no shared master node survived.@DaveCTurner can you take a look since you added that assertion in reset in 034c765? :)