Sometimes relax lag detector in CoordinatorTests#140434
Sometimes relax lag detector in CoordinatorTests#140434DaveCTurner merged 3 commits intoelastic:mainfrom
CoordinatorTests#140434Conversation
Related to ES-10778: since we're running with a relaxed lag detector in some environments now we really should be covering this in the test suite to verify that the lag detector isn't required for liveness. Relates elastic#108690
|
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
| assertThat("leader should be last to ack", ackCollector.getSuccessfulAckIndex(leader), equalTo(1)); | ||
|
|
||
| follower0.setClusterStateApplyResponse(ClusterStateApplyResponse.SUCCEED); | ||
| leader.submitValue(randomLong()); // follower0 acks next value allowing cluster to stabilise |
There was a problem hiding this comment.
Without the lag detector and absent any further cluster state updates the lagging node remains lagging, but that's what we want. The point here is that if another cluster state update happens then it stops lagging again.
| cluster.runFor(DEFAULT_STABILISATION_TIME, "allowing new leader election"); | ||
| cluster.getAnyLeader().submitValue(randomLong()); // old leader acks this value allowing cluster to stabilise | ||
| cluster.stabilise(DEFAULT_CLUSTER_STATE_UPDATE_DELAY); |
There was a problem hiding this comment.
Likewise here: stabilising the cluster will not do anything to the lagging node, we need to wait to allow for a master election, then do another cluster state update, to bring the node back in sync.
There was a problem hiding this comment.
So if there is a relaxed lag detector, we need to wait for the cluster to stabilise again. If there is not a relaxed lag detector, would we be in this state already?
There was a problem hiding this comment.
DEFAULT_CLUSTER_STATE_UPDATE_DELAY is short, just a few round-trips, so this doesn't add much more time.
In practice DEFAULT_STABILISATION_TIME is long enough for the lag detector to do its thing when it's not in relaxed mode.
| .put( | ||
| LagDetector.CLUSTER_FOLLOWER_LAG_TIMEOUT_SETTING.getKey(), | ||
| TimeValue.timeValueMillis(defaultMillis(LagDetector.CLUSTER_FOLLOWER_LAG_TIMEOUT_SETTING)) | ||
| ) |
There was a problem hiding this comment.
This test is the only one that actually needs the lag detector, because we are asserting that it logs what it logs.
| defaultMillis(PUBLISH_TIMEOUT_SETTING) + 2 * DEFAULT_DELAY_VARIABILITY + defaultMillis( | ||
| LagDetector.CLUSTER_FOLLOWER_LAG_TIMEOUT_SETTING | ||
| ) + DEFAULT_DELAY_VARIABILITY + 2 * DEFAULT_DELAY_VARIABILITY, | ||
| DEFAULT_CLUSTER_STATE_UPDATE_DELAY + defaultMillis(PUBLISH_TIMEOUT_SETTING) + 2 * DEFAULT_DELAY_VARIABILITY |
There was a problem hiding this comment.
Timeout here apparently didn't include enough time to do the publication.
This test is kinda bogus with an atomic register because it doesn't actually time out as claimed. Nonetheless, we do want to know that the nacks are genuinely delivered eventually under these conditions. This commit adjusts the test to handle the relaxed a lag detector introduced in elastic#140434.
This test is kinda bogus with an atomic register because it doesn't actually time out as claimed. Nonetheless, we do want to know that the nacks are genuinely delivered eventually under these conditions. This commit adjusts the test to handle the relaxed a lag detector introduced in elastic#140434. Closes elastic#140509
Related to ES-10778: since we're running with a relaxed lag detector in some environments now we really should be covering this in the test suite to verify that the lag detector isn't required for liveness. Relates elastic#108690
This test is kinda bogus with an atomic register because it doesn't actually time out as claimed. Nonetheless, we do want to know that the nacks are genuinely delivered eventually under these conditions. This commit adjusts the test to handle the relaxed lag detector introduced in #140434. Closes #140509
This test is kinda bogus with an atomic register because it doesn't actually time out as claimed. Nonetheless, we do want to know that the nacks are genuinely delivered eventually under these conditions. This commit adjusts the test to handle the relaxed lag detector introduced in elastic#140434. Closes elastic#140509
Today you can set an arbitrarily long timeout on the `LagDetector` but there's no facility to just completely turn it off. The usual values of `0` and `-1` that one might expect to do so are forbidden. There's no good reason for this any more, see e.g. elastic#140434, so with this commit we adjust the setting to accept nonpositive timeouts and interpret them to mean that no lag detection should take place. Relates ES-10778
Today you can set an arbitrarily long timeout on the `LagDetector` but there's no facility to just completely turn it off. The usual values of `0` and `-1` that one might expect to do so are forbidden. There's no good reason for this any more, see e.g. elastic#140434, so with this commit we adjust the setting to accept nonpositive timeouts and interpret them to mean that no lag detection should take place. Relates ES-10778
Today you can set an arbitrarily long timeout on the `LagDetector` but there's no facility to just completely turn it off. The usual values of `0` and `-1` that one might expect to do so are forbidden. There's no good reason for this any more, see e.g. #140434, so with this commit we adjust the setting to accept nonpositive timeouts and interpret them to mean that no lag detection should take place. Relates ES-10778
Today you can set an arbitrarily long timeout on the `LagDetector` but there's no facility to just completely turn it off. The usual values of `0` and `-1` that one might expect to do so are forbidden. There's no good reason for this any more, see e.g. elastic#140434, so with this commit we adjust the setting to accept nonpositive timeouts and interpret them to mean that no lag detection should take place. Relates ES-10778
Related to ES-10778: since we're running with a relaxed lag detector in
some environments now we really should be covering this in the test
suite to verify that the lag detector isn't required for liveness.
Relates #108690