Sometimes relax lag detector in `CoordinatorTests` by DaveCTurner · Pull Request #140434 · elastic/elasticsearch

DaveCTurner · 2026-01-09T09:45:01Z

Related to ES-10778: since we're running with a relaxed lag detector in
some environments now we really should be covering this in the test
suite to verify that the lag detector isn't required for liveness.

Relates #108690

elasticsearchmachine · 2026-01-09T09:45:28Z

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

DaveCTurner · 2026-01-09T09:46:47Z

server/src/test/java/org/elasticsearch/cluster/coordination/CoordinatorTests.java

            assertThat("leader should be last to ack", ackCollector.getSuccessfulAckIndex(leader), equalTo(1));

            follower0.setClusterStateApplyResponse(ClusterStateApplyResponse.SUCCEED);
+            leader.submitValue(randomLong()); // follower0 acks next value allowing cluster to stabilise


Without the lag detector and absent any further cluster state updates the lagging node remains lagging, but that's what we want. The point here is that if another cluster state update happens then it stops lagging again.
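The behaviour described in this comment can be illustrated with a toy model. This is only a sketch: `LaggingFollowerSketch`, `Leader`, `Follower`, and `isLagging` are hypothetical names invented here, and `submitValue` merely stands in for the test helper of the same name. None of it is the real `Coordinator` or `LagDetector` code. It assumes there is no lag detector at all, so the only way a follower catches up is by applying a later publication.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: not the Elasticsearch Coordinator or LagDetector. */
public class LaggingFollowerSketch {

    /** A follower that either applies each published version or fails to apply it. */
    static final class Follower {
        long appliedVersion = 0;
        boolean applySucceeds = true;

        void onPublish(long version) {
            if (applySucceeds) {
                appliedVersion = version; // catches up to whatever is published
            }
        }
    }

    /** A leader that publishes every new version to all of its followers. */
    static final class Leader {
        long version = 0;
        final List<Follower> followers = new ArrayList<>();

        /** Hypothetical stand-in for submitting a value: bumps the version and publishes it. */
        void submitValue() {
            version++;
            for (Follower f : followers) {
                f.onPublish(version);
            }
        }

        boolean isLagging(Follower f) {
            return f.appliedVersion < version;
        }
    }

    public static void main(String[] args) {
        Leader leader = new Leader();
        Follower follower0 = new Follower();
        leader.followers.add(follower0);

        // The follower fails to apply one update and falls behind.
        follower0.applySucceeds = false;
        leader.submitValue();
        System.out.println("after failed apply, lagging = " + leader.isLagging(follower0));

        // Repairing the follower alone changes nothing: no new state is published.
        follower0.applySucceeds = true;
        System.out.println("after repair only, lagging = " + leader.isLagging(follower0));

        // The next update brings the follower back in sync.
        leader.submitValue();
        System.out.println("after next update, lagging = " + leader.isLagging(follower0));
    }
}
```

In this model the follower stays behind after it is repaired and only stops lagging once the leader publishes again, which is the role the added `leader.submitValue(randomLong())` line plays in the test.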

DaveCTurner · 2026-01-09T09:48:53Z

server/src/test/java/org/elasticsearch/cluster/coordination/CoordinatorTests.java

+            cluster.runFor(DEFAULT_STABILISATION_TIME, "allowing new leader election");
+            cluster.getAnyLeader().submitValue(randomLong()); // old leader acks this value allowing cluster to stabilise
+            cluster.stabilise(DEFAULT_CLUSTER_STATE_UPDATE_DELAY);


Likewise here: stabilising the cluster will not do anything to the lagging node. We need to wait long enough for a master election and then do another cluster state update to bring the node back in sync.

joshua-adams-1 · 2026-01-09

So if there is a relaxed lag detector, we need to wait for the cluster to stabilise again. If there is not a relaxed lag detector, would we be in this state already?

DaveCTurner · 2026-01-09

`DEFAULT_CLUSTER_STATE_UPDATE_DELAY` is short, just a few round-trips, so this doesn't add much more time.

In practice `DEFAULT_STABILISATION_TIME` is long enough for the lag detector to do its thing when it's not in relaxed mode.

DaveCTurner · 2026-01-09T09:49:23Z

server/src/test/java/org/elasticsearch/cluster/coordination/CoordinatorTests.java

+                    .put(
+                        LagDetector.CLUSTER_FOLLOWER_LAG_TIMEOUT_SETTING.getKey(),
+                        TimeValue.timeValueMillis(defaultMillis(LagDetector.CLUSTER_FOLLOWER_LAG_TIMEOUT_SETTING))
+                    )


This test is the only one that actually needs the lag detector, because it asserts on what the lag detector logs.

DaveCTurner · 2026-01-09T09:53:06Z

server/src/test/java/org/elasticsearch/cluster/coordination/CoordinatorTests.java

-                    defaultMillis(PUBLISH_TIMEOUT_SETTING) + 2 * DEFAULT_DELAY_VARIABILITY + defaultMillis(
-                        LagDetector.CLUSTER_FOLLOWER_LAG_TIMEOUT_SETTING
-                    ) + DEFAULT_DELAY_VARIABILITY + 2 * DEFAULT_DELAY_VARIABILITY,
+                    DEFAULT_CLUSTER_STATE_UPDATE_DELAY + defaultMillis(PUBLISH_TIMEOUT_SETTING) + 2 * DEFAULT_DELAY_VARIABILITY


The timeout here apparently didn't allow enough time for the publication itself.
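To make the change to the timeout expression concrete, here is a small arithmetic sketch of the removed and added expressions from the diff above. The class name and all four millisecond values are placeholders assumed for illustration. They are not read from `CoordinatorTests` or from the real setting defaults.

```java
/** Arithmetic sketch only: every constant below is an assumed placeholder value. */
public class TimeoutBudgetSketch {

    // Placeholder values in milliseconds, chosen only for illustration.
    static final long DELAY_VARIABILITY = 100;
    static final long CLUSTER_STATE_UPDATE_DELAY = 700;
    static final long PUBLISH_TIMEOUT = 30_000;
    static final long FOLLOWER_LAG_TIMEOUT = 90_000;

    /** Mirrors the removed expression: publish timeout plus lag timeout plus variability terms. */
    static long oldBudget() {
        return PUBLISH_TIMEOUT + 2 * DELAY_VARIABILITY
            + FOLLOWER_LAG_TIMEOUT + DELAY_VARIABILITY + 2 * DELAY_VARIABILITY;
    }

    /** Mirrors the added expression: time for the update itself, then the publish timeout. */
    static long newBudget() {
        return CLUSTER_STATE_UPDATE_DELAY + PUBLISH_TIMEOUT + 2 * DELAY_VARIABILITY;
    }

    public static void main(String[] args) {
        System.out.println("old budget (ms): " + oldBudget());
        System.out.println("new budget (ms): " + newBudget());
    }
}
```

The new expression no longer depends on the follower lag timeout, so it still makes sense when the lag detector is relaxed, and it adds an explicit term for carrying out the cluster state update.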

pxsalehi

LGTM

joshua-adams-1

LGTM

This test is kinda bogus with an atomic register because it doesn't actually time out as claimed. Nonetheless, we do want to know that the nacks are genuinely delivered eventually under these conditions. This commit adjusts the test to handle the relaxed lag detector introduced in #140434. Closes #140509

Today you can set an arbitrarily long timeout on the `LagDetector` but there's no facility to just completely turn it off. The usual values of `0` and `-1` that one might expect to do so are forbidden. There's no good reason for this any more, see e.g. #140434, so with this commit we adjust the setting to accept nonpositive timeouts and interpret them to mean that no lag detection should take place. Relates ES-10778
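The rule described here, that a nonpositive timeout means no lag detection, could look roughly like the following. This is a minimal sketch using `java.time.Duration`. `LagTimeoutSketch` and `effectiveTimeout` are hypothetical names, and the real setting and `LagDetector` code may be structured quite differently.

```java
import java.time.Duration;
import java.util.Optional;

/** Sketch only: a hypothetical helper, not the real LagDetector setting logic. */
public class LagTimeoutSketch {

    /**
     * Returns the timeout to use for lag detection, or an empty Optional
     * when the configured value is zero or negative, meaning "disabled".
     */
    static Optional<Duration> effectiveTimeout(Duration configured) {
        if (configured.isZero() || configured.isNegative()) {
            return Optional.empty(); // nonpositive: no lag detection takes place
        }
        return Optional.of(configured);
    }

    public static void main(String[] args) {
        System.out.println(effectiveTimeout(Duration.ofSeconds(90)).isPresent());
        System.out.println(effectiveTimeout(Duration.ZERO).isPresent());
        System.out.println(effectiveTimeout(Duration.ofMillis(-1)).isPresent());
    }
}
```

Returning an empty `Optional` rather than a very large timeout keeps "disabled" distinct from "very slow", which is the distinction the description draws.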

Sometimes relax lag detector in CoordinatorTests

d82bcde

DaveCTurner requested review from joshua-adams-1 and pxsalehi January 9, 2026 09:45

DaveCTurner added the >test, :Distributed/Cluster Coordination, and v9.4.0 labels Jan 9, 2026

elasticsearchmachine added the Team:Distributed Coordination (obsolete) label Jan 9, 2026

DaveCTurner commented Jan 9, 2026

Merge branch 'main' into 2026/01/09/CoordinatorTests-relaxed-lag-dete…

a863845

…ctor

pxsalehi approved these changes Jan 12, 2026

DaveCTurner enabled auto-merge (squash) January 12, 2026 10:47

Merge branch 'main' into 2026/01/09/CoordinatorTests-relaxed-lag-dete…

6014ca2

…ctor

joshua-adams-1 approved these changes Jan 12, 2026

DaveCTurner merged commit d5b2a4e into elastic:main Jan 12, 2026
35 checks passed

DaveCTurner mentioned this pull request Jan 12, 2026

Fix testAckListenerReceivesNacksIfPublicationTimesOut #140514

Merged

DaveCTurner mentioned this pull request Jan 26, 2026

Allow turning LagDetector off #141280

Merged

Sometimes relax lag detector in `CoordinatorTests` #140434
DaveCTurner merged 3 commits into elastic:main from DaveCTurner:2026/01/09/CoordinatorTests-relaxed-lag-detector
