Reduce follower cluster state lag timeout for disruption tests #108691

Merged
DiannaHohensee merged 3 commits into elastic:main from DiannaHohensee:2024/05/15/fix-testAckedIndexing
May 15, 2024

DiannaHohensee (Contributor) commented May 15, 2024

A node-left task can be interrupted before it removes the node from
the master's list of faultyNodes. Nodes on the faultyNodes list do not
receive cluster state updates, and are eventually removed.

Subsequently, when the node attempts to rejoin after the test network
disruptions have ceased, the node-join request can succeed, but the
node never receives the cluster state update. The node therefore
considers the node-join a failure and keeps resending node-join
requests until the LagDetector removes it from the faultyNodes list.
#108690 will address the node-join issue; in the meantime, this PR
reduces the follower cluster state lag timeout for the disruption tests.
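
To make the effect of the lag timeout concrete, here is a toy model of the sequence described above, written as a self-contained Java sketch. It is not Elasticsearch code: `FaultyNodeLagModel`, its methods, the retry interval, and the timeout values are all hypothetical names and numbers chosen only for illustration. The model assumes a node that was left on the faulty list at time zero, and that a join attempt only completes once a lag check has cleared the stale entry.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Toy model only (hypothetical names, not the Elasticsearch implementation).
 * A node stuck on the faulty list cannot complete a join until the lag
 * timeout elapses and the stale entry is cleared.
 */
final class FaultyNodeLagModel {
    private final Set<String> faultyNodes = new HashSet<>();
    private final Map<String, Long> markedFaultyAtMillis = new HashMap<>();
    private final long lagTimeoutMillis;

    FaultyNodeLagModel(long lagTimeoutMillis) {
        this.lagTimeoutMillis = lagTimeoutMillis;
    }

    /** Models an interrupted node-left task: the node stays marked faulty. */
    void markFaulty(String node, long nowMillis) {
        faultyNodes.add(node);
        markedFaultyAtMillis.put(node, nowMillis);
    }

    /** Models the lag check: clears entries older than the lag timeout. */
    void runLagCheck(long nowMillis) {
        faultyNodes.removeIf(node -> nowMillis - markedFaultyAtMillis.get(node) >= lagTimeoutMillis);
    }

    /**
     * Models one join attempt. The request itself is accepted, but the join
     * only completes if the node is no longer on the faulty list, because
     * faulty nodes are not sent the cluster state update.
     */
    boolean tryJoin(String node, long nowMillis) {
        runLagCheck(nowMillis);
        return faultyNodes.contains(node) == false;
    }

    /** Retries the join at a fixed interval and returns when it first completes. */
    static long millisUntilJoinCompletes(long lagTimeoutMillis, long retryIntervalMillis) {
        FaultyNodeLagModel model = new FaultyNodeLagModel(lagTimeoutMillis);
        model.markFaulty("node-1", 0L);
        long now = 0L;
        while (model.tryJoin("node-1", now) == false) {
            now += retryIntervalMillis;
        }
        return now;
    }

    public static void main(String[] args) {
        // Illustrative timeouts only; not the values used by this PR.
        System.out.println(millisUntilJoinCompletes(90_000L, 1_000L));
        System.out.println(millisUntilJoinCompletes(5_000L, 1_000L));
    }
}
```

In this model the time a rejoining node spends retrying is bounded below by the lag timeout, which is why a shorter timeout lets a disruption test recover sooner.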

Closes #91447


Much belatedly circling back to this.

DiannaHohensee added the >test, :Distributed/Cluster Coordination, and Team:Distributed labels May 15, 2024
DiannaHohensee self-assigned this May 15, 2024
elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner (Member) left a comment

LGTM, one suggestion

DiannaHohensee merged commit 4700027 into elastic:main May 15, 2024
Labels

:Distributed/Cluster Coordination, Team:Distributed, >test, v8.15.0

Development

Successfully merging this pull request may close these issues.

[CI] ClusterDisruptionIT testAckedIndexing failing

3 participants