Preserve last-committed data at accept time#92258

Closed
DaveCTurner wants to merge 1 commit into elastic:main from DaveCTurner:2022-12-09-preserve-last-committed-data

@DaveCTurner
Member

The cluster coordination consistency layer relies on a couple of fields within `Metadata` which record the last _committed_ values on each node. In contrast, the rest of the cluster state can only be changed at _accept_ time.

In the past we would copy these fields over from the master on every publication, but since #90101 we don't copy anything at all if the `Metadata` is unchanged on the master. However, the master computes the diff against the last _committed_ state whereas the receiving nodes apply the diff to the last _accepted_ state. This means that if the master sends a no-op `Metadata` diff then the receiving node will revert its last-committed values to the ones included in the state it last accepted.

With this commit we adjust `CoordinationState` to ignore changes to the last-committed fields at accept time.
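
To make the mismatch concrete, here is a toy model of the scenario described above. It is only a sketch and not the Elasticsearch code: the `Metadata` class, its two fields, and the helpers `compute_diff`, `mark_committed` and `accept` are all hypothetical simplifications, and a diff is modelled as either `None` (no-op) or a full replacement.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Metadata:
    # Hypothetical stand-ins for the real fields; a voting
    # configuration is modelled as a set of node names.
    last_accepted_config: frozenset
    last_committed_config: frozenset


def compute_diff(before: Metadata, after: Metadata) -> Optional[Metadata]:
    """Master side: None models the no-op diff sent when Metadata is unchanged."""
    return None if before == after else after


def mark_committed(metadata: Metadata) -> Metadata:
    """Commit time: the accepted configuration becomes the committed one."""
    return replace(metadata, last_committed_config=metadata.last_accepted_config)


def accept(local: Metadata, diff_base: Metadata, diff: Optional[Metadata],
           preserve_last_committed: bool) -> Metadata:
    """Receiver side: apply the diff to the state as it was last accepted."""
    incoming = diff_base if diff is None else diff
    if preserve_last_committed:
        # The adjustment proposed here: ignore incoming last-committed values.
        incoming = replace(
            incoming, last_committed_config=local.last_committed_config)
    return incoming


old_config, new_config = frozenset("ABC"), frozenset("ABD")

# The previous publication moved the voting configuration from old to new.
as_accepted = Metadata(last_accepted_config=new_config,
                       last_committed_config=old_config)
as_committed = mark_committed(as_accepted)  # held by master and receiver after commit

# The next publication leaves Metadata untouched, so the master sends a no-op diff.
diff = compute_diff(as_committed, as_committed)

reverted = accept(as_committed, as_accepted, diff, preserve_last_committed=False)
preserved = accept(as_committed, as_accepted, diff, preserve_last_committed=True)

print(sorted(reverted.last_committed_config))   # ['A', 'B', 'C']
print(sorted(preserved.last_committed_config))  # ['A', 'B', 'D']
```

Without the adjustment the receiver ends up with the old committed configuration again, even though nothing in `Metadata` changed on the master.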

@DaveCTurner DaveCTurner added the >bug, WIP, :Distributed/Cluster Coordination (cluster formation and cluster state publication, including cluster membership and fault detection) and v8.7.0 labels on Dec 9, 2022
@DaveCTurner
Member Author

I'm not convinced this is the right fix. I mean it makes sense on the face of it but it introduces a deviation from this line of the formal model so we need to re-check everything.

An alternative fix is to go back to always copying these two fields over on every publication. That would be safer, but it involves changes in areas unrelated to cluster coordination safety so I worry about the risk of future bugs.
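
Roughly, that alternative would look like the following toy sketch, in which the last-committed value always travels alongside the diff. Everything here is a hypothetical simplification rather than the real publication code: `Publication`, `publish`, `receive`, and the single `settings` field standing in for the rest of `Metadata` are made up for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Metadata:
    last_committed_config: frozenset  # hypothetical stand-in for the last-committed fields
    settings: str                     # stands in for everything else in Metadata


@dataclass(frozen=True)
class Publication:
    last_committed_config: frozenset   # always sent, even with a no-op diff
    metadata_diff: Optional[Metadata]  # None models "Metadata unchanged"


def publish(master_committed: Metadata, master_next: Metadata) -> Publication:
    """Master side: diff against the last committed state, but always copy the field."""
    changed = master_next != master_committed
    return Publication(master_next.last_committed_config,
                       master_next if changed else None)


def receive(accepted_base: Metadata, publication: Publication) -> Metadata:
    """Receiver side: apply the diff to the last accepted state, then overwrite the field."""
    result = accepted_base if publication.metadata_diff is None else publication.metadata_diff
    return replace(result, last_committed_config=publication.last_committed_config)


old_config, new_config = frozenset("ABC"), frozenset("ABD")
accepted_base = Metadata(last_committed_config=old_config, settings="s")
master_committed = Metadata(last_committed_config=new_config, settings="s")

publication = publish(master_committed, master_committed)  # nothing changed on the master
received = receive(accepted_base, publication)
print(sorted(received.last_committed_config))  # ['A', 'B', 'D']
```

The receiver stays in step with the master here, at the cost of making the publication path aware of these fields again.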

@DaveCTurner
Member Author

Just to add that I think this bug is more of a liveness concern than a safety one: it means that an election may incorrectly wait to collect votes from the previous configuration as well as the current one.
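
As a rough illustration of the liveness impact, assume an election needs a majority of votes in both the last-committed and the last-accepted voting configurations. The helper names `has_quorum` and `election_won` below are hypothetical and the model ignores terms and versions entirely.

```python
def has_quorum(votes: frozenset, config: frozenset) -> bool:
    """True if the votes include a strict majority of the configuration."""
    return len(votes & config) * 2 > len(config)


def election_won(votes: frozenset, last_committed_config: frozenset,
                 last_accepted_config: frozenset) -> bool:
    """Assumed rule: a quorum is required in both configurations."""
    return (has_quorum(votes, last_committed_config)
            and has_quorum(votes, last_accepted_config))


old_config = frozenset("ABC")  # previous configuration; A and B have since gone away
new_config = frozenset("CDE")  # current configuration, already committed
votes = frozenset("CDE")       # every node in the current configuration votes

# Correct last-committed value: the current configuration alone decides.
print(election_won(votes, new_config, new_config))  # True

# Reverted last-committed value: the election also waits on the old configuration.
print(election_won(votes, old_config, new_config))  # False
```

In the second case the candidate cannot win until it also hears from a majority of the previous configuration, which it should no longer need to do.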

@DaveCTurner
Member Author

Closing this in favour of #92259

@DaveCTurner DaveCTurner deleted the 2022-12-09-preserve-last-committed-data branch December 11, 2022 20:33