Include last-committed data in publication#92259
Conversation
|
Alternative to #92258 that doesn't involve changing the safety algorithm. Not as bad as I thought it would be. Repeating here: I think this bug is more of a liveness concern than a safety one: it means that an election may incorrectly wait to collect votes from the previous configuration as well as the current one.
|
The cluster coordination consistency layer relies on a couple of fields within `Metadata` which record the last _committed_ values on each node. In contrast, the rest of the cluster state can only be changed at _accept_ time. In the past we would copy these fields over from the master on every publication, but since elastic#90101 we don't copy anything at all if the `Metadata` is unchanged on the master. However, the master computes the diff against the last _committed_ state whereas the receiving nodes apply the diff to the last _accepted_ state, and this means if the master sends a no-op `Metadata` diff then the receiving node will revert its last-committed values to the ones included in the state it last accepted. With this commit we include the last-committed values alongside the cluster state diff so that they are always copied properly.
173d526 to
ad1574f
Compare
|
Hi @DaveCTurner, I've created a changelog YAML for you. |
|
Pinging @elastic/es-distributed (Team:Distributed) |
|
We should backport this to 8.6, but I'm unsure if it'll land in 8.6.0 or 8.6.1. It's not serious enough to call it a blocker for 8.6.0 IMO. However the 8.6.1 version constant doesn't exist yet so I am going to have to set the BwC version to 8.6.0 while backporting, and then we may have to adjust this when bumping versions after the 8.6.0 release. cc @henningandersen who may have a different opinion about calling this a blocker for 8.6.0. |
|
Hi @DaveCTurner, I've updated the changelog YAML for you. |
henningandersen
left a comment
There was a problem hiding this comment.
Left a few comments, probably to be considered questions.
| final var hasClusterUuid = lastAcceptedState.metadata().clusterUUID().equals(Metadata.UNKNOWN_CLUSTER_UUID) == false; | ||
| assert hasClusterUuid : "received cluster state with empty cluster uuid: " + lastAcceptedState; | ||
|
|
||
| if (hasClusterUuid && lastAcceptedState.metadata().clusterUUIDCommitted() == false) { |
There was a problem hiding this comment.
Feels odd to protect the logging by something we assert on above?
| if (hasClusterUuid && lastAcceptedState.metadata().clusterUUIDCommitted() == false) { | |
| if (lastAcceptedState.metadata().clusterUUIDCommitted() == false) { |
There was a problem hiding this comment.
Ehh I just kept the same logic as before but I don't think I've ever seen this assertion trip. I'd be ok to drop it.
| setLastAcceptedState(ClusterState.builder(lastAcceptedState).metadata(metadataBuilder).build()); | ||
|
|
||
| final var adjustedMetadata = lastAcceptedState.metadata() | ||
| .withLastCommittedValues(hasClusterUuid, lastAcceptedState.getLastAcceptedConfiguration()); |
There was a problem hiding this comment.
Should we pass in hasClusterUuid || lastAcceptedState.metadata().clusterUUIDCommitted() here? I mean, we are outside assertions already, so could also pass in true. But assuming assertions have some case we are trying to protect against, passing in the original committed value seems safer and also in line with original code.
| * marked as committed. | ||
| */ | ||
| default void markLastAcceptedStateAsCommitted() { | ||
| final ClusterState lastAcceptedState = getLastAcceptedState(); |
There was a problem hiding this comment.
I think the code is cleaner, but I failed to spot the important difference, is it just the logging you wanted out of this?
There was a problem hiding this comment.
I don't think this does anything different, there's just no need to use a Metadata.Builder here any more.
There was a problem hiding this comment.
Thanks. In that case, let us revert this and move this to a follow-up, where we can also assume that assertions are held (i.e., use true instead of hasClusterUuid later in the code).
| TransportRequestOptions.Type.STATE | ||
| ); | ||
|
|
||
| public static final Version INCLUDES_LAST_COMMITTED_DATA_VERSION = Version.V_8_7_0; |
There was a problem hiding this comment.
@DaveCTurner should this be 8.6.0 since it's intended to be backported?
There was a problem hiding this comment.
I'll adjust that while backporting. If I set this to 8.6.0 now then the BwC test suite would fail on this PR, but it's important for those tests to pass because they tell us whether there are any other more subtle BwC problems here.
There was a problem hiding this comment.
Joining the dots here:
- I opened a PR at Include last-committed data in publication #92277 which backports this to 8.6 with a fixed BwC version, and waited for it to pass CI.
- I merged 296eacd which disables the BwC tests in
main. - I merged the backport PR.
- I opened another PR at Fixup BWC after backport of #92259 #92280 to fix up the BwC version in
mainand reinstate the BwC tests, which will be merged once it passes CI.
|
I reverted the tidy-up to |
...r/src/test/java/org/elasticsearch/cluster/coordination/PublicationTransportHandlerTests.java
Outdated
Show resolved
Hide resolved
The cluster coordination consistency layer relies on a couple of fields within `Metadata` which record the last _committed_ values on each node. In contrast, the rest of the cluster state can only be changed at _accept_ time. In the past we would copy these fields over from the master on every publication, but since elastic#90101 we don't copy anything at all if the `Metadata` is unchanged on the master. However, the master computes the diff against the last _committed_ state whereas the receiving nodes apply the diff to the last _accepted_ state, and this means if the master sends a no-op `Metadata` diff then the receiving node will revert its last-committed values to the ones included in the state it last accepted. With this commit we include the last-committed values alongside the cluster state diff so that they are always copied properly. Closes elastic#90158 Backport of elastic#92259 to 8.6
The cluster coordination consistency layer relies on a couple of fields within `Metadata` which record the last _committed_ values on each node. In contrast, the rest of the cluster state can only be changed at _accept_ time. In the past we would copy these fields over from the master on every publication, but since #90101 we don't copy anything at all if the `Metadata` is unchanged on the master. However, the master computes the diff against the last _committed_ state whereas the receiving nodes apply the diff to the last _accepted_ state, and this means if the master sends a no-op `Metadata` diff then the receiving node will revert its last-committed values to the ones included in the state it last accepted. With this commit we include the last-committed values alongside the cluster state diff so that they are always copied properly. Closes #90158 Backport of #92259 to 8.6
Adjusts the BWC version for the wire protocol and re-enables the BWC tests.
Extracted from elastic#92259
The cluster coordination consistency layer relies on a couple of fields within
Metadatawhich record the last committed values on each node. In contrast, the rest of the cluster state can only be changed at accept time.In the past we would copy these fields over from the master on every publication, but since #90101 we don't copy anything at all if the
Metadatais unchanged on the master. However, the master computes the diff against the last committed state whereas the receiving nodes apply the diff to the last accepted state, and this means if the master sends a no-opMetadatadiff then the receiving node will revert its last-committed values to the ones included in the state it last accepted.With this commit we include the last-committed values alongside the cluster state diff so that they are always copied properly.
Closes #90158