Create missing PRRLs after primary activation #44009
Today peer recovery retention leases (PRRLs) are created when starting a replication group from scratch and during peer recovery. However, if the replication group was migrated from nodes running a version that does not create PRRLs (e.g. 7.3 and earlier), then it's possible that the primary was relocated or promoted without first establishing all the expected leases. It's not possible to establish these leases before or during primary activation, so we must create them as soon as possible afterwards. This gives weaker guarantees about history retention, since there's a possibility that history will be discarded before it can be used. In practice such situations are expected to occur only rarely.

This commit adds the machinery to create missing leases after primary activation, and strengthens the assertions about the existence of such leases to ensure that once all the leases do exist we never again enter a state where a lease is missing.

Relates elastic#41536
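The mechanism described above can be pictured with a minimal, hypothetical Java sketch. It is not the actual `ReplicationTracker` code: `LeaseTrackerSketch`, `trackLegacyCopy`, `recoverCopy`, `createMissingLeases` and the other names are illustrative only. It assumes a lease is just a retaining sequence number per allocation ID, models copies inherited from a pre-PRRL version as tracked without a lease, fills in missing leases only after primary activation (assuming they retain history from `globalCheckpoint + 1`), and latches a flag once every copy has a lease so that any later state with a missing lease fails loudly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of a primary's peer recovery retention leases.
// None of these names are Elasticsearch APIs; ReplicationTracker is far richer.
class LeaseTrackerSketch {
    private final Set<String> trackedCopies = new TreeSet<>();   // allocation IDs
    private final Map<String, Long> leases = new HashMap<>();    // allocation ID -> retaining seq no
    private boolean primaryMode = false;
    // Latches to true once every tracked copy has a lease and is never reset.
    private boolean allLeasesEstablished = false;

    // A copy inherited from an older (e.g. 7.3) cluster may arrive without a lease.
    void trackLegacyCopy(String allocationId) {
        trackedCopies.add(allocationId);
        checkInvariant();
    }

    // Peer recovery on a PRRL-aware version creates the lease before tracking the copy.
    void recoverCopy(String allocationId, long retainingSeqNo) {
        leases.put(allocationId, retainingSeqNo);
        trackedCopies.add(allocationId);
        checkInvariant();
    }

    // Leases cannot be created before or during activation, only afterwards.
    void activatePrimary() {
        primaryMode = true;
        checkInvariant();
    }

    // Run as soon as possible after activation; returns how many leases it created.
    int createMissingLeases(long globalCheckpoint) {
        if (!primaryMode) {
            throw new IllegalStateException("leases can only be created in primary mode");
        }
        int created = 0;
        for (String id : trackedCopies) {
            if (!leases.containsKey(id)) {
                // Assumption: retain history from here on only; anything older may
                // already have been discarded (the weaker guarantee noted above).
                leases.put(id, globalCheckpoint + 1);
                created++;
            }
        }
        checkInvariant();
        return created;
    }

    boolean hasLease(String allocationId) {
        return leases.containsKey(allocationId);
    }

    boolean allLeasesEstablished() {
        return allLeasesEstablished;
    }

    // Once every copy has had a lease, no later state may lack one.
    private void checkInvariant() {
        boolean allPresent = leases.keySet().containsAll(trackedCopies);
        if (allLeasesEstablished && !allPresent) {
            throw new IllegalStateException("missing lease after all leases were established");
        }
        if (primaryMode && allPresent) {
            allLeasesEstablished = true;
        }
    }
}
```

Latching the flag, rather than re-deriving it on every check, is what lets the sketch tell "leases not created yet" (tolerated while migrating from an older version) apart from "a lease went missing" (a bug).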
Pinging @elastic/es-distributed

Note that this PR is against the …

@elasticmachine please run elasticsearch-ci/2

I opened #44011 as the failure seems unrelated and occurs elsewhere.
server/src/main/java/org/elasticsearch/index/seqno/ReplicationTracker.java
    case OLD:
        Settings.Builder settings = Settings.builder()
            .put(IndexMetaData.INDEX_NUMBER_OF_SHARDS_SETTING.getKey(), between(1, 5))
            .put(IndexMetaData.INDEX_NUMBER_OF_REPLICAS_SETTING.getKey(), 1)
Perhaps randomly use 0 or 1 replicas?
Bah, this is incorrect. We know that every 8.0 peer will have a lease, but we can't be certain that every 7.x peer has one. Therefore this PR will need forward-porting to …
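One way to read the comment above as an assertion: a copy on an 8.0 node must always have a lease, while a copy on a 7.x node may legitimately lack one. The sketch below is hypothetical and not Elasticsearch code; it models a node's version as a bare major version number rather than the real `Version` class, and `LeaseAssertionSketch`, `Copy` and `leasesConsistent` are illustrative names.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Elasticsearch code: node versions are modelled
// as bare major version numbers instead of the real Version class.
final class LeaseAssertionSketch {
    static final class Copy {
        final String allocationId;
        final int nodeMajorVersion;

        Copy(String allocationId, int nodeMajorVersion) {
            this.allocationId = allocationId;
            this.nodeMajorVersion = nodeMajorVersion;
        }
    }

    private LeaseAssertionSketch() {}

    // Every copy on an 8.0+ node must hold a lease; a copy on a 7.x node
    // may have been created by a version that never made one.
    static boolean leasesConsistent(List<Copy> copies, Set<String> leaseHolders) {
        for (Copy copy : copies) {
            if (copy.nodeMajorVersion >= 8 && !leaseHolders.contains(copy.allocationId)) {
                return false;
            }
        }
        return true;
    }
}
```

Used as `assert leasesConsistent(copies, leaseHolders);`, such a check would hold on a mixed 7.x/8.0 cluster while still catching a missing lease on any 8.0 copy.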
dnhatn left a comment
I left a small ask but LGTM. Thanks @DaveCTurner.
        return false;
    }

    public void testCanRecoverFromStoreWithoutPeerRecoveryRetentionLease() throws Exception {
Can we add a full cluster restart test with 0-2 replicas and then verify that, after the cluster is upgraded, every copy has a PRRL installed properly?