Use global checkpoint as starting seq in ops-based recovery#43463
Conversation
|
Pinging @elastic/es-distributed |
|
This PR is still WIP but I opened this to get your feedback on the approach. |
henningandersen
left a comment
There was a problem hiding this comment.
Thanks @dnhatn , I left a few initial comments.
server/src/main/java/org/elasticsearch/index/shard/IndexShard.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/indices/recovery/RecoveryTarget.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/indices/recovery/RecoverySourceHandler.java
Outdated
Show resolved
Hide resolved
ywelsch
left a comment
There was a problem hiding this comment.
Thanks for picking this up. I've left some preliminary comments.
server/src/main/java/org/elasticsearch/index/shard/IndexShard.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/indices/recovery/RecoveryTarget.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/indices/recovery/RecoverySourceHandler.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/indices/recovery/RecoveryTarget.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/indices/recovery/RecoveryTarget.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/indices/recovery/StartRecoveryRequest.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/indices/recovery/RecoveryTarget.java
Outdated
Show resolved
Hide resolved
|
Talked to Yannick on another channel, we preferred to make this change altogether with the peer recovery retention leases work. Therefore, this PR will go to the feature branch (peer-recovery-retention-leases). @henningandersen @ywelsch @DaveCTurner This is ready for another round. Can you please take another look? Thank you! |
|
@dnhatn there is a relevant test failure here I think |
This reverts commit 6ec4e80.
|
run elasticsearch-ci/packaging-sample |
|
@ywelsch @henningandersen @DaveCTurner Thank you for reviewing. Yannick, sorry for many iterations in this PR. I should have done better here. |
Today we use the local checkpoint of the safe commit on replicas as the starting sequence number of operation-based peer recovery. While this is a good choice due to its simplicity, we need to share this information between copies if we use retention leases in peer recovery. We can avoid this extra work if we use the global checkpoint as the starting sequence number. With this change, we will try to recover replica locally up to the global checkpoint before performing peer recovery. This commit should also increase the chance of operation-based recovery.
… step (#44781) If we force allocate an empty or stale primary, the global checkpoint on replicas might be higher than the primary's as the local recovery step (introduced in #43463) loads the previous (stale) global checkpoint into ReplicationTracker. There's no issue with the retention leases for a new lease with a higher term will supersede the stale one. Relates #43463
… step (#44781) If we force allocate an empty or stale primary, the global checkpoint on replicas might be higher than the primary's as the local recovery step (introduced in #43463) loads the previous (stale) global checkpoint into ReplicationTracker. There's no issue with the retention leases for a new lease with a higher term will supersede the stale one. Relates #43463
Previously, if the metadata snapshot is empty (either no commit found or error), we won't compute the starting sequence number and use -2 to opt out the operation-based recovery. With #43463, we have a starting sequence number before reading the last commit. Thus, we need to reset it if we fail to snapshot the store. Closes #45072
Previously, if the metadata snapshot is empty (either no commit found or error), we won't compute the starting sequence number and use -2 to opt out the operation-based recovery. With #43463, we have a starting sequence number before reading the last commit. Thus, we need to reset it if we fail to snapshot the store. Closes #45072
Today we use the local checkpoint of the safe commit on replicas to determine whether we can perform a sequence number based recovery. While this is a good choice due to its simplicity, it replies on flushing which should not happen frequently.
This change increases the chance of sequence number based recoveries by using the global checkpoint on the target as the starting sequence number when possible.