Allow engine to recover from translog upto a seqno#33032
Allow engine to recover from translog upto a seqno#33032dnhatn merged 2 commits intoelastic:masterfrom
Conversation
This change allows an engine to recover from its local translog up to the given seqno. The extended API can be used in these use cases: 1. When a replica starts following a new primary, it resets its index to the safe commit, then replays its local translog up to the current global checkpoint (see elastic#32867). 2. When a replica starts a peer-recovery, it can initialize the start_sequence_number to the persisted global checkpoint instead of the local checkpoint of the safe commit. A replica will then replay its local translog up to that global checkpoint before accepting remote translog from the primary. This change will increase the chance of operation-based recovery. I will make this in a follow-up. Relates elastic#32867
|
Pinging @elastic/es-distributed |
ywelsch
left a comment
There was a problem hiding this comment.
I've left a few comments
|
|
||
| /** | ||
| * Performs recovery from the transaction log. | ||
| * Performs recovery from the transaction log up to {@code recoverUpToSeqNo}. |
There was a problem hiding this comment.
can you add that it's an inclusive bound? (i.e. up to XYZ included)
| } | ||
|
|
||
| public Snapshot newSnapshotFromGen(long minGeneration) throws IOException { | ||
| public Snapshot newSnapshotFromGen(TranslogGeneration fromGeneration, long upToSeqNo) throws IOException { |
There was a problem hiding this comment.
why did you change this to take a TranslogGeneration with the uuid instead of just the long minGeneration? It's not using that uuid anywhere here AFAICS.
There was a problem hiding this comment.
This method might be interpreted as a range of translog generations or a range of sequence numbers if the parameter is a tuple of Longs. I changed to TranslogGeneration to avoid this issue. I will revert this change if you don't like it.
There was a problem hiding this comment.
I think it's good though. less likely to misuse.
| return new Snapshot() { | ||
| int skippedOps = 0; | ||
| @Override | ||
| public int totalOperations() { |
There was a problem hiding this comment.
also override overriddenOperations and delegate to snapshot.overriddenOperations?
| if (upToSeqNo == Long.MAX_VALUE) { | ||
| return snapshot; | ||
| } | ||
| return new Snapshot() { |
There was a problem hiding this comment.
I think we should have a proper (top-level) class for this, supporting both min and max. min would be useful for newSnapshotFromMinSeqNo (see e.g. PrimaryReplicaResyncer, which still has to filter based on min = startingSeqNo, all of which could be accomplished through the Snapshot), and max would be useful for this one here (where it might also have a min).
We might even make the interface of this newSnapshot method purely sequence-number-based, where you can specify the range of operations to recover instead of the translog generation. That last part is not something I would change right away, but maybe something to look into later.
There was a problem hiding this comment.
I added filter(Predicate<Operation>) method to the Snapshot for this purpose. However, I feel it's too broad; then I go with the filter class that you suggested.
|
@ywelsch I've addressed your comments. Would you please have another look? Thank you! |
This change allows an engine to recover from its local translog up to the given seqno. The extended API can be used in these use cases: When a replica starts following a new primary, it resets its index to the safe commit, then replays its local translog up to the current global checkpoint (see #32867). When a replica starts a peer-recovery, it can initialize the start_sequence_number to the persisted global checkpoint instead of the local checkpoint of the safe commit. A replica will then replay its local translog up to that global checkpoint before accepting remote translog from the primary. This change will increase the chance of operation-based recovery. I will make this in a follow-up. Relates #32867
* 6.x: Allow engine to recover from translog upto a seqno (#33032) TEST: Skip assertSeqNos for closed shards (#33130) TEST: resync operation on replica should acquire shard permit (#33103) Add proxy support to RemoteClusterConnection (#33062) Build: Line up IDE detection logic Security index expands to a single replica (#33131) Suppress more tests HLRC: request/response homogeneity and JavaDoc improvements (#33133) [Rollup] Move toAggCap() methods out of rollup config objects (#32583) Muted all these tests due to #33128 Fix race condition in scheduler engine test
This change allows an engine to recover from its local translog up to
the given seqno. The extended API can be used in these use cases:
When a replica starts following a new primary, it resets its index to
the safe commit, then replays its local translog up to the current
global checkpoint (see Reset replica engine before primary-replica resync #32867).
When a replica starts a peer-recovery, it can initialize the
start_sequence_number to the persisted global checkpoint instead of the
local checkpoint of the safe commit. A replica will then replay its
local translog up to that global checkpoint before accepting remote
translog from the primary. This change will increase the chance of
operation-based recovery. I will make this in a follow-up.
Relates #32867
/cc @bleskes