Advance checkpoints only after persisting ops#9309
Conversation
51bc1f1 to
b858f63
Compare
|
retest this please cannot reproduce locally |
mfussenegger
left a comment
There was a problem hiding this comment.
I think we should also port over LocalCheckpointTrackerTests
| public boolean exists() { | ||
| return exists; | ||
| } | ||
|
|
||
| public long version() { | ||
| return this.version; | ||
| } |
There was a problem hiding this comment.
Can we remove these extra properties again? See 836b1bb for why they were removed.
| translog = openTranslog(engineConfig, translogDeletionPolicy, engineConfig.getGlobalCheckpointSupplier(), | ||
| seqNo -> { | ||
| final LocalCheckpointTracker tracker = getLocalCheckpointTracker(); | ||
| assert tracker != null || getTranslog().isOpen() == false; |
There was a problem hiding this comment.
Could we add some assertion message here?
| SequenceNumbers.loadSeqNoInfoFromLuceneCommit(store.readLastCommittedSegmentsInfo().userData.entrySet()); | ||
| maxSeqNo = seqNoStats.maxSeqNo; | ||
| localCheckpoint = seqNoStats.localCheckpoint; | ||
| logger.trace("recovered maximum sequence number [{}] and local checkpoint [{}]", maxSeqNo, localCheckpoint); |
There was a problem hiding this comment.
maybe wrap in logger.isTraceEnabled
| localCheckpointTracker.markSeqNoAsProcessed(indexResult.getSeqNo()); | ||
| if (indexResult.getTranslogLocation() == null) { | ||
| // the op is coming from the translog (and is hence persisted already) or it does not have a sequence number | ||
| assert index.origin().isFromTranslog() || indexResult.getSeqNo() == SequenceNumbers.UNASSIGNED_SEQ_NO; |
There was a problem hiding this comment.
Same here, an assertion message would be nice
| localCheckpointTracker.markSeqNoAsProcessed(deleteResult.getSeqNo()); | ||
| if (deleteResult.getTranslogLocation() == null) { | ||
| // the op is coming from the translog (and is hence persisted already) or does not have a sequence number (version conflict) | ||
| assert delete.origin().isFromTranslog() || deleteResult.getSeqNo() == SequenceNumbers.UNASSIGNED_SEQ_NO; |
b858f63 to
7806c0c
Compare
|
retest this please I cannot reproduce locally.... |
|
retest this please seems flaky, I am investigating locally |
7806c0c to
02f7648
Compare
|
retest this please it is flaky, and I cannot reproduce it locally. Taking the one from ES requires some changes, some code to import too |
02f7648 to
eb6087d
Compare
| @Seed("EA3633FB2FE61702") | ||
| @Repeat(iterations = 1) |
There was a problem hiding this comment.
I guess this should be reverted now
| @Seed("99FCB15FF5D32AD0") | ||
| @Repeat(iterations = 1) |
Port of elastic/elasticsearch#43205 Local and global checkpoints currently do not correctly reflect what's persisted to disk. The issue is that the local checkpoint is adapted as soon as an operation is processed (but not fsynced yet). This leaves room for the history below the global checkpoint to still change in case of a crash. As we rely on global checkpoints for CCR as well as operation-based recoveries, this has the risk of shard copies / follower clusters going out of sync.
5759c0d to
71294dd
Compare
…VerifyShardBeforeCloseAction #9309 ports over elastic/elasticsearch#43205 but at that point TransportVerifyShardBeforeCloseAction was not present in our code base.
…ifyShardBeforeCloseAction #9309 ports over elastic/elasticsearch#43205 but at that point TransportVerifyShardBeforeCloseAction was not present in our code base.
…ifyShardBeforeCloseAction #9309 ports over elastic/elasticsearch#43205 but at that point TransportVerifyShardBeforeCloseAction was not present in our code base.
…ifyShardBeforeCloseAction #9309 ports over elastic/elasticsearch#43205 but at that point TransportVerifyShardBeforeCloseAction was not present in our code base.
…ifyShardBeforeCloseAction #9309 ports over elastic/elasticsearch#43205 but at that point TransportVerifyShardBeforeCloseAction was not present in our code base.
…ifyShardBeforeCloseAction #9309 ports over elastic/elasticsearch#43205 but at that point TransportVerifyShardBeforeCloseAction was not present in our code base.
Do-over
Checklist
CHANGES.txtCHANGES.txt(E.g. AdminUI)