Seq Number based recovery should validate last lucene commit max seq# #22851
Conversation
…q_no_should_be_below_global_check_point
The seq# based recovery logic relies on rolling back Lucene to remove any operations above the global checkpoint. This part of the plan is not implemented yet, but we have to have these guarantees. Instead, we should make the seq# logic validate that the last commit point (and the only one we have) maintains the invariant, and if not, fall back to file based recovery. This commit adds a test that creates a situation where rollback is needed (primary fail over with ops in flight) and fixes another issue that was surfaced by it: if a primary can't serve a seq# based recovery request and does a file copy, it still used the incoming `startSeqNo` as a filter. Relates to elastic#22484 & elastic#10708
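The decision described above can be sketched roughly as follows. This is a minimal illustration only: `SeqNoStats` here is a hypothetical cut-down stand-in, and `canUseSeqNoBasedRecovery` is an invented name; the real Elasticsearch classes carry more state and sit inside the recovery source/target plumbing.

```java
// Sketch only: hypothetical stand-ins for the recovery-source decision
// described in the PR; real Elasticsearch class and method names differ.
class SeqNoStats {
    final long maxSeqNo;
    final long globalCheckpoint;

    SeqNoStats(long maxSeqNo, long globalCheckpoint) {
        this.maxSeqNo = maxSeqNo;
        this.globalCheckpoint = globalCheckpoint;
    }
}

class RecoveryDecision {
    // Seq#-based recovery is only safe when the last (and only) Lucene commit
    // holds no operations above the global checkpoint; otherwise a Lucene
    // rollback would be needed, which is not implemented, so we must fall
    // back to file-based recovery.
    static boolean canUseSeqNoBasedRecovery(SeqNoStats lastCommit) {
        return lastCommit.maxSeqNo <= lastCommit.globalCheckpoint;
    }
}
```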
test this please

Test this please
ywelsch
left a comment
Left some minor comments but change LGTM
| return recoveryTarget.store().loadSeqNoStats(globalCheckpoint).getLocalCheckpoint() + 1;
| final SeqNoStats seqNoStats = recoveryTarget.store().loadSeqNoStats(globalCheckpoint);
| if (seqNoStats.getMaxSeqNo() <= seqNoStats.getGlobalCheckpoint()) {
| assert seqNoStats.getLocalCheckpoint() <= seqNoStats.getGlobalCheckpoint() :
this would be a consequence of maxSeqNo >= localCheckpoint. Should we instead add the assertion maxSeqNo >= localCheckpoint to the constructor of SeqNoStats?
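The constructor-level invariant suggested here could look roughly like this. This is a sketch with a hypothetical, reduced `SeqNoStats`; it uses a hard check instead of `assert` purely so the illustration fails even without `-ea`, whereas the real class would likely use an `assert`.

```java
// Sketch: enforcing maxSeqNo >= localCheckpoint in the SeqNoStats
// constructor, as the reviewer suggests; field set reduced for illustration.
class SeqNoStats {
    final long maxSeqNo;
    final long localCheckpoint;
    final long globalCheckpoint;

    SeqNoStats(long maxSeqNo, long localCheckpoint, long globalCheckpoint) {
        if (maxSeqNo < localCheckpoint) {
            // the local checkpoint counts fully processed ops, so it can
            // never exceed the highest sequence number ever seen
            throw new IllegalArgumentException(
                "local checkpoint [" + localCheckpoint + "] is above max seq# [" + maxSeqNo + "]");
        }
        this.maxSeqNo = maxSeqNo;
        this.localCheckpoint = localCheckpoint;
        this.globalCheckpoint = globalCheckpoint;
    }
}
```

With the invariant in the constructor, any call site that checks `localCheckpoint <= globalCheckpoint` gets `maxSeqNo >= localCheckpoint` for free, so the duplicated assertion can be dropped.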
| final long globalCheckpoint = Translog.readGlobalCheckpoint(recoveryTarget.indexShard().shardPath().resolveTranslog());
| return recoveryTarget.store().loadSeqNoStats(globalCheckpoint).getLocalCheckpoint() + 1;
| final SeqNoStats seqNoStats = recoveryTarget.store().loadSeqNoStats(globalCheckpoint);
| if (seqNoStats.getMaxSeqNo() <= seqNoStats.getGlobalCheckpoint()) {
can you add a comment here why we check this here (i.e. essentially the first paragraph of the PR description).
| this.logger = logger;
| this.indexName = this.request.shardId().getIndex().getName();
| this.shardId = this.request.shardId().id();
| this.logger = Loggers.getLogger(getClass(), nodeSettings, request.shardId(), "recover to " + request.targetNode().getName());
| .setSettings(Settings.builder()
| .put(IndexMetaData.SETTING_NUMBER_OF_SHARDS, 1 + randomInt(2))
| .put(IndexMetaData.SETTING_NUMBER_OF_REPLICAS, randomInt(2))
| .put(IndexSettings.INDEX_SEQ_NO_CHECKPOINT_SYNC_INTERVAL.getKey(), "200ms")
randomize this a bit? Maybe we won't uncover other issues if this is too low?
I added randomization between 5s and 200ms
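Picking the sync interval uniformly between 200ms and 5s could be sketched as below. `pick` and the plain `java.util.Random` are stand-ins for illustration; the real test would use the test framework's own `randomIntBetween` helper and setting keys.

```java
import java.util.Random;

// Sketch: randomizing the checkpoint sync interval in [200ms, 5s], as
// discussed in the review; a stand-in for the framework's randomIntBetween.
class RandomSyncInterval {
    static String pick(Random random) {
        // nextInt(bound) is exclusive of bound, so add 1 to include 5000
        int millis = 200 + random.nextInt(5000 - 200 + 1);
        return millis + "ms";
    }
}
```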
| }
| public static ShardRouting promoteToPrimary(ShardRouting routing) {
| return new ShardRouting(routing.shardId(), routing.currentNodeId(), routing.relocatingNodeId(), true, routing.state(),
why not use ShardRouting.moveActiveReplicaToPrimary?
Because I didn't find it. Thanks for the tip.
| // simulate docs that were inflight when primary failed, these will be rolled back
| final int rollbackDocs = randomIntBetween(1, 5);
| logger.info("--> indexing {} rollback docs", rollbackDocs);
| for (int i = 0; i < rollbackDocs; i++) {
| replica.updateGlobalCheckpointOnReplica(maxSeqNo - 1);
| replica.getTranslog().sync();
| // commit is enough, global checkpoint is below max *committed* which is NO_OPS_PERFORMED
| replica.updateGlobalCheckpointOnReplica(maxSeqNo);
| replica.getTranslog().sync();
| // commit is enough, global checkpoint is below max
thx @ywelsch