Fill missing sequence IDs up to max sequence ID when recovering from store #24238
s1monw merged 5 commits into elastic:master
…store Today we might promote a primary and recover from store where after translog recovery the local checkpoint is still behind the maximum sequence ID seen. To fill the holes in the sequence ID history this PR adds a utility method that fills up all missing sequence IDs up to the maximum seen sequence ID with no-ops.
bleskes
left a comment
Looks great. I have one minor request for the test.
    final long maxSeqId = seqNoService.getMaxSeqNo();
    int numNoOpsAdded = 0;
    for (long i = localCheckpoint + 1; i <= maxSeqId;
        // the local checkpoint might have been advanced so we are leap-frogging
The local checkpoint must have advanced by at least one. We can assert on that after the noop is indexed.
    Engine.Index primaryResponse = indexForDoc(doc);
    Engine.IndexResult indexResult = engine.index(primaryResponse);
    if (randomBoolean()) {
        doc.updateSeqID(indexResult.getSeqNo(), 1);
why is this needed? doesn't the engine take care of that?
    assertEquals((maxSeqIDOnReplica+1) - numDocsOnReplica, recoveringEngine.fillSequenceNumberHistory(2));
    assertEquals(maxSeqIDOnReplica, recoveringEngine.seqNoService().getMaxSeqNo());
    assertEquals(maxSeqIDOnReplica, recoveringEngine.seqNoService().getLocalCheckpoint());
    if ((flushed = randomBoolean())) {
can we snapshot the translog and assert that the noops have the right primary term?
ah I had that but removed it... good catch...
bleskes
left a comment
Still LGTM. Left a suggestion for the new test.
    // start a replica shard and index the second doc
    final IndexShard otherShard = newStartedShard(false);
    test = otherShard.prepareIndexOnReplica(
        SourceToParse.source(SourceToParse.Origin.PRIMARY, shard.shardId().getIndexName(), test.type(), test.id(), test.source(),

    /* This test just verifies that we fill up local checkpoint up to max seen seqID on primary recovery */
    public void testRecoverFromStoreWithNoOps() throws IOException {
        final IndexShard shard = newStartedShard(true);
I think we can introduce a variant of indexDoc called indexDocOnReplica which takes a seq# as a parameter. This will remove the need for the extra shard. wdyt?
I can do that in a sep PR
Today we might promote a primary and recover from store where after translog
recovery the local checkpoint is still behind the maximum sequence ID seen.
To fill the holes in the sequence ID history this PR adds a utility method
that fills up all missing sequence IDs up to the maximum seen sequence ID
with no-ops.
Relates to #10708
I'm still working on a test for store recovery to ensure it's called, but I think it's ready for review.
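For readers who want the idea in isolation, here is a minimal, self-contained sketch of the hole-filling loop described above. It is not the code from this PR. `Tracker`, `NoOp`, and the list standing in for the translog are hypothetical simplifications, and the signature of `fillSequenceNumberHistory` here is an assumption made for illustration. It shows the leap-frogging over the local checkpoint and the "advanced by at least one" assertion suggested in review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class SeqNoHistorySketch {

    /** Hypothetical stand-in for a sequence number service; not the Elasticsearch class. */
    static final class Tracker {
        private final TreeSet<Long> pending = new TreeSet<>(); // processed seq nos above the checkpoint
        private long localCheckpoint = -1; // every seq no <= this value has been processed
        private long maxSeqNo = -1;        // highest seq no seen so far

        void markProcessed(long seqNo) {
            maxSeqNo = Math.max(maxSeqNo, seqNo);
            if (seqNo <= localCheckpoint) {
                return;
            }
            pending.add(seqNo);
            // Advance the checkpoint across every contiguous processed seq no.
            while (pending.remove(localCheckpoint + 1)) {
                localCheckpoint++;
            }
        }

        long getLocalCheckpoint() { return localCheckpoint; }
        long getMaxSeqNo() { return maxSeqNo; }
    }

    /** Hypothetical no-op record: just a seq no and the primary term it was written under. */
    static final class NoOp {
        final long seqNo;
        final long primaryTerm;
        NoOp(long seqNo, long primaryTerm) { this.seqNo = seqNo; this.primaryTerm = primaryTerm; }
    }

    /** Fills every hole between the local checkpoint and the max seq no; returns the no-op count. */
    static int fillSequenceNumberHistory(Tracker tracker, long primaryTerm, List<NoOp> translog) {
        final long maxSeqNo = tracker.getMaxSeqNo();
        int numNoOpsAdded = 0;
        // Re-read the checkpoint on each iteration: one no-op may close a gap in front of
        // seq nos that were already processed, so the checkpoint can jump by more than one.
        for (long seqNo = tracker.getLocalCheckpoint() + 1; seqNo <= maxSeqNo;
             seqNo = tracker.getLocalCheckpoint() + 1) {
            final long before = tracker.getLocalCheckpoint();
            translog.add(new NoOp(seqNo, primaryTerm));
            tracker.markProcessed(seqNo);
            numNoOpsAdded++;
            if (tracker.getLocalCheckpoint() <= before) {
                throw new AssertionError("local checkpoint did not advance after no-op " + seqNo);
            }
        }
        return numNoOpsAdded;
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        for (long seqNo : new long[] {0, 1, 3, 6}) {
            tracker.markProcessed(seqNo);
        }
        List<NoOp> translog = new ArrayList<>();
        int filled = fillSequenceNumberHistory(tracker, 2, translog);
        System.out.println("no-ops added: " + filled
            + ", local checkpoint: " + tracker.getLocalCheckpoint());
    }
}
```

In this sketch the number of no-ops equals `(maxSeqNo + 1)` minus the number of operations already processed, which is the same relationship the test assertion in the diff checks.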