Peer recovery should flush at the end#41660
Conversation
|
Pinging @elastic/es-distributed |
| // if all those uncommitted operations have baked into the existing Lucene index commit already. | ||
| final SequenceNumbers.CommitInfo commitInfo = SequenceNumbers.loadSeqNoInfoFromLuceneCommit( | ||
| indexShard.commitStats().getUserData().entrySet()); | ||
| return commitInfo.maxSeqNo != commitInfo.localCheckpoint |
There was a problem hiding this comment.
I wonder if the condition above about the translog is sufficient. What situation is the condition here addressing that's not addressed by the above one?
There was a problem hiding this comment.
If a file-based occurs, the primary also sends its translog to replica. These operations are uncommitted on the replica even though they are baked into the commit already. We need this condition to avoid flushing in this case to keep the syncId. I pushed 07c3a7c to use another check.
|
@elasticmachine test this please |
henningandersen
left a comment
There was a problem hiding this comment.
I think this could solve the issue and have other benefits as described.
But I am a bit worried about the implications, especially for future maintenance. If we ever add anything into VerifyShardBeforeClose, we need to also ensure the same holds at the end of a recovery. Also, I am not 100% sure recovery is the only place to ensure this (though I have no concrete cases).
I would find it more intuitive to (maybe in addition to this) add a check in MetaDataIndexStateService.closeRoutingTable to fail closing the index if the routing table contains unvalidated shard copies (meaning we would have to collect more info in the previous steps).
|
Good point @henningandersen, but failing the closing operation would also not be very user-friendly, as shards are free to move around based on rebalancing decisions. Let's consider more options here. |
|
@henningandersen found that we can always validate max_seq_no equals to the global checkpoint in ReadOnlyEngine with this change. I pushed 6e952c5 to enable it. |
I tend to think I was wrong about this, since FrozenEngine extends ReadOnlyEngine. If something was frozen on 6.7 or 7.0, it might not obey the invariant if they have #41041 ? |
|
Thanks everyone! |
Flushing at the end of a peer recovery (if needed) can bring these benefits: 1. Closing an index won't end up with the red state for a recovering replica should always be ready for closing whether it performs the verifying-before-close step or not. 2. Good opportunities to compact store (i.e., flushing and merging Lucene, and trimming translog) Closes #40024 Closes #39588
Flushing at the end of a peer recovery (if needed) can bring these benefits: 1. Closing an index won't end up with the red state for a recovering replica should always be ready for closing whether it performs the verifying-before-close step or not. 2. Good opportunities to compact store (i.e., flushing and merging Lucene, and trimming translog) Closes elastic#40024 Closes elastic#39588
Flushing at the end of a peer recovery (if needed) can bring these benefits:
Closes #40024
Closes #39588
Relates #33888