Non-peer recovery should set the global checkpoint #27965

bleskes merged 6 commits into elastic:master from
Conversation
ywelsch left a comment:
I've left some questions/suggestions
```diff
      * Loads the maximum sequence number and local checkpoint from the latest Lucene commit point.
      *
-     * @return a tuple populated with the maximum sequence number and the local checkpoint
+     * @return {@link org.elasticsearch.index.seqno.SequenceNumbers.CommitInfo} containing information about the last commit
```
no need for the full path here, just `{@link SequenceNumbers.CommitInfo}`
```diff
      * @return the sequence number stats
      */
-    public static Tuple<Long, Long> loadSeqNoInfoFromLuceneCommit(
+    public static CommitInfo loadSeqNoInfoFromLuceneCommit(
```
yeah.. i almost got it wrong :)
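The "almost got it wrong" remark is exactly the hazard a named holder avoids. A minimal sketch (hypothetical classes, not the actual Elasticsearch code) of why `CommitInfo` is safer than `Tuple<Long, Long>`:

```java
// Sketch (illustrative, not the real Elasticsearch classes): with a Tuple you must
// remember whether v1() is the max sequence number or the local checkpoint; a small
// named holder makes each field unambiguous at the call site.
public class CommitInfoDemo {
    static final class CommitInfo {
        final long maxSeqNo;
        final long localCheckpoint;

        CommitInfo(long maxSeqNo, long localCheckpoint) {
            this.maxSeqNo = maxSeqNo;
            this.localCheckpoint = localCheckpoint;
        }
    }

    public static void main(String[] args) {
        CommitInfo info = new CommitInfo(42L, 40L);
        // The field names document themselves; swapping them no longer compiles silently
        // into a subtle bug the way tuple.v1()/tuple.v2() confusion can.
        System.out.println("maxSeqNo=" + info.maxSeqNo + " localCheckpoint=" + info.localCheckpoint);
    }
}
```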
```java
    final RecoveryState.Translog translogStats = recoveryState().getTranslog();
    translogStats.totalOperations(0);
    translogStats.totalOperationsOnStart(0);
    globalCheckpointTracker.updateGlobalCheckpointOnReplica(SequenceNumbers.NO_OPS_PERFORMED, "index created");
```
I think a nicer approach (this can be a follow-up done by me) would be not to call `updateGlobalCheckpointOnReplica` here, but instead to call `globalCheckpointTracker.activatePrimaryMode(SequenceNumbers.NO_OPS_PERFORMED);` either here or in the `IndexShard` constructor (where we create the `GlobalCheckpointTracker`) when the recovery source is `EMPTY_STORE`.
yeah, I thought we talked about it. I would prefer to unify when we activate the primary shard - on post recovery or on shard started. The activation logic is the same for local_shards and snapshot. For local_store we need the in-sync set to be updated, which may be done as part of the constructor. I personally prefer a shard to be fully ready when in POST_RECOVERY, so I'm +1 on moving it there.
I would prefer not to call `updateGlobalCheckpointOnReplica` on the `GlobalCheckpointTracker` if the shard is a blessed primary. A shard that's created from snapshot / local_store / local_shards is by definition blessed by the master. It should just activate the tracker. The activation logic for a replica can be different from that for a primary.
```java
    SequenceNumbers.CommitInfo commitInfo = store.loadSeqNoInfo();
    if (commitInfo.localCheckpoint < globalCheckpoint) {
        throw new IllegalArgumentException(
            "trying to create a shard whose local checkpoint [" + commitInfo.localCheckpoint + "] is > global checkpoint ["
```
you mean smaller, not larger?
hehe. I got this wrong at the first run, fixed the if clause but forgot the text. Thanks.
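The agreed fix is to make the message match the `if` clause. A minimal sketch of the corrected guard (illustrative method name, not the actual `IndexShard` code):

```java
// Sketch of the corrected validation: the condition rejects a local checkpoint
// *below* the global checkpoint, and the message now says "<" to match it.
public class CheckpointGuard {
    static void ensureLocalCheckpointNotBelowGlobal(long localCheckpoint, long globalCheckpoint) {
        if (localCheckpoint < globalCheckpoint) {
            throw new IllegalArgumentException(
                "trying to create a shard whose local checkpoint [" + localCheckpoint
                    + "] is < global checkpoint [" + globalCheckpoint + "]");
        }
    }

    public static void main(String[] args) {
        ensureLocalCheckpointNotBelowGlobal(5L, 5L); // equal is allowed
        try {
            ensureLocalCheckpointNotBelowGlobal(3L, 5L); // rejected: 3 < 5
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```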
```java
    recoveryState.getRecoverySource().getType() != RecoverySource.Type.EXISTING_STORE;
    SequenceNumbers.CommitInfo commitInfo = store.loadSeqNoInfo();
    if (commitInfo.localCheckpoint < globalCheckpoint) {
        throw new IllegalArgumentException(
```
I want to better understand this. When would we expect this to happen?
I don't, but I was really worried this might happen in my original approach, which added the global checkpoint to the recovery messages. I decided to keep this check when I rolled that part of the change back.
maybe have an assertion instead then?
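The suggested alternative, sketched with illustrative names: state the invariant as a Java `assert`, which runs in test clusters (started with `-ea`) but is disabled by default in production, so it documents the expectation without failing user clusters.

```java
// Sketch: the same invariant expressed as an assertion instead of a hard failure.
// Assertions only execute when the JVM is started with -ea (as Elasticsearch tests are).
public class CheckpointAssertDemo {
    static void checkInvariant(long localCheckpoint, long globalCheckpoint) {
        assert localCheckpoint >= globalCheckpoint :
            "local checkpoint [" + localCheckpoint + "] is < global checkpoint [" + globalCheckpoint + "]";
    }

    public static void main(String[] args) {
        checkInvariant(5L, 3L); // invariant holds; with -ea a violation would throw AssertionError
        System.out.println("invariant holds");
    }
}
```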
```java
    active.set(true);
    // we have to set it before we recover from the translog as acquiring a snapshot from the translog causes a sync which
    // causes the global checkpoint to be pulled in.
    globalCheckpointTracker.updateGlobalCheckpointOnReplica(getEngine().getTranslog().getLastSyncedGlobalCheckpoint(),
```
same comment as above; here we can call `globalCheckpointTracker.activatePrimaryMode` instead.
This will allow us to only leave `activatePrimaryMode` on primary promotion in `updateShardState` (with the exception of the 6.x backport, which still needs the 5.x backcompat logic for primary relocation activation).
```java
            "trying to create a shard whose local checkpoint [" + commitInfo.localCheckpoint + "] is > global checkpoint ["
                + globalCheckpoint + "]");
    }
    globalCheckpointTracker.updateGlobalCheckpointOnReplica(globalCheckpoint, "opening index with a new translog");
```
Note that in the case of a peer recovery with a retry, we could end up with a higher gcp in the `GlobalCheckpointTracker` than what we're setting here.
agreed. But that's OK, I think?
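Why a lower value on retry is "OK" can be sketched with a simplified tracker (illustrative only, and assuming max-semantics for updates, not the actual `GlobalCheckpointTracker` implementation): if the tracker only ever advances the checkpoint, replaying a stale value is a no-op.

```java
// Sketch: a monotonic tracker never moves the global checkpoint backwards, so a
// retried recovery pushing an older value is harmless. Illustrative code only.
public class MonotonicTrackerDemo {
    static final class Tracker {
        private long globalCheckpoint = -1L; // NO_OPS_PERFORMED-style sentinel

        synchronized void updateGlobalCheckpointOnReplica(long candidate) {
            // keep the maximum seen so far; stale retries cannot regress the checkpoint
            globalCheckpoint = Math.max(globalCheckpoint, candidate);
        }

        synchronized long getGlobalCheckpoint() {
            return globalCheckpoint;
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        tracker.updateGlobalCheckpointOnReplica(10L); // first attempt got further
        tracker.updateGlobalCheckpointOnReplica(7L);  // retry replays an older value
        System.out.println(tracker.getGlobalCheckpoint()); // still 10
    }
}
```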
```java
            "trying to create a shard whose local checkpoint [" + commitInfo.localCheckpoint + "] is > global checkpoint ["
                + globalCheckpoint + "]");
    }
    globalCheckpointTracker.updateGlobalCheckpointOnReplica(globalCheckpoint, "opening index with a new translog");
```
In case of LOCAL_SHARDS/RESTORE, we could again call activatePrimary here.
agreed. See other comment.
Thx @ywelsch
Non-peer recoveries should restore the global checkpoint rather than wait for the activation of the primary. This brings us a step closer to a universe where a recovered shard always has a valid global checkpoint. Concretely:

1) Recovery from store can read the checkpoint from the translog.
2) Recovery from local shards and snapshots can set the global checkpoint to the local checkpoint, as this is the only copy of the shard.
3) Recovery of an empty shard can set it to `NO_OPS_PERFORMED`.

Peer recoveries will follow but require more work and thus will have their own PR.

I also used the moment to clean up `IndexShard`'s API around starting the engine and doing recovery from the translog. The current names are a relic of the past and don't align with the current naming schemes in the engine.
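The per-recovery-source policy described above can be sketched as a single decision function. All names and constants here are illustrative, not the actual `IndexShard` code:

```java
// Sketch of the recovery-source policy from the PR description (hypothetical names).
public class RecoveryCheckpointPolicy {
    static final long NO_OPS_PERFORMED = -1L; // sentinel, mirroring SequenceNumbers.NO_OPS_PERFORMED

    static long initialGlobalCheckpoint(String recoverySourceType, long translogGlobalCheckpoint, long localCheckpoint) {
        switch (recoverySourceType) {
            case "EXISTING_STORE":
                return translogGlobalCheckpoint; // 1) read the checkpoint from the translog
            case "LOCAL_SHARDS":
            case "SNAPSHOT":
                return localCheckpoint;          // 2) only copy of the shard, so gcp = local checkpoint
            case "EMPTY_STORE":
                return NO_OPS_PERFORMED;         // 3) empty shard
            default:
                throw new IllegalArgumentException("peer recovery is handled in a follow-up: " + recoverySourceType);
        }
    }

    public static void main(String[] args) {
        System.out.println(initialGlobalCheckpoint("SNAPSHOT", 3L, 9L)); // 9
    }
}
```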
In PR #27965, we set the global checkpoint from the translog in a store recovery. However, we set it after the engine is opened. This causes the global checkpoint assertion in TranslogWriter to be violated if we are forced to close the engine before we set the global checkpoint: a closing engine closes the translog, which in turn reads the current global checkpoint; however, it is still unassigned and smaller than the initial global checkpoint from the translog. Closes #27970