
Non-peer recovery should set the global checkpoint #27965

Merged
bleskes merged 6 commits into elastic:master from bleskes:recovery_set_global_checkpoint_when_creating_translog on Dec 22, 2017

Conversation

@bleskes (Contributor) commented Dec 22, 2017

Non-Peer recoveries should restore the global checkpoint rather than wait for the activation of the primary. This brings us a step closer to a universe where a recovered shard always has a valid global checkpoint. Concretely:

  1. Recovery from store can read the checkpoint from the translog
  2. Recovery from local shards and snapshots can set the global checkpoint to the local checkpoint as this is the only copy of the shard.
  3. Recovery of an empty shard can set it to NO_OPS_PERFORMED

Peer recoveries will follow but require more work and thus will have their own PR.

I also used the moment to clean up IndexShard's API around starting the engine and doing recovery from the translog. The current naming is a relic of the past and doesn't align with the current naming schemes in the engine.
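The three cases above can be sketched as a single decision. This is a minimal, self-contained illustration; `RecoveryType` and `resolveGlobalCheckpoint` are hypothetical names, not the actual `IndexShard` API:

```java
// Illustrative sketch of the three non-peer recovery cases above.
// RecoveryType and resolveGlobalCheckpoint are hypothetical names,
// not the real Elasticsearch API.
public class NonPeerRecoverySketch {
    static final long NO_OPS_PERFORMED = -1L;

    enum RecoveryType { EXISTING_STORE, LOCAL_SHARDS, SNAPSHOT, EMPTY_STORE }

    static long resolveGlobalCheckpoint(RecoveryType type,
                                        long translogGlobalCheckpoint,
                                        long localCheckpoint) {
        switch (type) {
            case EXISTING_STORE:
                // 1. recovery from store: read the checkpoint back from the translog
                return translogGlobalCheckpoint;
            case LOCAL_SHARDS:
            case SNAPSHOT:
                // 2. this is the only copy of the shard, so the local checkpoint
                //    is by definition the global checkpoint
                return localCheckpoint;
            case EMPTY_STORE:
                // 3. empty shard: no operations were ever performed
                return NO_OPS_PERFORMED;
            default:
                throw new AssertionError("peer recovery is handled in a follow-up PR");
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveGlobalCheckpoint(RecoveryType.EXISTING_STORE, 42L, 57L)); // 42
        System.out.println(resolveGlobalCheckpoint(RecoveryType.SNAPSHOT, -2L, 57L));       // 57
        System.out.println(resolveGlobalCheckpoint(RecoveryType.EMPTY_STORE, -2L, -2L));    // -1
    }
}
```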

@bleskes bleskes added :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. :Sequence IDs >enhancement v6.2.0 v7.0.0 labels Dec 22, 2017
@bleskes bleskes requested review from dnhatn and ywelsch December 22, 2017 15:13
@ywelsch (Contributor) left a comment:
I've left some questions/suggestions

* Loads the maximum sequence number and local checkpoint from the latest Lucene commit point.
*
* @return a tuple populated with the maximum sequence number and the local checkpoint
* @return {@link org.elasticsearch.index.seqno.SequenceNumbers.CommitInfo} containing information about the last commit

Contributor:

no need for full path here, just {@link SequenceNumbers.CommitInfo}

Contributor Author:

IntelliJ :(

* @return the sequence number stats
*/
public static Tuple<Long, Long> loadSeqNoInfoFromLuceneCommit(
public static CommitInfo loadSeqNoInfoFromLuceneCommit(

Contributor:

👍

Contributor Author:

yeah.. i almost got it wrong :)

Member:

++
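The near-mistake joked about above is exactly what a named class prevents: field names make the tuple-ordering error impossible. A hypothetical minimal version of the class, for illustration only:

```java
// Sketch of why a named CommitInfo beats Tuple<Long, Long>: named fields make
// the ordering mistake impossible. This is a hypothetical minimal version,
// not the actual SequenceNumbers.CommitInfo.
public class CommitInfoSketch {
    static final class CommitInfo {
        final long maxSeqNo;
        final long localCheckpoint;
        CommitInfo(long maxSeqNo, long localCheckpoint) {
            this.maxSeqNo = maxSeqNo;
            this.localCheckpoint = localCheckpoint;
        }
    }

    public static void main(String[] args) {
        CommitInfo info = new CommitInfo(100L, 90L);
        // with Tuple<Long, Long> nothing stops a caller reading v1() when v2() was meant
        System.out.println(info.maxSeqNo);        // 100
        System.out.println(info.localCheckpoint); // 90
    }
}
```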

final RecoveryState.Translog translogStats = recoveryState().getTranslog();
translogStats.totalOperations(0);
translogStats.totalOperationsOnStart(0);
globalCheckpointTracker.updateGlobalCheckpointOnReplica(SequenceNumbers.NO_OPS_PERFORMED, "index created");

Contributor:

I think a nicer approach (can be a follow-up done by me) would be not to call updateGlobalCheckpointOnReplica here, but instead call

globalCheckpointTracker.activatePrimaryMode(SequenceNumbers.NO_OPS_PERFORMED);

either here or in the IndexShard constructor (where we create the GlobalCheckpointTracker) when the recovery source is EMPTY_STORE.

Contributor Author:

yeah, I thought we talked about it. I would prefer to unify on when we activate the primary shard - on post recovery or on shard started. The activation logic is the same for local_shards and snapshot. For local_store we need the in-sync set to be updated, which may be done as part of the constructor. I personally prefer a shard to be fully ready when in POST_RECOVERY, so I'm +1 on moving it there.

Contributor:

I would prefer not to call updateGlobalCheckpointOnReplica on the GlobalCheckpointTracker if the shard is a blessed primary. A shard that's created from snapshot / local_store / local_shards is by definition blessed from the master. It should just activate the tracker. The activation logic for a replica can be different than for a primary.

SequenceNumbers.CommitInfo commitInfo = store.loadSeqNoInfo();
if (commitInfo.localCheckpoint < globalCheckpoint) {
throw new IllegalArgumentException(
"trying to create a shard whose local checkpoint [" + commitInfo.localCheckpoint + "] is > global checkpoint ["

Contributor:

you mean smaller, not larger?

Contributor Author:

hehe. I got this wrong at the first run, fixed the if clause but forgot the text. Thanks.

recoveryState.getRecoverySource().getType() != RecoverySource.Type.EXISTING_STORE;
SequenceNumbers.CommitInfo commitInfo = store.loadSeqNoInfo();
if (commitInfo.localCheckpoint < globalCheckpoint) {
throw new IllegalArgumentException(

Contributor:

I want to better understand this. When would we expect this to happen?

Contributor Author:

I don't, but I was really worried this might happen in my original approach of adding the global checkpoint to the recovery messages. I decided to keep this check when I rolled that part of the change back.

Contributor:

maybe have an assertion instead then?

Contributor Author:

sure
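The assertion the reviewers agreed on could look roughly like the sketch below; `checkpointInvariantHolds` and `onOpenIndex` are illustrative helpers, not the actual change:

```java
// Hypothetical sketch of turning the hard check into an assertion.
// checkpointInvariantHolds and onOpenIndex are illustrative names.
public class CheckpointInvariant {
    static boolean checkpointInvariantHolds(long localCheckpoint, long globalCheckpoint) {
        // the local checkpoint of the last commit must be at least the global checkpoint
        return localCheckpoint >= globalCheckpoint;
    }

    static void onOpenIndex(long localCheckpoint, long globalCheckpoint) {
        // with -ea this trips in tests; in production it is a no-op
        assert checkpointInvariantHolds(localCheckpoint, globalCheckpoint) :
            "local checkpoint [" + localCheckpoint + "] is < global checkpoint ["
                + globalCheckpoint + "]";
    }

    public static void main(String[] args) {
        onOpenIndex(10L, 5L); // fine: local checkpoint is ahead of the global checkpoint
        System.out.println(checkpointInvariantHolds(3L, 7L)); // false: would trip the assertion
    }
}
```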

active.set(true);
// we have to set it before we recover from the translog as acquiring a snapshot from the translog causes a sync which
// causes the global checkpoint to be pulled in.
globalCheckpointTracker.updateGlobalCheckpointOnReplica(getEngine().getTranslog().getLastSyncedGlobalCheckpoint(),

Contributor:

same comment as above, here we can call globalCheckpointTracker.activatePrimaryMode instead.
This will allow us to leave activatePrimaryMode only for primary promotion in updateShardState (with the exception of the 6.x backport, which still needs the 5.x backcompat logic for primary relocation activation).

"trying to create a shard whose local checkpoint [" + commitInfo.localCheckpoint + "] is > global checkpoint ["
+ globalCheckpoint + "]");
}
globalCheckpointTracker.updateGlobalCheckpointOnReplica(globalCheckpoint, "opening index with a new translog");

Contributor:

Note that in case of peer recovery with a retry, we could end up with a higher gcp in the GlobalCheckpointTracker than what we're setting here.

Contributor Author:

agreed. But that's OK, I think?
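The reason a retry is harmless here is that the tracker's replica update is understood to keep only the maximum, so a stale, lower value is simply ignored. A minimal sketch of that assumed semantics (not the actual `GlobalCheckpointTracker` code):

```java
// Sketch of the max-based update semantics assumed in the discussion above:
// a proposed checkpoint below the current one is a no-op.
public class ReplicaCheckpointUpdate {
    static long updateOnReplica(long current, long proposed) {
        return Math.max(current, proposed);
    }

    public static void main(String[] args) {
        // retried recovery hands us a stale value: the checkpoint stays put
        System.out.println(updateOnReplica(10L, 7L)); // 10
        // normal case: the checkpoint moves forward
        System.out.println(updateOnReplica(7L, 10L)); // 10
    }
}
```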

"trying to create a shard whose local checkpoint [" + commitInfo.localCheckpoint + "] is > global checkpoint ["
+ globalCheckpoint + "]");
}
globalCheckpointTracker.updateGlobalCheckpointOnReplica(globalCheckpoint, "opening index with a new translog");

Contributor:

In case of LOCAL_SHARDS/RESTORE, we could again call activatePrimary here.

Contributor Author:

agreed. See other comment.

@ywelsch (Contributor) left a comment:

LGTM

@bleskes bleskes merged commit adb49ef into elastic:master Dec 22, 2017
@bleskes bleskes deleted the recovery_set_global_checkpoint_when_creating_translog branch December 22, 2017 20:39
@bleskes (Contributor Author) commented Dec 22, 2017

Thx @ywelsch

bleskes added a commit that referenced this pull request Dec 22, 2017
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Dec 23, 2017
In PR elastic#27965, we set the global checkpoint from the translog in a store
recovery. However, we set it after the engine is opened. This causes the
global checkpoint assertion in TranslogWriter to be violated if we are
forced to close the engine before we set the global checkpoint. A
closing engine will close the translog, which in turn reads the current global
checkpoint; however, it is still unassigned and smaller than the initial
global checkpoint from the translog.

Closes elastic#27970
ywelsch pushed a commit that referenced this pull request Dec 23, 2017
ywelsch pushed a commit that referenced this pull request Dec 23, 2017
@clintongormley clintongormley added :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Sequence IDs labels Feb 14, 2018
@jpountz jpountz removed the :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. label Jan 29, 2019

Labels

:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement v6.2.0 v7.0.0-beta1

6 participants