Non-peer recovery should set the global checkpoint #27965

bleskes merged 6 commits into elastic:master from
Conversation
ywelsch left a comment:
I've left some questions/suggestions
```diff
      * Loads the maximum sequence number and local checkpoint from the latest Lucene commit point.
      *
-     * @return a tuple populated with the maximum sequence number and the local checkpoint
+     * @return {@link org.elasticsearch.index.seqno.SequenceNumbers.CommitInfo} containing information about the last commit
```
no need for the full path here, just `{@link SequenceNumbers.CommitInfo}`
```diff
      * @return the sequence number stats
      */
-    public static Tuple<Long, Long> loadSeqNoInfoFromLuceneCommit(
+    public static CommitInfo loadSeqNoInfoFromLuceneCommit(
```
yeah.. i almost got it wrong :)
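The "almost got it wrong" remark is exactly the hazard a named holder avoids. A minimal sketch (hypothetical classes, not the actual Elasticsearch code) of why `CommitInfo` is safer than `Tuple<Long, Long>`:

```java
// Sketch (illustrative, not the real Elasticsearch classes): with a Tuple you must
// remember whether v1() is the max sequence number or the local checkpoint; a small
// named holder makes each field unambiguous at the call site.
public class CommitInfoDemo {
    static final class CommitInfo {
        final long maxSeqNo;
        final long localCheckpoint;

        CommitInfo(long maxSeqNo, long localCheckpoint) {
            this.maxSeqNo = maxSeqNo;
            this.localCheckpoint = localCheckpoint;
        }
    }

    public static void main(String[] args) {
        CommitInfo info = new CommitInfo(42L, 40L);
        // The field names document themselves; swapping them no longer compiles silently
        // into a subtle bug the way tuple.v1()/tuple.v2() confusion can.
        System.out.println("maxSeqNo=" + info.maxSeqNo + " localCheckpoint=" + info.localCheckpoint);
    }
}
```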
```java
    final RecoveryState.Translog translogStats = recoveryState().getTranslog();
    translogStats.totalOperations(0);
    translogStats.totalOperationsOnStart(0);
    globalCheckpointTracker.updateGlobalCheckpointOnReplica(SequenceNumbers.NO_OPS_PERFORMED, "index created");
```
I think a nicer approach (this can be a follow-up done by me) would be not to call `updateGlobalCheckpointOnReplica` here, but instead to call `globalCheckpointTracker.activatePrimaryMode(SequenceNumbers.NO_OPS_PERFORMED);` either here or in the `IndexShard` constructor (where we create the `GlobalCheckpointTracker`) when the recovery source is `EMPTY_STORE`.
yeah, I thought we talked about it. I would prefer to unify when we activate the primary shard - on post recovery or on shard started. The activation logic is the same for local_shards and snapshot. For local_store we need the in-sync set to be updated, which may be done as part of the constructor. I personally prefer a shard to be fully ready when in POST_RECOVERY, so I'm +1 on moving it there.
I would prefer not to call `updateGlobalCheckpointOnReplica` on the `GlobalCheckpointTracker` if the shard is a blessed primary. A shard that's created from snapshot / local_store / local_shards is by definition blessed by the master. It should just activate the tracker. The activation logic for a replica can be different from that for a primary.
```java
    SequenceNumbers.CommitInfo commitInfo = store.loadSeqNoInfo();
    if (commitInfo.localCheckpoint < globalCheckpoint) {
        throw new IllegalArgumentException(
            "trying to create a shard whose local checkpoint [" + commitInfo.localCheckpoint + "] is > global checkpoint ["
```
you mean smaller, not larger?
hehe. I got this wrong at the first run, fixed the if clause but forgot the text. Thanks.
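The agreed fix is to make the message match the `if` clause. A minimal sketch of the corrected guard (illustrative method name, not the actual `IndexShard` code):

```java
// Sketch of the corrected validation: the condition rejects a local checkpoint
// *below* the global checkpoint, and the message now says "<" to match it.
public class CheckpointGuard {
    static void ensureLocalCheckpointNotBelowGlobal(long localCheckpoint, long globalCheckpoint) {
        if (localCheckpoint < globalCheckpoint) {
            throw new IllegalArgumentException(
                "trying to create a shard whose local checkpoint [" + localCheckpoint
                    + "] is < global checkpoint [" + globalCheckpoint + "]");
        }
    }

    public static void main(String[] args) {
        ensureLocalCheckpointNotBelowGlobal(5L, 5L); // equal is allowed
        try {
            ensureLocalCheckpointNotBelowGlobal(3L, 5L); // rejected: 3 < 5
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```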
```java
    recoveryState.getRecoverySource().getType() != RecoverySource.Type.EXISTING_STORE;
    SequenceNumbers.CommitInfo commitInfo = store.loadSeqNoInfo();
    if (commitInfo.localCheckpoint < globalCheckpoint) {
        throw new IllegalArgumentException(
```
I want to better understand this. When would we expect this to happen?
I don't, but I was really worried this might happen in my original approach, which added the global checkpoint to the recovery messages. I decided to keep this check when I rolled that part of the change back.
maybe have an assertion instead then?
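The suggested alternative, sketched with illustrative names: state the invariant as a Java `assert`, which runs in test clusters (started with `-ea`) but is disabled by default in production, so it documents the expectation without failing user clusters.

```java
// Sketch: the same invariant expressed as an assertion instead of a hard failure.
// Assertions only execute when the JVM is started with -ea (as Elasticsearch tests are).
public class CheckpointAssertDemo {
    static void checkInvariant(long localCheckpoint, long globalCheckpoint) {
        assert localCheckpoint >= globalCheckpoint :
            "local checkpoint [" + localCheckpoint + "] is < global checkpoint [" + globalCheckpoint + "]";
    }

    public static void main(String[] args) {
        checkInvariant(5L, 3L); // invariant holds; with -ea a violation would throw AssertionError
        System.out.println("invariant holds");
    }
}
```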
```java
    active.set(true);
    // we have to set it before we recover from the translog as acquiring a snapshot from the translog causes a sync which
    // causes the global checkpoint to be pulled in.
    globalCheckpointTracker.updateGlobalCheckpointOnReplica(getEngine().getTranslog().getLastSyncedGlobalCheckpoint(),
```
same comment as above; here we can call `globalCheckpointTracker.activatePrimaryMode` instead.
This will allow us to only leave `activatePrimaryMode` on primary promotion in `updateShardState` (with the exception of the 6.x backport, which still needs the 5.x backcompat logic for primary relocation activation).
```java
            "trying to create a shard whose local checkpoint [" + commitInfo.localCheckpoint + "] is > global checkpoint ["
                + globalCheckpoint + "]");
    }
    globalCheckpointTracker.updateGlobalCheckpointOnReplica(globalCheckpoint, "opening index with a new translog");
```
Note that in the case of a peer recovery with a retry, we could end up with a higher gcp in the `GlobalCheckpointTracker` than what we're setting here.
agreed. But that's OK, I think?
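Why a lower value on retry is "OK" can be sketched with a simplified tracker (illustrative only, and assuming max-semantics for updates, not the actual `GlobalCheckpointTracker` implementation): if the tracker only ever advances the checkpoint, replaying a stale value is a no-op.

```java
// Sketch: a monotonic tracker never moves the global checkpoint backwards, so a
// retried recovery pushing an older value is harmless. Illustrative code only.
public class MonotonicTrackerDemo {
    static final class Tracker {
        private long globalCheckpoint = -1L; // NO_OPS_PERFORMED-style sentinel

        synchronized void updateGlobalCheckpointOnReplica(long candidate) {
            // keep the maximum seen so far; stale retries cannot regress the checkpoint
            globalCheckpoint = Math.max(globalCheckpoint, candidate);
        }

        synchronized long getGlobalCheckpoint() {
            return globalCheckpoint;
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        tracker.updateGlobalCheckpointOnReplica(10L); // first attempt got further
        tracker.updateGlobalCheckpointOnReplica(7L);  // retry replays an older value
        System.out.println(tracker.getGlobalCheckpoint()); // still 10
    }
}
```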
```java
            "trying to create a shard whose local checkpoint [" + commitInfo.localCheckpoint + "] is > global checkpoint ["
                + globalCheckpoint + "]");
    }
    globalCheckpointTracker.updateGlobalCheckpointOnReplica(globalCheckpoint, "opening index with a new translog");
```
In case of LOCAL_SHARDS/RESTORE, we could again call activatePrimary here.
agreed. See other comment.
Thx @ywelsch
Non-peer recoveries should restore the global checkpoint rather than wait for the activation of the primary. This brings us a step closer to a universe where a recovered shard always has a valid global checkpoint. Concretely:

1) Recovery from store can read the checkpoint from the translog.
2) Recovery from local shards and snapshots can set the global checkpoint to the local checkpoint, as this is the only copy of the shard.
3) Recovery of an empty shard can set it to `NO_OPS_PERFORMED`.

Peer recoveries will follow but require more work and thus will have their own PR.

I also used the moment to clean up `IndexShard`'s API around starting the engine and doing recovery from the translog. The current names are a relic of the past and don't align with the current naming schemes in the engine.
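The per-recovery-source policy described above can be sketched as a single decision function. All names and constants here are illustrative, not the actual `IndexShard` code:

```java
// Sketch of the recovery-source policy from the PR description (hypothetical names).
public class RecoveryCheckpointPolicy {
    static final long NO_OPS_PERFORMED = -1L; // sentinel, mirroring SequenceNumbers.NO_OPS_PERFORMED

    static long initialGlobalCheckpoint(String recoverySourceType, long translogGlobalCheckpoint, long localCheckpoint) {
        switch (recoverySourceType) {
            case "EXISTING_STORE":
                return translogGlobalCheckpoint; // 1) read the checkpoint from the translog
            case "LOCAL_SHARDS":
            case "SNAPSHOT":
                return localCheckpoint;          // 2) only copy of the shard, so gcp = local checkpoint
            case "EMPTY_STORE":
                return NO_OPS_PERFORMED;         // 3) empty shard
            default:
                throw new IllegalArgumentException("peer recovery is handled in a follow-up: " + recoverySourceType);
        }
    }

    public static void main(String[] args) {
        System.out.println(initialGlobalCheckpoint("SNAPSHOT", 3L, 9L)); // 9
    }
}
```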
In PR #27965, we set the global checkpoint from the translog in a store recovery. However, we set it after the engine is opened. This causes the global checkpoint assertion in TranslogWriter to be violated if we are forced to close the engine before we set the global checkpoint: a closing engine closes the translog, which in turn reads the current global checkpoint; however, it is still unassigned and smaller than the initial global checkpoint from the translog. Closes #27970