Skip to content

Backport for using lastSyncedGlobalCheckpoint in deletion policy#27866

Merged
dnhatn merged 7 commits intoelastic:6.xfrom
dnhatn:backport-gcp-translog
Dec 19, 2017
Merged

Backport for using lastSyncedGlobalCheckpoint in deletion policy#27866
dnhatn merged 7 commits intoelastic:6.xfrom
dnhatn:backport-gcp-translog

Conversation

@dnhatn
Copy link
Copy Markdown
Member

@dnhatn dnhatn commented Dec 18, 2017

This is a backport of #27826.

@bleskes, Could you please take a quick look to make sure that this backport fit with your fix in #26734.

Today we use the in-memory global checkpoint from SequenceNumbersService
to clean up unneeded commit points, however the latest global checkpoint
may haven't fsynced to the disk yet. If the translog checkpoint fsync
failed and we already use a higher global checkpoint to clean up commit
points, then we may have removed a safe commit which we try to keep for
recovery.

This commit updates the deletion policy using lastSyncedGlobalCheckpoint
from Translog rather the in memory global checkpoint.

Relates elastic#27606
@dnhatn dnhatn added :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. v6.2.0 labels Dec 18, 2017
@dnhatn dnhatn requested review from bleskes and ywelsch December 18, 2017 15:12
Copy link
Copy Markdown
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want @bleskes to have a look too.

private void refreshTranslogUUIDIfNeeded(IndexWriter writer, Translog translog) throws IOException {
final String translogUUID = commitDataAsMap(writer).get(Translog.TRANSLOG_UUID_KEY);
if (translog.getTranslogUUID().equals(translogUUID) == false) {
assert openMode != EngineConfig.OpenMode.OPEN_INDEX_AND_TRANSLOG : "OPEN_INDEX_AND_TRANSLOG must not fresh translogUUID";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertion message reads funny

assert translog.getGeneration() != null;
// we can only do this after we generated and committed a translog uuid. other wise the combined
// retention policy, which listens to commits, gets all confused.
refreshTranslogUUIDIfNeeded(writer, translog);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this line above the comment? The comment applies to persistHistoryUUIDIfNeeded

assert translog.getGeneration() != null;
// we can only do this after we generated and committed a translog uuid. other wise the combined
// retention policy, which listens to commits, gets all confused.
refreshTranslogUUIDIfNeeded(writer, translog);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer it like this, wdyt?

diff --git a/core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java b/core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
index b0523a1c3d2..59ddbb82ccb 100644
--- a/core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
+++ b/core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
@@ -198,10 +198,11 @@ public class InternalEngine extends Engine {
                 historyUUID = loadOrGenerateHistoryUUID(writer, engineConfig.getForceNewHistoryUUID());
                 Objects.requireNonNull(historyUUID, "history uuid should not be null");
                 indexWriter = writer;
-                // we can only do this after we generated and committed a translog uuid. other wise the combined
-                // retention policy, which listens to commits, gets all confused.
-                refreshTranslogUUIDIfNeeded(writer, translog);
-                persistHistoryUUIDIfNeeded();
+                if (openMode == EngineConfig.OpenMode.OPEN_INDEX_AND_TRANSLOG) {
+                    persistHistoryUUIDIfNeeded();
+                } else {
+                    commitNewTranslogAndHistoryUUIDs(writer, translog);
+                }
             } catch (IOException | TranslogCorruptedException e) {
                 throw new EngineCreationFailureException(shardId, "failed to create engine", e);
             } catch (AssertionError e) {
@@ -461,13 +462,15 @@ public class InternalEngine extends Engine {
         return new Translog(translogConfig, translogUUID, translogDeletionPolicy, globalCheckpointSupplier);
     }

-    private void refreshTranslogUUIDIfNeeded(IndexWriter writer, Translog translog) throws IOException {
-        final String translogUUID = commitDataAsMap(writer).get(Translog.TRANSLOG_UUID_KEY);
-        if (translog.getTranslogUUID().equals(translogUUID) == false) {
-            assert openMode != EngineConfig.OpenMode.OPEN_INDEX_AND_TRANSLOG : "OPEN_INDEX_AND_TRANSLOG must not fresh translogUUID";
-            commitIndexWriter(writer, translog, openMode == EngineConfig.OpenMode.OPEN_INDEX_CREATE_TRANSLOG
-                ? commitDataAsMap(writer).get(SYNC_COMMIT_ID) : null);
-        }
+    private void commitNewTranslogAndHistoryUUIDs(IndexWriter writer, Translog translog) throws IOException {
+        final Map<String, String> commitUserData = commitDataAsMap(writer);
+        assert commitUserData.get(Translog.TRANSLOG_UUID_KEY).equals(translog.getTranslogUUID()) == false :
+            "committing a new translog uuid but it's already equal (uuid [" + translog.getTranslogUUID() + "](";
+        assert openMode != EngineConfig.OpenMode.OPEN_INDEX_AND_TRANSLOG : "OPEN_INDEX_AND_TRANSLOG must not fresh translogUUID";
+        assert historyUUID.equals(commitUserData.containsKey(HISTORY_UUID_KEY)) == false || config().getForceNewHistoryUUID() == false :
+            "a new history id was forced, but the one in the commit is equal";
+        commitIndexWriter(writer, translog, openMode == EngineConfig.OpenMode.OPEN_INDEX_CREATE_TRANSLOG
+            ? commitUserData.get(SYNC_COMMIT_ID) : null);
     }

@dnhatn
Copy link
Copy Markdown
Member Author

dnhatn commented Dec 18, 2017

please retest this.

@dnhatn
Copy link
Copy Markdown
Member Author

dnhatn commented Dec 18, 2017

@ywelsch, This backport is tricky than we thought. Sorry for blocking your PR.

Copy link
Copy Markdown
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bleskes
Copy link
Copy Markdown
Contributor

bleskes commented Dec 19, 2017

once the ci is green, this can go in.

@dnhatn
Copy link
Copy Markdown
Member Author

dnhatn commented Dec 19, 2017

Thanks @bleskes for reviewing.

@dnhatn dnhatn merged commit 926d4b3 into elastic:6.x Dec 19, 2017
@dnhatn dnhatn deleted the backport-gcp-translog branch December 19, 2017 21:50
@dnhatn dnhatn removed the >feature label Aug 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. v6.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants