Translog file recovery should not rely on lucene commits #25005
bleskes merged 12 commits into elastic:master from
Conversation
retest this please
When we open a translog, we rely on the `translog.ckp` file to tell us what the maximum generation file should be and on the information stored in the last lucene commit to know the first file we need to recover. This requires coordination and is currently subject to a race condition: if a node dies after a lucene commit is made but before we remove the translog generations that were unneeded by it, the next time we open the translog we will ignore those files and never delete them (I have added tests for this). This PR changes the approach to have the translog store both of those numbers in the `translog.ckp`. This means it's more self-contained and easier to control. This change also decouples the translog recovery logic from the specific commit we're opening. This prepares the ground to fully utilize the deletion policy introduced in elastic#24950, store more translog data that's needed for Lucene, keep multiple lucene commits around, and be free to recover from any of them.
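As a rough illustration of the idea (not code from this PR; class and method names here are hypothetical, and the real `translog.ckp` is a binary file with more fields), a checkpoint that carries both generation numbers lets recovery compute its file range without consulting the last lucene commit:

```java
// Hypothetical sketch: a checkpoint carrying both generation bounds.
final class Checkpoint {
    final long generation;            // the current (maximum) translog generation
    final long minTranslogGeneration; // the first generation needed for recovery

    Checkpoint(long generation, long minTranslogGeneration) {
        assert minTranslogGeneration <= generation;
        this.generation = generation;
        this.minTranslogGeneration = minTranslogGeneration;
    }

    // With both numbers in the checkpoint, the set of files to recover can be
    // computed from the checkpoint alone, in ascending generation order.
    long[] generationsToRecover() {
        final long[] range = new long[(int) (generation - minTranslogGeneration) + 1];
        for (int i = 0; i < range.length; i++) {
            range[i] = minTranslogGeneration + i;
        }
        return range;
    }
}
```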
4f7f33a to 2e4a617 (compare)
jasontedor left a comment:
Let's figure out something better than that disgusting hack. Otherwise it looks good.
public void testSyncedFlush() throws IOException {
    try (Store store = createStore();
         Engine engine = new InternalEngine(config(defaultSettings, store, createTempDir(), new LogByteSizeMergePolicy(), null))) {
well, it's part of the try() clause and not the body, so I think this is good? (it's also what the auto-formatter does)
try (Store store = createStore();
     Engine engine = new InternalEngine(config(defaultSettings, store, createTempDir(),
         new LogByteSizeMergePolicy(), null))) { // use log MP here; we test some behavior in ESMP
well, it's part of the try() clause and not the body, so I think this is good? (it's also what the auto-formatter does)
FileChannel::open,
TranslogConfig.DEFAULT_BUFFER_SIZE,
() -> globalCheckpoint, () -> generation)) {
}
It looks like it was unnecessary to touch this file?
I need to add the new parameters?
    }
}

// commit hook for testing
It's a matter of taste, but I no longer find these comments to be useful.
I don't mind, just following conventions.
And I'm doing what I can to influence abandonment of it.
well, you're the reviewer for this one and I don't mind - so gone it is.
}

// commit hook for testing
void callCommitOnWriter(IndexWriter writer) throws IOException {
I think it should just be called commit or commitWriter.
I struggled with this name. We already have a commitIndexWriter, so I figured I'd be explicit and call this exactly what it does.
I think we can achieve the same without introducing this method (that has potential to be used in the wrong place) with the following:
diff --git a/core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java b/core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
index 18fafb6e90..8bbf320e26 100644
--- a/core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
+++ b/core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
@@ -1778,7 +1778,7 @@ public class InternalEngine extends Engine {
* @param syncId the sync flush ID ({@code null} if not committing a synced flush)
* @throws IOException if an I/O exception occurs committing the specfied writer
*/
- private void commitIndexWriter(final IndexWriter writer, final Translog translog, @Nullable final String syncId) throws IOException {
+ void commitIndexWriter(final IndexWriter writer, final Translog translog, @Nullable final String syncId) throws IOException {
ensureCanFlush();
try {
final long localCheckpoint = seqNoService().getLocalCheckpoint();
@@ -1810,7 +1810,7 @@ public class InternalEngine extends Engine {
return commitData.entrySet().iterator();
});
- callCommitOnWriter(writer);
+ writer.commit();
} catch (final Exception ex) {
try {
failEngine("lucene commit failed", ex);
@@ -1837,11 +1837,6 @@ public class InternalEngine extends Engine {
}
}
- // commit hook for testing
- void callCommitOnWriter(IndexWriter writer) throws IOException {
- writer.commit();
- }
-
private void ensureCanFlush() {
// translog recover happens after the engine is fully constructed
// if we are in this stage we have to prevent flushes from this
diff --git a/core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java b/core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
index a767e50d4e..a10381be4a 100644
--- a/core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
+++ b/core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
@@ -1163,8 +1163,9 @@ public class InternalEngineTests extends ESTestCase {
}
public void testSyncedFlush() throws IOException {
- try (Store store = createStore();
- Engine engine = new InternalEngine(config(defaultSettings, store, createTempDir(), new LogByteSizeMergePolicy(), null))) {
+ try (
+ Store store = createStore();
+ Engine engine = new InternalEngine(config(defaultSettings, store, createTempDir(), new LogByteSizeMergePolicy(), null))) {
final String syncId = randomUnicodeOfCodepointLengthBetween(10, 20);
ParsedDocument doc = testParsedDocument("1", null, testDocumentWithTextField(), B_1, null);
engine.index(indexForDoc(doc));
@@ -2496,8 +2497,8 @@ public class InternalEngineTests extends ESTestCase {
final Path translogPath = createTempDir();
try (InternalEngine engine = new InternalEngine(config(defaultSettings, store, translogPath, newMergePolicy(), null, null)) {
@Override
- void callCommitOnWriter(IndexWriter writer) throws IOException {
- super.callCommitOnWriter(writer);
+ void commitIndexWriter(IndexWriter writer, Translog translog, String syncId) throws IOException {
+ super.commitIndexWriter(writer, translog, syncId);
if (throwErrorOnCommit.get()) {
throw new RuntimeException("power's out");
                    }
What do you think?
final long minGenerationToRecoverFrom;
if (checkpoint.minTranslogGeneration < 0) {
    final Version indexVersionCreated = indexSettings().getIndexVersionCreated();
    assert indexVersionCreated.before(Version.V_6_0_0_alpha2) :
I think this needs to be Version.V_6_0_0_alpha3 now.
Collections.reverse(foundTranslogs);

// when we clean up files, we first update the checkpoint with a new minReferencedTranslog and then delete them
// if we crash just at the wrong moment, it may be that we leave one unreferenced file behind. Delete it if there
. Delete it if there -> so we delete it if they exist
}
Collections.reverse(foundTranslogs);

// when we clean up files, we first update the checkpoint with a new minReferencedTranslog and then delete them
 * Returns the minimum file generation referenced by the translog
 */
long getMinFileGeneration() {
    try (ReleasableLock ignored = readLock.acquire()) {
Let's change the conditional so we can avoid the negative:
if (readers.isEmpty()) {
    return current.getGeneration();
} else {
    return readers.get(0).getGeneration();
}

globalCheckpointSupplier);
globalCheckpointSupplier,
minTranslogGenerationSupplier
);
Can we place this on the end of the previous line?
Thx @jasontedor. I addressed all your comments.
s1monw left a comment:
left some nits, LGTM otherwise
IOUtils.closeWhileHandlingException(unreferencedReader);
IOUtils.deleteFilesIgnoringExceptions(translogPath,
    translogPath.resolveSibling(getCommitCheckpointFileName(unreferencedReader.getGeneration())));
// update the checkpoint not to reference the removed file
nit: can this comment be more clear, i.e. tell us what we update to make sure we don't ref this file.
"deletion policy requires a minReferenceGen of [" + minReferencedGen + "] which is higher than the current generation ["
    + currentFileGeneration() + "]";

while (readers.isEmpty() == false && readers.get(0).getGeneration() < minReferencedGen) {
can we use an iterator here instead? It would be more clear to me if we'd do that.
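For what it's worth, a minimal sketch of the iterator-based variant being suggested; it uses a plain sorted list of generation numbers instead of the real reader list, which is an assumption for illustration:

```java
import java.util.Iterator;
import java.util.List;

final class ReaderTrim {
    // Remove leading generations below minReferencedGen; the list is assumed
    // to be sorted ascending, mirroring how readers are ordered in the translog.
    static List<Long> trimUnreferenced(List<Long> generations, long minReferencedGen) {
        final Iterator<Long> it = generations.iterator();
        while (it.hasNext()) {
            if (it.next() < minReferencedGen) {
                it.remove(); // iterator removal instead of repeated get(0)/remove(0)
            } else {
                break; // first retained generation reached; the rest are higher
            }
        }
        return generations;
    }
}
```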
IOUtils.deleteFilesIgnoringExceptions(translogPath,
    translogPath.resolveSibling(getCommitCheckpointFileName(unreferencedReader.getGeneration())));
// update the checkpoint not to reference the removed file
current.sync();
should we try to delete in a finally block here? best effort?
If we fail to sync, I think we want to keep the file around because it's still being referenced by the ckp?
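To make the ordering under discussion concrete, here is a toy sketch (file layout, names, and the checkpoint contents are simplified assumptions, and the checkpoint fsync is elided): the checkpoint stops referencing the generation before the file is deleted, so a crash in between leaves at most an unreferenced file that the recovery-time cleanup can remove, never a referenced-but-missing one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class CleanupOrder {
    // Step 1: persist the new minimum referenced generation in the checkpoint.
    // Step 2: only then delete the now-unreferenced translog file.
    // (A real implementation would fsync the checkpoint between the two steps.)
    static void deleteUnreferencedGeneration(Path checkpointFile, Path translogFile,
                                             long newMinGen) throws IOException {
        Files.write(checkpointFile, Long.toString(newMinGen).getBytes(StandardCharsets.UTF_8));
        Files.deleteIfExists(translogFile);
    }
}
```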
assertFalse("translog [" + id + "] still exists", Files.exists(translog.location().resolve(Translog.getFilename(id))));
}

private void assertFilesPresence(Translog translog) {
I think to be ultra-pedantic it should be assertFilePresences.
try (Store store = createStore();
     Engine engine = new InternalEngine(config(defaultSettings, store, createTempDir(),
         new LogByteSizeMergePolicy(), null))) { // use log MP here; we test some behavior in ESMP
…in_checkpoint
# Conflicts:
#	core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
jasontedor left a comment:
I left a few more comments.
589c735 to 440ecc9 (compare)
final Path translogPath = createTempDir();
try (InternalEngine engine = new InternalEngine(config(defaultSettings, store, translogPath, newMergePolicy(), null, null)) {
    @Override
    protected void commitIndexWriter(IndexWriter writer, Translog translog, String syncId) throws IOException {
instead of making this method protected I think we should use MockDirectoryWrapper#failOn(Failure) and pass some failure to it that fails if we commit the indexwriter like this:
Failure fail = new Failure() {
@Override
public void eval(MockDirectoryWrapper dir) throws IOException {
for (StackTraceElement e : Thread.currentThread().getStackTrace()) {
if (doFail && "commit".equals(e.getMethodName())) {
throw new FakeIOException();
}
}
}
};
@s1monw I tried this in many variants and none of them was good. The Failure as stated fails too early (before the segments_N file is written). I tried many other variants, but none of them allow all of the commit logic to complete without triggering any failure handling in the IndexWriter, which also means later on that the new commit's files are cleaned up by a rollback we do when we fail the engine. Even if we do find a way to do this, I think it will be way too brittle and tend to break with changes in Lucene. Bottom line, I prefer to keep the current solution.
…in_checkpoint
# Conflicts:
#	core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
#	core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
Thanks @jasontedor @s1monw for the thorough review.