Introducing a translog deletion policy by bleskes · Pull Request #24950 · elastic/elasticsearch

bleskes · 2017-05-30T07:05:35Z

Currently, the decisions regarding which translog generation files to delete are hard coded in the interaction between the InternalEngine and the Translog classes. This PR extracts it to a dedicated class called TranslogDeletionPolicy, for two main reasons:

Simplicity - the code is easier to read and understand (no more two-phase commit on the translog; the Engine can just commit and the translog will respond).
Preparing for future plans to extend the logic we need - i.e., retain multiple Lucene commits and also introduce size-based retention logic, allowing people to always keep a certain amount of translog files around. The latter is useful to increase the chance of an ops-based recovery.

s1monw

I left some minor comments, but really this looks awesome!

s1monw · 2017-05-30T08:38:27Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

        this.uidField = engineConfig.getIndexSettings().isSingleType() ? IdFieldMapper.NAME : UidFieldMapper.NAME;
        this.versionMap = new LiveVersionMap();
+        final TranslogDeletionPolicy translogDeletionPolicy = new TranslogDeletionPolicy();
+        this.deletionPolicy = new CombinedDeletionPolicy(


nit: does this need to break into 3 lines or is this maybe a leftover?

sadly it doesn't fit in one line, I'll make it like this - it seems you prefer it:

this.deletionPolicy = new CombinedDeletionPolicy(
    new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy()), translogDeletionPolicy, openMode);

s1monw · 2017-05-30T08:39:39Z

core/src/main/java/org/elasticsearch/index/translog/TranslogWriter.java

        return operationCounter;
    }

+    public long lastSyncedGlobalCheckpoint() {


or maybe pkg private?

actually, this can be completely removed - it's a leftover from the "somewhat into the future" POC. good catch.

s1monw · 2017-05-30T08:45:13Z

core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java

+
+    /** Records how many views are held against each
+     *  translog generation */
+    protected final Map<Long,Integer> translogRefCounts = new HashMap<>();


maybe this is a good place for LongIntMap?

Another option would be Map<Long,Counter> then you can do this:

translogRefCounts.computeIfAbsent(translogGen, gen -> Counter.newCounter(false)).addAndGet(1);
//....
value = translogRefCounts.computeIfAbsent(translogGen, gen -> Counter.newCounter(false)).addAndGet(-1);

It would make things easier to read IMO?

org.apache.lucene.util.Counter that is

I looked at LongIntMap but decided not to have a 3rd party dependency for this low performance, rarely used map. I like the Counter class usage. It simplifies things. Thanks!
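To make the idea in this thread concrete, here is a minimal sketch of reference counting per translog generation with a counter-valued map. It is not the code from the PR: `SimpleCounter` is a stand-in for `org.apache.lucene.util.Counter` so the example compiles without Lucene, and `releaseTranslogGenView` and `setMinTranslogGenerationForRecovery` are assumed method names that do not appear in the diff excerpts above.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for org.apache.lucene.util.Counter (assumption, to stay dependency-free). */
final class SimpleCounter {
    private long count;

    long addAndGet(long delta) {
        count += delta;
        return count;
    }
}

/** Hypothetical sketch of a policy that tracks views held against translog generations. */
final class TranslogDeletionPolicySketch {
    /** Records how many views are held against each translog generation. */
    private final Map<Long, SimpleCounter> translogRefCounts = new HashMap<>();
    /** Oldest generation needed to recover from the last commit (assumed field). */
    private long minTranslogGenerationForRecovery = 1;

    synchronized void setMinTranslogGenerationForRecovery(long newGen) {
        if (newGen < minTranslogGenerationForRecovery) {
            throw new IllegalArgumentException("generation can't go backwards: " + newGen);
        }
        minTranslogGenerationForRecovery = newGen;
    }

    /** Pins the current recovery generation for a new view and returns it. */
    synchronized long acquireTranslogGenForView() {
        translogRefCounts
            .computeIfAbsent(minTranslogGenerationForRecovery, gen -> new SimpleCounter())
            .addAndGet(1);
        return minTranslogGenerationForRecovery;
    }

    /** Releases a generation pinned by acquireTranslogGenForView. */
    synchronized void releaseTranslogGenView(long translogGen) {
        SimpleCounter counter = translogRefCounts.get(translogGen);
        if (counter == null) {
            throw new IllegalArgumentException("no view held on generation " + translogGen);
        }
        if (counter.addAndGet(-1) == 0) {
            translogRefCounts.remove(translogGen);
        }
    }

    /** Any generation below the returned value may be deleted. */
    synchronized long minTranslogGenRequired() {
        long minViewGen = translogRefCounts.keySet().stream()
            .mapToLong(Long::longValue)
            .min()
            .orElse(Long.MAX_VALUE);
        return Math.min(minViewGen, minTranslogGenerationForRecovery);
    }
}
```

In this sketch an open view keeps its generation alive even after a later commit advances the recovery generation; once the view is released, the minimum required generation catches up.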

s1monw · 2017-05-30T08:47:05Z

core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java

+import java.util.List;
+import java.util.Map;
+
+public class TranslogDeletionPolicy {


maybe we can make this final right away?

sure. I thought I'd need to subclass in tests but it turned out not to be necessary.

s1monw · 2017-05-30T08:52:41Z

core/src/main/java/org/elasticsearch/index/translog/Translog.java

            ensureOpen();
-            View view = new View(lastCommittedTranslogFileGeneration);
-            outstandingViews.add(view);
+            viewGenToClean = deletionPolicy.acquireTranslogGenForView();


wouldn't it be simpler if you removed viewGenToClean and just did this: return new View(deletionPolicy.acquireTranslogGenForView());

It would be but I'm paranoid about an exception in the View constructor. This way it's clearly safe.

there is no exception possibility here? I think this is overparanoia

The View constructor does not even do anything, it just sets a field?

I really do not like the use of setting viewGenToClean to -1 to indicate not to release the view. Is there a reason that you do not make viewGenToClean final and local to the try block and set a boolean flag to indicate success or not?
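One possible reading of the success-flag suggestion, as a hedged sketch rather than the code that was merged: acquire the generation once, and release it only if constructing the view fails. The `DeletionPolicy` interface, the `View` class, `releaseTranslogGenView`, and the `viewFactory` parameter are all hypothetical and exist only so the example is self-contained.

```java
import java.util.function.LongFunction;

final class NewViewSketch {
    /** Hypothetical slice of the deletion policy used by newView. */
    interface DeletionPolicy {
        long acquireTranslogGenForView();

        void releaseTranslogGenView(long translogGen);
    }

    /** Hypothetical view that only remembers the generation it pins. */
    static final class View {
        final long minGeneration;

        View(long minGeneration) {
            this.minGeneration = minGeneration;
        }
    }

    /**
     * Acquires a generation and builds a view on top of it. If the factory
     * throws, the acquired generation is released so it does not leak.
     */
    static View newView(DeletionPolicy policy, LongFunction<View> viewFactory) {
        final long viewGen = policy.acquireTranslogGenForView();
        boolean success = false;
        try {
            View view = viewFactory.apply(viewGen);
            success = true;
            return view;
        } finally {
            if (success == false) {
                // No sentinel value such as -1 is needed: the flag says whether to clean up.
                policy.releaseTranslogGenView(viewGen);
            }
        }
    }
}
```

The flag keeps the acquired generation immutable and makes the cleanup condition explicit, which is the readability point raised in the comment above.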

s1monw · 2017-05-30T08:54:26Z

core/src/main/java/org/elasticsearch/index/translog/TranslogDeletionPolicy.java

+     * returns the minimum translog generation that is still required by the system. Any generation below
+     * the returned value may be safely deleted
+     */
+    public synchronized long minTranslogGenRequired(List<TranslogReader> readers, TranslogWriter currentWriter) {


readers is unused?

readers is still unused?

yeah, I missed your comment from the last time. I will remove the params. They are a leftover from the POC (to show how we would do size based deletion).

s1monw · 2017-05-30T08:58:01Z

core/src/main/java/org/elasticsearch/index/engine/CombinedDeletionPolicy.java

+    }
+
+    private void setLastCommittedTranslogGeneration(List<? extends IndexCommit> commits) throws IOException {
+        final IndexCommit indexCommit = commits.get(commits.size() - 1);


can you leave a comment why we only use the last one? It would help others to reason about this code

s1monw · 2017-05-30T08:58:17Z

core/src/main/java/org/elasticsearch/index/engine/CombinedDeletionPolicy.java

+ * An {@link IndexDeletionPolicy} that coordinates between Lucene's commits and the retention of translog generation files,
+ * making sure that all translog files that are need to recover from the lucene commit are not deleted.
+ */
+public class CombinedDeletionPolicy extends IndexDeletionPolicy {


make this final and maybe pkg private?

yep. And it exposed a leftover wrong Java Doc reference.

bleskes · 2017-05-30T09:46:37Z

Thx @s1monw . I addressed all your feedback. I will wait for @jasontedor to have a look as well.

bleskes · 2017-06-01T06:39:27Z

Thx @jasontedor . I addressed your comments. Can you take another look?

jasontedor

Some lingering nits that do not require another look from me but otherwise LGTM. Thanks @bleskes.

jasontedor · 2017-06-01T10:27:07Z

core/src/main/java/org/elasticsearch/index/translog/Translog.java

- * written, the current translogs file generation and it's fsynced offset in bytes.
+ * Each Translog has only one translog file open for writes at any time referenced by a translog generation ID. This ID is written to a
+ * <tt>translog.ckp</tt> file that is designed to fit in a single disk block such that a write of the file is atomic. The checkpoint file
+ * is written on each fsync operation of the translog and records the number of operations written, the current translogs file generation


translogs -> translog's

jasontedor · 2017-06-01T10:27:40Z

core/src/main/java/org/elasticsearch/index/translog/Translog.java

+ * Each Translog has only one translog file open for writes at any time referenced by a translog generation ID. This ID is written to a
+ * <tt>translog.ckp</tt> file that is designed to fit in a single disk block such that a write of the file is atomic. The checkpoint file
+ * is written on each fsync operation of the translog and records the number of operations written, the current translogs file generation
+ * , it's fsynced offset in bytes and other important statistics.


Remove comma to start line, place at end of previous line.

it's -> its

bytes and -> bytes, and

bleskes · 2017-06-01T12:04:35Z

Thx @s1monw , @jasontedor

When we open a translog, we rely on the `translog.ckp` file to tell us what the maximum generation file should be, and on the information stored in the last Lucene commit to know the first file we need to recover. This requires coordination and is currently subject to a race condition: if a node dies after a Lucene commit is made but before we remove the translog generations that the commit no longer needs, the next time we open the translog we will ignore those files and never delete them (I have added tests for this). This PR changes the approach to have the translog store both of those numbers in the `translog.ckp`. This makes it more self-contained and easier to control. This change also decouples the translog recovery logic from the specific commit we're opening. This prepares the ground to fully utilize the deletion policy introduced in #24950 and store more translog data than is needed for Lucene, keep multiple Lucene commits around, and be free to recover from any of them.
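As a rough illustration of why storing both numbers in the checkpoint closes the race, the sketch below models a checkpoint that carries the current generation and the minimum generation still needed. Everything here is an assumption made for illustration: the class name `CheckpointSketch`, its field names, and the helper methods are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a checkpoint that records both ends of the live generation range. */
final class CheckpointSketch {
    /** Generation currently open for writes (the maximum generation). */
    final long generation;
    /** Oldest generation that is still needed for recovery. */
    final long minTranslogGeneration;

    CheckpointSketch(long generation, long minTranslogGeneration) {
        if (minTranslogGeneration > generation) {
            throw new IllegalArgumentException("min generation is above the current generation");
        }
        this.generation = generation;
        this.minTranslogGeneration = minTranslogGeneration;
    }

    /** Generations to open on recovery, derived from the checkpoint alone. */
    List<Long> generationsToRecover() {
        List<Long> result = new ArrayList<>();
        for (long gen = minTranslogGeneration; gen <= generation; gen++) {
            result.add(gen);
        }
        return result;
    }

    /** Generations found on disk that fall below the minimum and can be cleaned up on open. */
    List<Long> unreferencedGenerations(List<Long> generationsOnDisk) {
        List<Long> result = new ArrayList<>();
        for (long gen : generationsOnDisk) {
            if (gen < minTranslogGeneration) {
                result.add(gen);
            }
        }
        return result;
    }
}
```

With both bounds in one file, a node that died between the Lucene commit and the file cleanup can still tell, on the next open, which leftover generations are safe to delete.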

ArielCoralogix · 2018-03-16T04:07:12Z

@bleskes Can you please share the previously hard-coded time interval for translog deletion? The default now (6.2.2) is 12H.
Asking because of this: #29097

bleskes · 2018-03-16T08:45:48Z

@ArielCoralogix previously we'd throw away the translog files immediately after flush. There was no hard-coded time-based interval. That said - I think it will be very rare that this causes many more files to be open than before. The translog still stays under 512MB as before. This change doesn't mean we check every 12 hours; we check after each indexing request. The change means that we keep at most 512MB for at most 12 hours. Effectively - an active translog will always be 512MB rather than shrinking to 0 and growing again to 512MB. After 12hrs it will be cleaned away. I don't believe you have so many active shards within 12 hrs for this to seriously influence the number of open files. Or do you?
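The "at most 512MB for at most 12 hours" rule can be pictured with the simplified sketch below. It is an illustration of combined size and age retention under stated assumptions, not Elasticsearch's actual implementation: the `Gen` class, the method name, and the choice to always keep the newest generation are all hypothetical.

```java
import java.util.List;

final class RetentionSketch {
    /** Hypothetical description of one translog generation file. */
    static final class Gen {
        final long generation;
        final long sizeInBytes;
        final long createdAtMillis;

        Gen(long generation, long sizeInBytes, long createdAtMillis) {
            this.generation = generation;
            this.sizeInBytes = sizeInBytes;
            this.createdAtMillis = createdAtMillis;
        }
    }

    /**
     * Walks from the newest generation to the oldest and stops at the first one
     * that is too old or would push the retained size over the limit.
     * The list must be non-empty and ordered oldest to newest.
     */
    static long minGenerationToKeep(List<Gen> gens, long maxBytes, long maxAgeMillis, long nowMillis) {
        int newest = gens.size() - 1;
        long retainedBytes = 0;
        long minGen = gens.get(newest).generation;
        for (int i = newest; i >= 0; i--) {
            Gen gen = gens.get(i);
            boolean tooOld = nowMillis - gen.createdAtMillis > maxAgeMillis;
            boolean tooBig = retainedBytes + gen.sizeInBytes > maxBytes;
            // Assumption: the newest generation is always kept, whatever its size or age.
            if (i < newest && (tooOld || tooBig)) {
                break;
            }
            retainedBytes += gen.sizeInBytes;
            minGen = gen.generation;
        }
        return minGen;
    }
}
```

Because the check is cheap, it can run after every indexing request, which matches the point above that the 12 hours is a retention bound and not a polling interval.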

farin99 · 2018-03-16T09:41:43Z

Hey @bleskes, I'm Ariel's colleague. From running `sudo lsof -p`, 90% of the file descriptors are from translog-xxxxx.tlog files. So if I understand correctly, elastic by default won't delete the translog files for 12H or until they reach 512MB?
We have hundreds of active shards.

ArielCoralogix · 2018-03-16T09:54:33Z

Hey @bleskes adding to @farin99's comment. We currently have 150,000 open file descriptors on some of our servers (and slowly rising). This ticket has some more info: #29097
Is there any way for us to change the settings so the behavior will be similar to version 5.4? (deleting files immediately after committing)

bleskes added 17 commits May 16, 2017 11:08

wip

da516ff

wip

af64ed2

extract interfaces

bae1a2a

Merge remote-tracking branch 'upstream/master' into translog_deletion_policy

3abbf9c

java doc tweak

5b360dc

Merge remote-tracking branch 'upstream/master' into translog_deletion_policy

176b848

wip

d4500aa

Translog tests compile

f6a69bd

Translog tests pass

b97c69d

remove onTranslogRollover as it's not needed for now

e1a4709

simplification and removal of future stuff

79101c3

update java docs

8380014

tell CombinedDeletionPolicy of open mode so it can be smarter.

00a5ddc

introducing IndexCommitRef

46626a5

a little test to test translog min reference advance

1e8cf62

some java docs

b773ebf

bleskes added :Engine >enhancement v6.0.0 labels May 30, 2017

bleskes requested review from jasontedor and s1monw May 30, 2017 07:05

s1monw approved these changes May 30, 2017

bleskes added 2 commits May 30, 2017 11:26

Merge remote-tracking branch 'upstream/master' into translog_deletion_policy

460673c

feedback

b618add

bleskes added 3 commits May 30, 2017 13:53

fix testWithRandomException

765d388

Merge remote-tracking branch 'upstream/master' into translog_deletion_policy (conflicts: core/src/test/java/org/elasticsearch/index/translog/TranslogTests.java)

7656875

remove unneeded params

97764e5

bleskes added 2 commits June 1, 2017 08:37

newView cleanup

8d12158

Merge remote-tracking branch 'upstream/master' into translog_deletion_policy

df5f802

jasontedor approved these changes Jun 1, 2017

bleskes added 2 commits June 1, 2017 13:51

Merge remote-tracking branch 'upstream/master' into translog_deletion_policy

bcc1a11

feedback

362427c

bleskes merged commit 1775e42 into elastic:master Jun 1, 2017

bleskes deleted the translog_deletion_policy branch June 1, 2017 12:04

bleskes mentioned this pull request Jun 1, 2017

Translog file recovery should not rely on lucene commits #25005

Merged

clintongormley added v6.0.0-alpha2 v6.0.0 and removed v6.0.0 v6.0.0-alpha2 labels Jun 6, 2017

colings86 added v6.0.0-beta1 and removed v6.0.0 labels Jul 31, 2017

bleskes mentioned this pull request Oct 1, 2017

Add Sequence Numbers to write operations #10708

Closed

64 tasks
