Use lastSyncedGlobalCheckpoint in deletion policy by dnhatn · Pull Request #27826 · elastic/elasticsearch

dnhatn · 2017-12-14T18:59:21Z

Today we use the in-memory global checkpoint from SequenceNumbersService
to clean up unneeded commit points. However, the latest global checkpoint
may not have been fsynced to disk yet. If the translog checkpoint fsync
fails after we have already used a higher global checkpoint to clean up
commit points, then we may have removed a safe commit that we intended to
keep for recovery.

This commit updates the deletion policy to use lastSyncedGlobalCheckpoint
from Translog rather than the in-memory global checkpoint.

Relates #27606
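
To make the failure mode concrete, here is a minimal sketch, not the actual CombinedDeletionPolicy code. It assumes a simplified model in which each commit point is represented only by its maximum sequence number, and the safe commit is the newest commit whose maximum sequence number is at or below the global checkpoint. The class name SafeCommitSketch and the method indexOfSafeCommit are hypothetical names chosen for this illustration.

```java
import java.util.List;

public class SafeCommitSketch {

    /**
     * Simplified model (an assumption for illustration): returns the index of
     * the newest commit whose max sequence number is at or below the given
     * global checkpoint. Commits before that index are treated as deletable.
     * Falls back to the oldest commit (index 0) if no commit qualifies.
     */
    public static int indexOfSafeCommit(List<Long> maxSeqNoPerCommit, long globalCheckpoint) {
        for (int i = maxSeqNoPerCommit.size() - 1; i >= 0; i--) {
            if (maxSeqNoPerCommit.get(i) <= globalCheckpoint) {
                return i;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Three commit points, ordered oldest to newest, by max sequence number.
        List<Long> commits = List.of(10L, 20L, 30L);

        long inMemoryGlobalCheckpoint = 25L;   // advanced in memory, not yet fsynced
        long lastSyncedGlobalCheckpoint = 15L; // what is actually persisted

        // With the in-memory value, commit 1 looks safe and commit 0 would be deleted.
        System.out.println("in-memory:   keep from index "
            + indexOfSafeCommit(commits, inMemoryGlobalCheckpoint));

        // With the persisted value, commit 0 is still the safe commit and is kept.
        System.out.println("last synced: keep from index "
            + indexOfSafeCommit(commits, lastSyncedGlobalCheckpoint));
    }
}
```

In this model, if the fsync of the higher checkpoint fails, the persisted global checkpoint is still 15, so only the commit with max sequence number 10 is safe. A policy driven by the in-memory value of 25 would already have deleted it.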

bleskes

Thx Nhat. I left some comments

bleskes · 2017-12-15T12:02:28Z

core/src/main/java/org/elasticsearch/index/translog/TranslogWriter.java

        Channels.readFromFileChannelWithEofException(channel, position, targetBuffer);
    }

-    private static Checkpoint writeCheckpoint(


Why is this relevant? I'd rather not touch files that aren't relevant to the PR (feel free to push such a change, as a removal of dead code, without a PR).

bleskes · 2017-12-15T12:05:09Z

core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java

-                    globalCheckpoint.set(randomIntBetween(
-                        Math.toIntExact(engine.seqNoService().getGlobalCheckpoint()),
-                        Math.toIntExact(engine.seqNoService().getLocalCheckpoint())));
+        engine = new InternalEngine(config(indexSettings, store, createTempDir(), NoMergePolicy.INSTANCE, null), seqNoServiceSupplier) {


why did we lose the try with resources clause?

We close the engine in tearDown but I put the try-clause back.

I see - we reuse the same engine variable. I think it's cleaner to have it contained.

bleskes · 2017-12-15T12:11:50Z

core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java

+                engine.getTranslog().sync();
+            }
+            if (frequently()) {
+                final long lastSyncedGlobalCheckpoint = engine.getTranslog().getLastSyncedGlobalCheckpoint();


strictly speaking I think we need to read this from disk after the flush - i.e., make sure that what's on disk is OK.
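
As a generic illustration of the difference between a value held in memory and the value that is actually on disk, the sketch below writes a checkpoint to a file, forces it to storage, and reads it back. It does not use the Translog API. The class PersistedCheckpointSketch, its methods, and the single-long file layout are assumptions made for this example only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PersistedCheckpointSketch {

    /** Writes the checkpoint as a single long and forces it to the storage device. */
    public static void writeAndSync(Path file, long checkpoint) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES);
            buffer.putLong(checkpoint);
            buffer.flip();
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            channel.force(true); // fsync: content and metadata
        }
    }

    /** Reads the checkpoint back from the file rather than from any in-memory state. */
    public static long readFromDisk(Path file) throws IOException {
        return ByteBuffer.wrap(Files.readAllBytes(file)).getLong();
    }

    /** Convenience helper: persists a value to a temp file and returns what was read back. */
    public static long roundTrip(long checkpoint) {
        try {
            Path file = Files.createTempFile("checkpoint", ".ckp");
            try {
                writeAndSync(file, checkpoint);
                return readFromDisk(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long persisted = roundTrip(42L);
        long inMemory = 57L; // advanced after the sync, never written
        System.out.println("persisted=" + persisted + ", inMemory=" + inMemory);
    }
}
```

A test that asserts against the value read back from the file checks what recovery would actually see, which is the stricter check asked for here.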

bleskes · 2017-12-15T12:18:42Z

core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java

+            @Override
+            protected void commitIndexWriter(IndexWriter writer, Translog translog, String syncId) throws IOException {
+                // The global checkpoint is advanced but not fsynced yet.
+                final long lagging = seqNoService().getLocalCheckpoint() - seqNoService().getGlobalCheckpoint();


I don't follow this. Why do we need to query the global checkpoint from the seqNoService? We already have access to globalCheckpoint.

I think what you mean here is something like this (no if, no rarely):

    // Advance the global checkpoint during the flush to create a lag between what's persisted
    // in the translog (and is visible to CombinedDeletionPolicy) and what's in memory in the
    // SequenceNumbersService
    globalCheckpoint.set(randomLongBetween(globalCheckpoint.get(), seqNoService().getLocalCheckpoint()));

wdyt?

dnhatn · 2017-12-15T15:05:50Z

@bleskes I have addressed your comments. Would you please take a look? Thank you.

bleskes

LGTM. Thx @dnhatn

dnhatn · 2017-12-16T16:03:02Z

Thanks @bleskes.

dnhatn · 2017-12-19T21:48:46Z

Backported in #27866

dnhatn added :Sequence IDs >enhancement v6.2.0 v7.0.0 labels Dec 14, 2017

dnhatn requested review from bleskes and jasontedor December 14, 2017 18:59

dnhatn added the review label Dec 14, 2017

update test with getLastSyncedGlobalCheckpoint

47fe954

bleskes suggested changes Dec 15, 2017

dnhatn added 4 commits December 15, 2017 09:21

revert unrelated code

608a20b

apply feedback

d22ff6a

Merge branch 'master' into translog-gcp

a8d9b67

try-clause

fc5b86f

bleskes approved these changes Dec 16, 2017

dnhatn merged commit 4f62b51 into elastic:master Dec 16, 2017

dnhatn deleted the translog-gcp branch December 16, 2017 16:03

dnhatn added the backport pending label Dec 16, 2017

ywelsch mentioned this pull request Dec 18, 2017

Move GlobalCheckpointTracker and remove SequenceNumbersService #27837

Merged

dnhatn mentioned this pull request Dec 18, 2017

Backport for using lastSyncedGlobalCheckpoint in deletion policy #27866

Merged

dnhatn removed the backport pending label Dec 19, 2017

clintongormley added :Distributed/Engine and removed :Sequence IDs labels Feb 14, 2018

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

dnhatn merged 6 commits into elastic:master from dnhatn:translog-gcp
