Skip to content

Trim translog for closed indices#43156

Merged
tlrx merged 6 commits intoelastic:masterfrom
tlrx:trim-translog-for-closed-indices
Jun 28, 2019
Merged

Trim translog for closed indices#43156
tlrx merged 6 commits intoelastic:masterfrom
tlrx:trim-translog-for-closed-indices

Conversation

@tlrx
Copy link
Copy Markdown
Member

@tlrx tlrx commented Jun 12, 2019

Today when an index is closed all its shards are forced flushed but the translog files are left around. As explained in #42445 we'd like to trim the translog for closed indices in order to consume less disk space.

Instead of trimming the translog at closing time (which can be challenging as explained in #42445) or during the initialization of the noop engine, this pull request proposes to reuse the existing AsyncTrimTranslogTask task and to reenable it for closed indices.

At the time the task is executed, we should have the guarantee that nothing holds the translog files that are going to be removed. It also leaves a short period of time (10 min) during which translog files of a recently closed index are still present on disk. This could also help in some cases where the closed index is reopened shortly after being closed (in order to update an index setting for example).

@tlrx tlrx added >enhancement :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. v8.0.0 v7.3.0 labels Jun 12, 2019
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed

@tlrx tlrx requested review from dnhatn and ywelsch June 12, 2019 12:42
@dnhatn
Copy link
Copy Markdown
Member

dnhatn commented Jun 13, 2019

I think we should not trim translog in NoOpEngine because it extends from ReadOnlyEngine which should not modify translog (and Lucene index). How about trimming translog in InternalEngine when we execute the verify-before-close step (after we flush)?

@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Jun 19, 2019

We talked about this today and we agreed that we can move forward with the proposed solution, as long as the translog trimming is correctly documented in NoOpEngine. This solution will help to curate translog files of indices closed in 7.2.

We also agreed on trimming translogs files more aggressively when indices are verified before being closed, as suggested by @dnhatn and others, but that would require peer recoveries to not use translog at all anymore. This will become possible in a short future once peer recovery retention leases will be implemented (see #41536).

Copy link
Copy Markdown
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Thanks @tlrx. I left a comment.

@tlrx tlrx requested a review from dnhatn June 25, 2019 14:38
Copy link
Copy Markdown
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a smaller comment but LGTM. Thanks @tlrx.

Copy link
Copy Markdown
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you follow-up with a change that exposes correct TranslogStats for NoOpEngine, and reloads them after the trimming?

try (ReleasableLock lock = readLock.acquire()) {
ensureOpen();
final List<IndexCommit> commits = DirectoryReader.listCommits(store.directory());
if (commits.size() == 1) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we guaranteed to have this on a successful close? How do we know that previous commits have been cleaned up on verify-before-close?

@tlrx tlrx merged commit c900795 into elastic:master Jun 28, 2019
@tlrx tlrx deleted the trim-translog-for-closed-indices branch June 28, 2019 14:18
tlrx added a commit that referenced this pull request Jun 28, 2019
Today when an index is closed all its shards are forced flushed
but the translog files are left around. As explained in #42445
we'd like to trim the translog for closed indices in order to
consume less disk space. This commit reuses the existing
AsyncTrimTranslogTask task and reenables it for closed indices.

At the time the task is executed, we should have the guarantee
that nothing holds the translog files that are going to be removed.
It also leaves a short period of time (10 min) during which translog
files of a recently closed index are still present on disk. This could
 also help in some cases where the closed index is reopened
shortly after being closed (in order to update an index setting
for example).

Relates to #42445
@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Jun 28, 2019

Thanks @dnhatn and @ywelsch

Can you follow-up with a change that exposes correct TranslogStats for NoOpEngine, and reloads them after the trimming?

I opened #43752 to expose translog stats for read only engines. Once this one is merged I'll open a follow to reload stats after the trimming.

tlrx added a commit that referenced this pull request Jul 3, 2019
This commit changes NoOpEngine so that it refreshes its translog 
stats once translog is trimmed.

Relates #43156
@tlrx tlrx mentioned this pull request Jul 3, 2019
@jakelandis jakelandis removed the v8.0.0 label Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v7.3.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants