
Shorten time that snapshot finalization blocks repo #89572

Merged
original-brownbear merged 1 commit into elastic:main from original-brownbear:faster-snapshot-finalization on Aug 25, 2022

Conversation

@original-brownbear
Contributor

For large snapshot clusters the final cleanup step that removes
previous shard metadata might have to delete tens of thousands of blobs.
This can take many seconds, for which the repository is needlessly blocked
from running subsequent finalizations or deletes.
Similar to how the final cleanup step for deletes was made async and taken
out of the finalization loop in #86514, this PR makes the final
cleanup step for finalization async as well.

This produces a measurable speedup in the many-shards snapshotting benchmark: a couple of minutes
saved end-to-end when creating 100 snapshots in a 25k-indices cluster.

relates #77466
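The mechanism described above can be sketched in plain Java (illustration only, not Elasticsearch code; names like `finalizeSnapshot` and the blob paths are made up for the example): finalization frees the repository's single operation slot immediately and hands the stale shard-metadata deletes to a background pool, mirroring the PR's use of the SNAPSHOT thread pool.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncCleanupSketch {

    // Finalization no longer blocks on the cleanup: the stale blobs are
    // deleted on a background pool so the next finalization/delete on this
    // repository can start right away.
    static void finalizeSnapshot(List<String> staleBlobs, ExecutorService snapshotPool, AtomicInteger deleted) {
        // ... write the new RepositoryData and notify completion listeners ...
        if (staleBlobs.isEmpty() == false) {
            snapshotPool.execute(() -> staleBlobs.forEach(blob -> deleted.incrementAndGet()));
        }
    }

    static int runDemo() throws InterruptedException {
        AtomicInteger deleted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        finalizeSnapshot(List.of("indices/0/shard-0/index-1", "indices/0/shard-1/index-1"), pool, deleted);
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return deleted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deleted " + runDemo() + " stale blobs");
    }
}
```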

@original-brownbear original-brownbear added >non-issue :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.5.0 labels Aug 24, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Aug 24, 2022
Contributor

@fcofdez fcofdez left a comment


Looks good to me, but I left a question, just to ensure that I understand the change correctly.

} catch (Exception e) {
logger.warn("Failed to clean up old shard generation blobs", e);
if (toDelete.isEmpty() == false) {
threadPool.executor(ThreadPool.Names.SNAPSHOT).execute(() -> {
Contributor


Since we compute the list of files to delete based on the old/new repository data, would it be possible for two delete operations to try and delete an intersection of files? I guess that will be fine, the only minor issue I can see is that we might end up having more queued tasks in the SNAPSHOT thread pool, but the risk should be minimal?

Contributor Author


would it be possible for two delete operations to try and delete an intersection of files?

No, this should not happen, since we compute the files to delete from the repo data diff. So if a file A goes out of scope when moving from generation 1 to 2, it will never show up in generation 2 and thus won't appear in the diff from 2 to 3.
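The invariant described here can be sketched as a set difference (hypothetical illustration, not Elasticsearch code): the blobs to delete for a move from one generation to the next are exactly those referenced by the old generation but not the new one, so the diffs for consecutive moves are disjoint and two async cleanups can never target the same blob.

```java
import java.util.HashSet;
import java.util.Set;

public class GenerationDiffSketch {

    // Blobs referenced by the old generation but no longer by the new one.
    static Set<String> staleBlobs(Set<String> oldGen, Set<String> newGen) {
        Set<String> stale = new HashSet<>(oldGen);
        stale.removeAll(newGen);
        return stale;
    }

    public static void main(String[] args) {
        Set<String> gen1 = Set.of("A", "B");
        Set<String> gen2 = Set.of("B", "C"); // blob A went out of scope in 1 -> 2
        Set<String> gen3 = Set.of("C");      // blob B went out of scope in 2 -> 3
        Set<String> diff12 = staleBlobs(gen1, gen2);
        Set<String> diff23 = staleBlobs(gen2, gen3);
        // A was dropped in 1 -> 2, so it is absent from gen2 and can never
        // reappear in the 2 -> 3 diff: the two cleanups are disjoint.
        diff12.retainAll(diff23);
        System.out.println("overlap between consecutive diffs: " + diff12);
    }
}
```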

I guess that will be fine, the only minor issue I can see is that we might end up having more queued tasks in the SNAPSHOT thread pool, but the risk should be minimal?

This is actually something that happens in the many-shards snapshot benchmark. Assuming you have back to back snapshot finalizations, it's really a function of how fast cluster state updates are vs how fast those deletes are. But it's not really an issue IMO, we're still only talking about O(10s) tasks here in the worst cases (GCP + 50k indices).

Contributor

@fcofdez fcofdez left a comment


LGTM. Thanks for the clarifications!

@original-brownbear
Contributor Author

Thanks Francisco!

@original-brownbear original-brownbear merged commit bdbfcb3 into elastic:main Aug 25, 2022
@original-brownbear original-brownbear deleted the faster-snapshot-finalization branch August 25, 2022 08:09
}

public void onDone(SnapshotInfo snapshotInfo) {
onDone.accept(snapshotInfo);
Member


Should we assert that onResponse() has been called at this stage? I also wonder if renaming the method to something like afterCleanUp() would be clearer?

Contributor Author


Should we assert that onResponse() has been called at this stage?

I wanted to do that, but this isn't actually 100% true for the S3 waiting implementation it turns out ... which is kind of stupid, I'll open a discussion about removing it :)

Member


Ok, thanks for the clarification

} catch (Exception e) {
logger.warn("Failed to clean up old shard generation blobs", e);
} finally {
finalizeSnapshotContext.onDone(snapshotInfo);
Member


Do we really need to pass the SnapshotInfo instance back to where it is originated from? 🤔

Suggested change
finalizeSnapshotContext.onDone(snapshotInfo);
finalizeSnapshotContext.afterCleanUp();

Contributor Author


Hmm that is a good point, I just realized we don't need to pass it back here indeed ... we used to need to do that because we computed SnapshotInfo in the repository but that's not the case anymore. I'll follow up with a simplification PR, thanks for spotting this!

original-brownbear added a commit that referenced this pull request Aug 26, 2022
This fixes a minor bug introduced in #89572. We need to take the listeners
for the snapshot out of the listeners map for the repo right away, before
trying to run the next repo operation, like we did prior to that change.
Otherwise we run the risk of a fatal exception failing all operations for
the repo, like the no-longer-master exception did in #89625, even though
the snapshot completed successfully.

closes #89625
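The ordering that this follow-up restores can be sketched as follows (names like `completeSnapshot` and the listener map are illustrative, not Elasticsearch's actual API): the completed snapshot's listeners are removed from the per-repository map and notified before the next repo operation starts, so an exception thrown by that operation can no longer reach them.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class ListenerHandoffSketch {

    final Map<String, List<Consumer<String>>> snapshotListeners = new ConcurrentHashMap<>();

    void addListener(String snapshot, Consumer<String> listener) {
        snapshotListeners.computeIfAbsent(snapshot, k -> new CopyOnWriteArrayList<>()).add(listener);
    }

    void completeSnapshot(String snapshot, Runnable runNextRepoOperation) {
        // Take the listeners out of the map first and complete them ...
        List<Consumer<String>> done = snapshotListeners.remove(snapshot);
        if (done != null) {
            done.forEach(listener -> listener.accept("SUCCESS"));
        }
        // ... so that a fatal failure here cannot fail the completed snapshot.
        runNextRepoOperation.run();
    }

    public static void main(String[] args) {
        ListenerHandoffSketch repo = new ListenerHandoffSketch();
        StringBuilder result = new StringBuilder();
        repo.addListener("snap-1", result::append);
        try {
            repo.completeSnapshot("snap-1", () -> { throw new IllegalStateException("not master"); });
        } catch (IllegalStateException e) {
            // the next operation failed, but snap-1's listener already got SUCCESS
        }
        System.out.println(result);
    }
}
```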
@original-brownbear original-brownbear restored the faster-snapshot-finalization branch April 18, 2023 21:04