Speed up Snapshot Finalization #47283

Merged
original-brownbear merged 4 commits into elastic:master from
original-brownbear:parallelize-sn-finalization
Sep 30, 2019

@original-brownbear
Contributor

@original-brownbear original-brownbear commented Sep 30, 2019

As a result of #45689, snapshot finalization started to
take significantly longer than before. This is a little
unfortunate, since it increases the likelihood of failing
to finalize after all the segment blobs have already been
written out.
This change parallelizes all the metadata writes in the
finalization step that can safely run in parallel, to
speed the finalization step up again. It should also
speed up the snapshot process overall when a snapshot
contains a large number of indices.
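A minimal sketch of the ordering described above, in plain Java with `CompletableFuture` (an assumption: this is not the PR's code, which builds on Elasticsearch's own async machinery). Based on the review comments, it assumes the per-index metadata blobs and the `snap-` blob are the independent writes, and that the new `index-N` blob (the `writeIndexGen(...)` call in the quoted hunk) must only be written after all of them succeed. `InMemoryBlobStore`, `ParallelFinalizationSketch`, `finalizeSnapshot`, and the blob paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical in-memory stand-in for a blob container (not the Elasticsearch BlobContainer API).
class InMemoryBlobStore {
    final Map<String, String> blobs = new ConcurrentHashMap<>();
    // Blob names in the order their writes finished; synchronized because writes run concurrently.
    final List<String> writeOrder = Collections.synchronizedList(new ArrayList<>());

    void writeBlob(String name, String content) {
        blobs.put(name, content);
        writeOrder.add(name);
    }
}

class ParallelFinalizationSketch {
    // Writes every per-index metadata blob and the snap- blob concurrently, then writes the
    // index-N blob (the new repository data) only after all of them have succeeded.
    static CompletableFuture<Void> finalizeSnapshot(InMemoryBlobStore store, ExecutorService executor,
                                                    String snapshotUuid, List<String> indices, long newGen) {
        List<CompletableFuture<Void>> writes = new ArrayList<>();
        for (String index : indices) {
            writes.add(CompletableFuture.runAsync(
                () -> store.writeBlob("indices/" + index + "/meta-" + snapshotUuid + ".dat", "metadata of " + index),
                executor));
        }
        writes.add(CompletableFuture.runAsync(
            () -> store.writeBlob("snap-" + snapshotUuid + ".dat", "snapshot info"), executor));
        // allOf completes once every write has completed; if any write failed, it completes
        // exceptionally and the thenRun stage (the index-N write) is skipped.
        return CompletableFuture.allOf(writes.toArray(new CompletableFuture<?>[0]))
            .thenRun(() -> store.writeBlob("index-" + newGen, "repository data"));
    }
}
```

With enough threads in the pool, the wall-clock cost of this step is roughly that of the slowest writes rather than the sum of all of them, and a failed metadata write never leads to a new `index-N` being published.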

This is also a nice-to-have for #46250, since it adds yet
another step (deleting old `index-` blobs in the shards)
to the finalization.

@original-brownbear original-brownbear added >non-issue :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.5.0 labels Sep 30, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@original-brownbear
Contributor Author

Jenkins run elasticsearch-ci/bwc

@original-brownbear
Contributor Author

Jenkins run elasticsearch-ci/bwc

```java
    final RepositoryData updatedRepositoryData = getRepositoryData().addSnapshot(snapshotId, blobStoreSnapshot.state(), indices);
    snapshotFormat.write(blobStoreSnapshot, blobContainer(), snapshotId.getUUID(), false);
    writeIndexGen(updatedRepositoryData, repositoryStateId);
} catch (FileAlreadyExistsException ex) {
```

Contributor Author

This catch is gone now. It was dead code, because we no longer do the exists check for this blob in the line above, where we write the `snap-` blob.
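Why such a `catch` becomes dead can be illustrated outside Elasticsearch with `java.nio.file` (an analogy only, not the repository's blob-writing code): a `FileAlreadyExistsException` arises only when the write itself checks for an existing target, here via `StandardOpenOption.CREATE_NEW`. A write that simply overwrites never throws it, so a `catch` for it around such a write is never taken. `ExistsCheckSketch` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ExistsCheckSketch {
    // With CREATE_NEW the write performs the exists check itself and throws
    // FileAlreadyExistsException if the file is already present.
    static void writeFailIfExists(Path path, String content) throws IOException {
        Files.write(path, content.getBytes(StandardCharsets.UTF_8), StandardOpenOption.CREATE_NEW);
    }

    // With no options, Files.write uses CREATE, TRUNCATE_EXISTING and WRITE, so it silently
    // overwrites and never throws FileAlreadyExistsException.
    static void writeOverwrite(Path path, String content) throws IOException {
        Files.write(path, content.getBytes(StandardCharsets.UTF_8));
    }

    // Writes the same blob twice and reports whether the second write hit FileAlreadyExistsException.
    static boolean secondWriteFails(boolean failIfExists) {
        try {
            Path dir = Files.createTempDirectory("snapshot-sketch");
            Path blob = dir.resolve("snap-abc.dat");
            try {
                for (int attempt = 0; attempt < 2; attempt++) {
                    if (failIfExists) {
                        writeFailIfExists(blob, "attempt " + attempt);
                    } else {
                        writeOverwrite(blob, "attempt " + attempt);
                    }
                }
                return false;
            } catch (FileAlreadyExistsException e) {
                return true;
            } finally {
                // Clean up the temporary file and directory.
                Files.deleteIfExists(blob);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```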

```java
        indexMetaDataFormat.write(clusterMetaData.index(index.getName()), indexContainer(index), snapshotId.getUUID(), false);
    }
} catch (IOException ex) {
    throw new SnapshotException(metadata.name(), snapshotId, "failed to write metadata for snapshot", ex);
```

Contributor Author

I removed this specific rethrow because, with this change, we write the index metadata in parallel with the root-level `snap-` blob anyway, so throwing with a separate message here seemed pointless.
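One way to report failures from writes running in parallel under a single shared message, sketched with a plain `ExecutorService` (an assumption for illustration; this is not the PR's code): every write is awaited, the first failure in submission order becomes the cause, and any later failures are attached as suppressed exceptions. `ParallelWriteFailures`, `Write`, and `writeAll` are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelWriteFailures {
    // Hypothetical unit of metadata I/O, standing in for a single blob write.
    interface Write {
        void run() throws Exception;
    }

    // Submits all writes at once, waits for every one of them (even after a failure), and
    // reports all failures through one IOException with a single message: the first failure
    // in submission order becomes the cause, further failures are added as suppressed.
    static void writeAll(ExecutorService executor, List<Write> writes, String message)
            throws IOException, InterruptedException {
        List<Future<Object>> futures = new ArrayList<>();
        for (Write write : writes) {
            futures.add(executor.submit(() -> {
                write.run();
                return null;
            }));
        }
        IOException failure = null;
        for (Future<Object> future : futures) {
            try {
                future.get();
            } catch (ExecutionException e) {
                if (failure == null) {
                    failure = new IOException(message, e.getCause());
                } else {
                    failure.addSuppressed(e.getCause());
                }
            }
        }
        if (failure != null) {
            throw failure;
        }
    }
}
```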


```java
public class MockEventuallyConsistentRepositoryTests extends ESTestCase {

    private Environment environment;
```
Contributor Author

This is just dead code. I saw it when making adjustments here and removed it, because I figured it wasn't worth a separate PR.

Member

@tlrx tlrx left a comment

I left some comments; nothing to worry about, as it looks great already.

@original-brownbear
Contributor Author

Thanks @tlrx, all points addressed I think :)

Member

@tlrx tlrx left a comment

LGTM, nice change

@original-brownbear
Contributor Author

Jenkins run elasticsearch-ci/packaging-sample

@original-brownbear
Contributor Author

Thanks Tanguy!

@original-brownbear original-brownbear merged commit 5405f2e into elastic:master Sep 30, 2019
@original-brownbear original-brownbear deleted the parallelize-sn-finalization branch September 30, 2019 15:54
original-brownbear added a commit that referenced this pull request Sep 30, 2019
mkleen added a commit to crate/crate that referenced this pull request Nov 27, 2019
This pull request is a backport of
elastic/elasticsearch#47283

The purpose of this pull request is to speed up the snapshot
finalization. This is achieved by parallelizing the writes of the
metadata in the snapshot finalization step. This should also
generally speed up the snapshot process overall when there is a
large number of indices.

This improvement makes sense, because the snapshot finalization
has taken much longer since #9327 was integrated.
mkleen added a commit to crate/crate that referenced this pull request Nov 27, 2019
mkleen added a commit to crate/crate that referenced this pull request Nov 27, 2019
mkleen added a commit to crate/crate that referenced this pull request Nov 27, 2019
mkleen added a commit to crate/crate that referenced this pull request Nov 27, 2019
mkleen added a commit to crate/crate that referenced this pull request Nov 27, 2019
mergify bot pushed a commit to crate/crate that referenced this pull request Nov 27, 2019
mergify bot pushed a commit to crate/crate that referenced this pull request Nov 27, 2019
(cherry picked from commit 3091e26)
mkleen added a commit to crate/crate that referenced this pull request Nov 28, 2019
mergify bot pushed a commit to crate/crate that referenced this pull request Nov 28, 2019
@original-brownbear original-brownbear restored the parallelize-sn-finalization branch January 6, 2021 14:08

Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue v7.5.0 v8.0.0-alpha1

4 participants