Speed up Snapshot Finalization #47283
original-brownbear merged 4 commits into elastic:master from original-brownbear:parallelize-sn-finalization
As a result of #45689, snapshot finalization started to take significantly longer than before. This is a little unfortunate since it increases the likelihood of failing to finalize after having written out all the segment blobs. This change parallelizes all the metadata writes that can safely run in parallel in the finalization step to speed that step up again. It will also generally speed up the snapshot process overall when there is a large number of indices.
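To illustrate the idea of parallelizing the independent metadata writes, here is a minimal sketch. It is not the code from this PR: the class `ParallelFinalizationSketch`, the in-memory `store` map standing in for a blob container, and the blob names are all hypothetical. It only assumes that the root-level snap- blob and the per-index metadata blobs do not depend on each other, and that the repository index generation must be written only after all of them have succeeded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelFinalizationSketch {

    // Hypothetical finalization step: "store" stands in for a blob store (blob name -> contents).
    static void finalizeSnapshot(Map<String, String> store, String snapshotUuid,
                                 List<String> indices, ExecutorService executor) {
        List<CompletableFuture<Void>> writes = new ArrayList<>();

        // The root-level snap- blob does not depend on the index metadata, so it can run concurrently.
        writes.add(CompletableFuture.runAsync(
            () -> store.put("snap-" + snapshotUuid, "snapshot info"), executor));

        // One independent metadata write per index.
        for (String index : indices) {
            writes.add(CompletableFuture.runAsync(
                () -> store.put("indices/" + index + "/meta-" + snapshotUuid, "index metadata"), executor));
        }

        // Wait for every write. join() throws CompletionException if any write failed,
        // in which case the repository index below is never written.
        CompletableFuture.allOf(writes.toArray(new CompletableFuture[0])).join();

        // Only after all metadata is in place do we publish the new repository index.
        store.put("index-N", "repository data including " + snapshotUuid);
    }

    public static void main(String[] args) {
        Map<String, String> store = new ConcurrentHashMap<>();
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            finalizeSnapshot(store, "some-uuid", List.of("index-a", "index-b", "index-c"), executor);
            System.out.println("blobs written: " + store.keySet());
        } finally {
            executor.shutdown();
        }
    }
}
```

With this shape, a failure in any of the parallel writes surfaces as a single exception from the wait step, which is also why a separate error message per kind of metadata write becomes less useful.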
Pinging @elastic/es-distributed
Jenkins run elasticsearch-ci/bwc
Jenkins run elasticsearch-ci/bwc
    final RepositoryData updatedRepositoryData = getRepositoryData().addSnapshot(snapshotId, blobStoreSnapshot.state(), indices);
    snapshotFormat.write(blobStoreSnapshot, blobContainer(), snapshotId.getUUID(), false);
    writeIndexGen(updatedRepositoryData, repositoryStateId);
} catch (FileAlreadyExistsException ex) {
This catch is gone now. It was dead code because we no longer do the exists check for this blob in the line above, where we write the snap- blob.
        indexMetaDataFormat.write(clusterMetaData.index(index.getName()), indexContainer(index), snapshotId.getUUID(), false);
    }
} catch (IOException ex) {
    throw new SnapshotException(metadata.name(), snapshotId, "failed to write metadata for snapshot", ex);
I removed this specific rethrow because, with this change, we write the index metadata in parallel with the root-level snap- blob anyway, so throwing with a separate message here seemed pointless.
public class MockEventuallyConsistentRepositoryTests extends ESTestCase {

    private Environment environment;
This is just dead code. I saw it when making adjustments here and removed it because I figured it wasn't worth a separate PR.
tlrx left a comment
I left some comments, nothing to worry about as it looks great already
server/src/main/java/org/elasticsearch/repositories/Repository.java (outdated)
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
server/src/main/java/org/elasticsearch/snapshots/SnapshotsService.java (outdated)
x-pack/plugin/core/src/main/java/org/elasticsearch/snapshots/SourceOnlySnapshotRepository.java
Thanks @tlrx, all points addressed I think :)
Jenkins run elasticsearch-ci/packaging-sample
Thanks Tanguy!
As a result of #45689, snapshot finalization started to take significantly longer than before. This is a little unfortunate since it increases the likelihood of failing to finalize after having written out all the segment blobs. This change parallelizes all the metadata writes that can safely run in parallel in the finalization step to speed that step up again. It will also generally speed up the snapshot process overall when there is a large number of indices. This is also a nice-to-have for #46250, since we add yet another step (deleting old index- blobs in the shards) to the finalization.
This pull request is a backport of elastic/elasticsearch#47283. Its purpose is to speed up snapshot finalization. This is achieved by parallelizing the metadata writes in the snapshot finalization step. It will also generally speed up the snapshot process overall when there is a large number of indices. This improvement makes sense because snapshot finalization has taken much longer since #9327 was integrated.
This pull request is a backport of elastic/elasticsearch#47283. Its purpose is to speed up snapshot finalization. This is achieved by parallelizing the metadata writes in the snapshot finalization step. It will also generally speed up the snapshot process overall when there is a large number of indices. This improvement makes sense because snapshot finalization has taken much longer since #9327 was integrated. (cherry picked from commit 3091e26)
As a result of #45689 snapshot finalization started to
take significantly longer than before. This may be a
little unfortunate since it increases the likelihood
of failing to finalize after having written out all
the segment blobs.
This change parallelizes all the metadata writes that
can safely run in parallel in the finalization step to
speed the finalization step up again. Also, this will
generally speed up the snapshot process overall in case
of large number of indices.
This is also a nice-to-have for #46250 since we add yet
another step (deleting old
index- blobs in the shards) to the finalization.