Improve snapshot creation and deletion performance on repositories with large number of snapshots#8969
Conversation
39a6e57 to
ee3c3d1
Compare
There was a problem hiding this comment.
Maybe use the method readSnapshot(stream) again?
There was a problem hiding this comment.
I cannot reuse it here because this method reads multiple snapshots. But I can definitely add a new method readSnapshots() similiar to readSnapshot() and use it here.
|
LGTM, left some comments. |
|
@tlrx Thanks! I have updated the PR with your suggestions. |
|
LGTM, just a typo in documentation. |
There was a problem hiding this comment.
Do you need to close the stream after reading it here?
There was a problem hiding this comment.
It's responsibility of whoever opened it (caller).
6310866 to
f138b6b
Compare
f138b6b to
abb7193
Compare
There was a problem hiding this comment.
Can use the '{}' syntax here for DATA_BLOB_PREFIX
|
@imotov left some more comments on this |
There was a problem hiding this comment.
This is fallback right? For older repos that don't have the new snapshot index file?
There was a problem hiding this comment.
Correct. Added a comment.
|
@imotov left some more minor comments, mostly around documentation, exceptions, and renaming a var |
|
@dakrone thanks! pushed an update. |
|
Thanks @imotov, LGTM! |
…th large number of snapshots Each shard repository consists of snapshot file for each snapshot - this file contains a map between original physical file that is snapshotted and its representation in repository. This data includes original filename, checksum and length. When a new snapshot is created, elasticsearch needs to read all these snapshot files to figure which file are already present in the repository and which files still have to be copied there. This change adds a new index file that contains all this information combined into a single file. So, if a repository has 1000 snapshots with 1000 shards elasticsearch will only need to read 1000 blobs (one per shard) instead of 1,000,000 to delete a snapshot. This change should also improve snapshot creation speed on repositories with large number of snapshot and high latency. Fixes elastic#8958
0e8834e to
59d9f7e
Compare
|
Will this be back ported to 1.x? |
|
@niemyjski this was a significant change that required changing the snapshot file format and it was too big of a change for a patch level release. So we didn't port to 1.x and there are no current plans to do it. |
Fixes #8958