Deduplicate Index Metadata in BlobStore (#50278) by original-brownbear · Pull Request #59514 · elastic/elasticsearch

original-brownbear · 2020-07-14T10:03:41Z

This PR introduces two new fields in `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and UUIDs. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices`) so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata, which reduces cost and makes snapshot finalization drastically faster when many indices are snapshotted at the same time.
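
The bookkeeping described above can be illustrated with a minimal sketch. This is not the code from this PR: the class `IndexMetaDedupSketch`, its methods, and the identifier format (index UUID plus a single generation number) are hypothetical names chosen for illustration, and the "repository" is just two in-memory maps standing in for the two new fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical, simplified stand-in for the two tracking fields.
 * None of these names are taken from the Elasticsearch code base.
 */
final class IndexMetaDedupSketch {

    // Metadata identifier -> uuid of the meta-{uuid}.dat blob already written.
    private final Map<String, String> blobUuidByIdentifier = new HashMap<>();

    // "snapshot/index" -> metadata identifier that snapshot uses for that index.
    private final Map<String, String> identifierBySnapshotAndIndex = new HashMap<>();

    // Counts simulated blob writes so the saving is observable.
    private int blobWrites = 0;

    /** Assumed identifier format: index UUID plus a generation that changes with the metadata. */
    static String identifier(String indexUuid, long generation) {
        return indexUuid + "-" + generation;
    }

    /** Records the index in a snapshot; writes a new blob only for unseen metadata. */
    String snapshotIndex(String snapshot, String index, String metaIdentifier) {
        String blobUuid = blobUuidByIdentifier.get(metaIdentifier);
        if (blobUuid == null) {
            // Metadata not found in any earlier snapshot: simulate one blob write.
            blobUuid = UUID.randomUUID().toString();
            blobWrites++;
            blobUuidByIdentifier.put(metaIdentifier, blobUuid);
        }
        identifierBySnapshotAndIndex.put(snapshot + "/" + index, metaIdentifier);
        return blobUuid;
    }

    /** Resolves the blob name a snapshot should read for an index, or null if unknown. */
    String blobNameFor(String snapshot, String index) {
        String id = identifierBySnapshotAndIndex.get(snapshot + "/" + index);
        return id == null ? null : "meta-" + blobUuidByIdentifier.get(id) + ".dat";
    }

    int blobWrites() {
        return blobWrites;
    }
}
```

In this sketch, snapshotting the same index twice with an unchanged identifier returns the same blob uuid and performs a single write; a changed generation produces a new blob.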

The implementation is mostly analogous to that for shard generations in #46250 and piggybacks on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).

Relates to #45736 as it improves the efficiency of snapshotting unchanged indices.
Relates to #49800 as it has the potential to make loading the index metadata for multiple snapshots of the same index concurrently much more efficient, speeding up future concurrent snapshot deletes.

backport of #50278

elasticmachine · 2020-07-14T10:03:43Z

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

original-brownbear added the :Distributed/Snapshot/Restore and backport labels Jul 14, 2020

elasticmachine added the Team:Distributed label Jul 14, 2020

original-brownbear mentioned this pull request Jul 14, 2020

Disable BwC Tests for #59514 #59565

Merged

original-brownbear added a commit that referenced this pull request Jul 14, 2020

Disable BwC Tests for #59514 (#59565)

296fee1

Disabling BwC tests so that #59514 can be merged.

original-brownbear merged commit d456f78 into elastic:7.x Jul 14, 2020

original-brownbear deleted the 50278-7.x branch July 14, 2020 20:18

original-brownbear mentioned this pull request Jul 14, 2020

Reenable BwC Tests after Merging #59514 #59568

Merged

original-brownbear added a commit that referenced this pull request Jul 14, 2020

Reenable BwC Tests after Merging #59514 (#59568)

fb2e3f8

Now that #59514 has been merged we can reenable BwC tests.

original-brownbear merged 1 commit into elastic:7.x from original-brownbear:50278-7.x
