Deduplicate Index Metadata in BlobStore (#50278) #59514

Merged
original-brownbear merged 1 commit into elastic:7.x from original-brownbear:50278-7.x on Jul 14, 2020

@original-brownbear
Contributor

This PR introduces two new fields to `RepositoryData` (index-N) that track the blob names of `IndexMetaData` blobs and identify their content via setting generations and UUIDs. These fields are used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices`), so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata, which saves cost and makes snapshot finalization drastically faster when many indices are snapshotted at the same time.
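
To illustrate the idea, here is a minimal, self-contained sketch of the deduplication, not the code in this PR. The class `IndexMetadataDedupSketch`, its methods, and the way the content identifier is composed are all hypothetical; the actual fields in `RepositoryData` may be shaped differently. It only shows the core rule described above: a new `meta-{uuid}.dat` blob is written only when no earlier snapshot has already stored metadata with the same identifier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of index metadata deduplication; not Elasticsearch code.
class IndexMetadataDedupSketch {

    // Content identifier -> UUID of the blob that already holds that metadata.
    private final Map<String, String> identifierToBlobUuid = new HashMap<>();
    // "snapshot/index" -> content identifier, so each snapshot can find its blob.
    private final Map<String, String> snapshotIndexToIdentifier = new HashMap<>();
    // Counts simulated blob writes to the repository.
    private int blobWrites = 0;

    // Assumption: metadata content is identified by the index UUID plus version counters.
    static String identifier(String indexUuid, long settingsVersion, long mappingVersion) {
        return indexUuid + "-" + settingsVersion + "-" + mappingVersion;
    }

    // Records the metadata for one index in one snapshot and returns the blob name.
    // A new blob is "written" only if this identifier has not been seen before.
    String snapshotIndex(String snapshot, String indexName, String identifier) {
        String blobUuid = identifierToBlobUuid.get(identifier);
        if (blobUuid == null) {
            blobUuid = UUID.randomUUID().toString();
            blobWrites++; // stands in for writing meta-{uuid}.dat to the repository
            identifierToBlobUuid.put(identifier, blobUuid);
        }
        snapshotIndexToIdentifier.put(snapshot + "/" + indexName, identifier);
        return "meta-" + blobUuid + ".dat";
    }

    // Looks up the blob name a given snapshot uses for a given index, or null if unknown.
    String blobNameFor(String snapshot, String indexName) {
        String identifier = snapshotIndexToIdentifier.get(snapshot + "/" + indexName);
        if (identifier == null) {
            return null;
        }
        return "meta-" + identifierToBlobUuid.get(identifier) + ".dat";
    }

    int blobWrites() {
        return blobWrites;
    }

    public static void main(String[] args) {
        IndexMetadataDedupSketch repo = new IndexMetadataDedupSketch();
        String unchanged = identifier("index-uuid", 1, 1);
        // Two snapshots of the same unchanged index share one metadata blob.
        repo.snapshotIndex("snap-1", "logs", unchanged);
        repo.snapshotIndex("snap-2", "logs", unchanged);
        System.out.println(repo.blobWrites()); // prints 1
    }
}
```

In this sketch the second snapshot costs no metadata write, and both snapshots resolve to the same blob name, which is the saving the description refers to for unchanged indices.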

The implementation is mostly analogous to that for shard generations in #46250 and piggybacks on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).

Relates to #45736, as it improves the efficiency of snapshotting unchanged indices.
Relates to #49800, as it has the potential to make concurrently loading the index metadata for multiple snapshots of the same index much more efficient, speeding up future concurrent snapshot deletes.

backport of #50278

original-brownbear added the :Distributed/Snapshot/Restore and backport labels Jul 14, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

elasticmachine added the Team:Distributed label Jul 14, 2020
original-brownbear added a commit that referenced this pull request Jul 14, 2020
Disabling BwC tests so that #59514 can be merged.
original-brownbear merged commit d456f78 into elastic:7.x Jul 14, 2020
original-brownbear deleted the 50278-7.x branch July 14, 2020 20:18
original-brownbear added a commit that referenced this pull request Jul 14, 2020
Now that #59514 has been merged we can reenable BwC tests.
Labels

backport, :Distributed/Snapshot/Restore, Team:Distributed