
Only link fd* files during source-only snapshot #53463

Merged
ywelsch merged 6 commits into elastic:master from ywelsch:source-only-linked-files
Mar 23, 2020

Conversation

@ywelsch
Contributor

@ywelsch ywelsch commented Mar 12, 2020

Source-only snapshots currently create a second, full source-only copy of the shard on disk to support incrementality during upload. Because stored fields occupy a substantial part of a shard's storage, clusters with source-only snapshots can require up to 50% more local storage. Ideally we would only generate the source-only parts of the shard that need to be uploaded (i.e. do the incrementality checks on the original files instead of the trimmed-down source-only versions), but that requires much bigger changes to the snapshot infrastructure. This PR attempts to dramatically cut down on the storage used by the source-only copy of the shard by soft-linking the stored-fields files (fd*) instead of copying them.
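
To illustrate the general idea of linking the stored-fields files instead of copying them, here is a minimal, self-contained sketch. It is not the code from this PR: the class name `LinkStoredFieldsSketch`, the helper `isStoredFieldsFile`, and the method `mirror` are hypothetical, and it assumes that stored-fields files can be recognized by a file extension starting with `fd` and that the platform allows creating symbolic links.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class LinkStoredFieldsSketch {

    // Assumption: a stored-fields file is any file whose extension starts with "fd".
    static boolean isStoredFieldsFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot >= 0 && fileName.startsWith("fd", dot + 1);
    }

    // Mirrors the regular files of `source` into `target`:
    // stored-fields files become symbolic links, everything else is copied.
    static void mirror(Path source, Path target) throws IOException {
        Files.createDirectories(target);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(source)) {
            for (Path file : files) {
                if (Files.isRegularFile(file) == false) {
                    continue; // skip sub-directories and special files
                }
                Path dest = target.resolve(file.getFileName());
                if (isStoredFieldsFile(file.getFileName().toString())) {
                    // No extra disk space is used for the file contents.
                    Files.createSymbolicLink(dest, file.toAbsolutePath());
                } else {
                    Files.copy(file, dest);
                }
            }
        }
    }
}
```

With this approach only the non-stored-fields files take up additional space in the second copy, which is where the storage saving described above would come from.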

Relates #50231

@ywelsch ywelsch added the >enhancement, :Distributed/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs), v8.0.0, and v7.7.0 labels Mar 12, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@original-brownbear
Contributor

@ywelsch FYI, checkstyle randomness:

[ant:checkstyle] [ERROR] /dev/shm/elastic+elasticsearch+pull-request-2/x-pack/plugin/core/src/main/java/org/elasticsearch/snapshots/SourceOnlySnapshot.java:218: Line is longer than 140 characters (found 143). [LineLength]
[ant:checkstyle] [ERROR] /dev/shm/elastic+elasticsearch+pull-request-2/x-pack/plugin/core/src/main/java/org/elasticsearch/snapshots/SourceOnlySnapshotRepository.java:141: Line is longer than 140 characters (found 142). [LineLength]

Contributor

@original-brownbear original-brownbear left a comment


Nice! Thanks @ywelsch LGTM :) just some random+optional NITS

@ywelsch ywelsch merged commit c829079 into elastic:master Mar 23, 2020
ywelsch added a commit that referenced this pull request Mar 23, 2020

Labels

:Distributed/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs), >enhancement, v7.7.0, v8.0.0-alpha1
