restorer: Preallocate files #2893
Merged
MichaelEischer merged 3 commits into restic:master on Sep 8, 2020
Member, Author
@ifedorenko could you take a look at this PR?
Contributor
Sorry for the slow response... will try to find time to review this next week.
ifedorenko reviewed Aug 29, 2020
ifedorenko reviewed Sep 7, 2020
ifedorenko approved these changes Sep 8, 2020
Contributor
Really great work by you guys. Thanks so much.
What does this PR change? What problem does it solve?
The new restorer implementation writes data blobs to the destination files in random order. On filesystems that implicitly create sparse files when a write lands somewhere past the end of the file (probably most Unix-like systems), this can lead to a very large number of file fragments. Even without sparse files (e.g. the default on Windows), the file grows in multiple increments, which can cause fragmentation.
The restore loop itself now queues pack files for download in the order of their first reference in one of the restored files. For a large file this should usually mean that the pack download order closely resembles the order in which data blobs are referenced. That way the access pattern more closely resembles that of a file which grows over time.
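The ordering idea can be sketched roughly as follows. This is an illustration only, not the code from this PR: the `fileInfo` type, the `packOrder` function, and the `blobToPack` lookup are hypothetical stand-ins for restic's internal data structures.

```go
package main

import "fmt"

// fileInfo is a hypothetical, simplified stand-in for a file to restore:
// only the ordered list of data blob IDs that make up its content.
type fileInfo struct {
	name  string
	blobs []string
}

// packOrder returns pack IDs in the order in which they are first
// referenced while walking the files and their blobs front to back.
// blobToPack is an assumed lookup from blob ID to the pack containing it.
func packOrder(files []fileInfo, blobToPack map[string]string) []string {
	seen := make(map[string]bool)
	var order []string
	for _, f := range files {
		for _, blob := range f.blobs {
			pack, ok := blobToPack[blob]
			if !ok || seen[pack] {
				continue // unknown blob, or pack already queued
			}
			seen[pack] = true
			order = append(order, pack)
		}
	}
	return order
}

func main() {
	files := []fileInfo{
		{name: "big.bin", blobs: []string{"b1", "b2", "b3", "b1"}},
		{name: "small.txt", blobs: []string{"b4", "b2"}},
	}
	blobToPack := map[string]string{
		"b1": "packA", "b2": "packB", "b3": "packA", "b4": "packC",
	}
	fmt.Println(packOrder(files, blobToPack)) // prints [packA packB packC]
}
```

Each pack is queued once, at the position of its earliest reference, so for a single large file the download order tends to follow the file from start to end.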
In addition, after a file is created, the fileswriter will issue a call to preallocate the full file size. This allows the filesystem to allocate space with as little fragmentation as possible.
There's currently a preallocate implementation for Linux and macOS. For Windows, the generic implementation with `Truncate` calls `SetEndOfFile`, which is supposed to preallocate disk space. On Linux the filesystem is able to track not-yet-initialized file extents, whereas on Windows there's only an offset up to which the file content was initialized. This means that on Windows a write into the middle of a file has to zero everything up to that point.
The preallocate implementation is currently located in `internal/restorer`, which is probably not the ideal place, but `internal/restic` is already overcrowded. There's also a test in `internal/restorer/preallocate_test.go`, which tests something, but I'm not really sure how fragile it is.
Was the change discussed in an issue or in the forum before?
Closes #2675
Checklist
[ ] I have added documentation for the changes (in the manual)
[ ] There's a new file in `changelog/unreleased/` that describes the changes for our users (template here)
`gofmt` on the code in all commits