Resumable prune & memory-efficient index rewrite #4812
Merged
MichaelEischer merged 25 commits into restic:master on May 24, 2024
All methods should use blobType followed by ID.
The method now uses the same parameters as LookupBlobSize.
The method is now only indirectly accessible via Prune or RepairIndex.
Using the field with its current semantics is nearly impossible to get right. Remove it, as it will be replaced anyway in repository format 3.
This allows prune to resume an interrupted prune run.
Rewrite implements a streaming rewrite of the index that excludes the given packs. For this it loads all index files from the repository and only modifies those that require changes. This will reduce the index churn when running prune. Rewrite does not require the in-memory index and thus can drop it to significantly reduce the memory usage. However, `prune --unsafe-recovery` cannot use this strategy and requires a separate method to save the whole in-memory index. This is now handled using SaveFallback.
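The streaming rewrite described above can be illustrated with a minimal sketch. This is not restic's implementation: `indexFile`, `rewriteResult`, `rewriteIndex`, and the `load` callback are hypothetical names, and pack and index IDs are simplified to plain strings. The sketch only shows the idea that index files are visited one at a time, and that only files listing an excluded pack are replaced.

```go
package main

// indexFile is a hypothetical, simplified stand-in for one index file:
// its ID and the IDs of the packs it describes.
type indexFile struct {
	ID    string
	Packs []string
}

// rewriteResult summarizes what a rewrite pass would do.
type rewriteResult struct {
	Kept      []string    // index files left untouched
	Obsolete  []string    // index files that have to be replaced
	Rewritten []indexFile // replacement content (IDs would be assigned on save)
}

// rewriteIndex loads the index files one by one via load and drops all
// entries that belong to packs in exclude. Only one index file is held
// at a time, so no full in-memory index is needed.
func rewriteIndex(ids []string, load func(id string) indexFile, exclude map[string]bool) rewriteResult {
	var res rewriteResult
	for _, id := range ids {
		f := load(id)
		kept := make([]string, 0, len(f.Packs))
		for _, p := range f.Packs {
			if !exclude[p] {
				kept = append(kept, p)
			}
		}
		if len(kept) == len(f.Packs) {
			// Nothing was excluded: keep the file as-is to avoid index churn.
			res.Kept = append(res.Kept, f.ID)
			continue
		}
		res.Obsolete = append(res.Obsolete, f.ID)
		if len(kept) > 0 {
			res.Rewritten = append(res.Rewritten, indexFile{Packs: kept})
		}
	}
	return res
}
```

In this sketch an index file that references no excluded pack is never rewritten, which is the property that reduces index churn.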
The index operations are likely CPU-bound. Thus, reduce the concurrency accordingly.
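As a rough illustration of sizing concurrency for CPU-bound work, the following sketch caps the number of workers at the number of usable CPUs. The helper names `cpuBoundWorkers` and `processAll` are hypothetical and do not come from the restic code base; the actual limit chosen in this PR is not shown here.

```go
package main

import (
	"runtime"
	"sync"
)

// cpuBoundWorkers returns a worker count suited to CPU-bound work.
// GOMAXPROCS(0) only queries the current setting, it does not change it.
func cpuBoundWorkers() int {
	return runtime.GOMAXPROCS(0)
}

// processAll applies fn to every item using at most workers goroutines.
// Each goroutine writes to a distinct slice index, so no lock is needed.
func processAll(items []int, workers int, fn func(int) int) []int {
	if workers < 1 {
		workers = 1
	}
	out := make([]int, len(items))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = fn(items[i])
			}
		}()
	}
	for i := range items {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}
```

More workers than CPUs would not speed up CPU-bound work and would only add scheduling overhead, which is the reasoning behind a lower limit.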
Pack files created by interrupted prune runs appear to consist only of duplicate blobs on the next run. This caused the previous heuristic to ignore those pack files. Now, a duplicate blob in a specific pack file is also selected if that pack file only contains duplicate blobs. This allows prune to select the already rewritten pack files.
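The adjusted heuristic can be sketched as follows. This is a simplified model and not the code from the PR: `pack` and `chooseCopies` are hypothetical names, blob and pack IDs are plain strings, and the distinction between used and unused blobs is ignored. It only demonstrates that copies in a pack consisting solely of duplicates are preferred.

```go
package main

// pack is a hypothetical, simplified pack file: an ID and the blobs it holds.
type pack struct {
	ID    string
	Blobs []string
}

// chooseCopies decides, for every blob, which pack's copy is selected.
// Packs that contain only duplicate blobs (for example the output of an
// interrupted prune run) are considered first.
func chooseCopies(packs []pack) map[string]string {
	// Count in how many packs each blob occurs.
	count := make(map[string]int)
	for _, p := range packs {
		for _, b := range p.Blobs {
			count[b]++
		}
	}
	onlyDuplicates := func(p pack) bool {
		for _, b := range p.Blobs {
			if count[b] < 2 {
				return false
			}
		}
		return len(p.Blobs) > 0
	}

	chosen := make(map[string]string)
	// First pass: select blobs from packs that hold nothing but duplicates.
	for _, p := range packs {
		if !onlyDuplicates(p) {
			continue
		}
		for _, b := range p.Blobs {
			if _, ok := chosen[b]; !ok {
				chosen[b] = p.ID
			}
		}
	}
	// Second pass: every remaining blob is taken from the first pack holding it.
	for _, p := range packs {
		for _, b := range p.Blobs {
			if _, ok := chosen[b]; !ok {
				chosen[b] = p.ID
			}
		}
	}
	return chosen
}
```

With an old pack holding blobs A, B, and C and a rewritten pack holding only A and B, the sketch selects A and B from the rewritten pack, so the work of the interrupted run is reused.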
What does this PR change? What problem does it solve?
The PR consists of a refactoring of the index and reworks the index rewrite to be much more memory efficient.
The refactoring makes a few noteworthy changes:
- `repo.Index().Save(...)` method.
- `MasterIndex.Load(...)`
- `supersedes` field. It was never really used, as using it correctly is nearly impossible.
Prune no longer uses `repository.DisableAutoIndexUpdate()`. This allows efficiently resuming interrupted prune runs.
To properly select packs from previous interrupted prune runs, the prune heuristics required a fix. Pack files created by interrupted prune runs appear to consist only of duplicate blobs on the next run. This caused the previous heuristic to ignore those pack files. Now, a duplicate blob in a specific pack file is also selected if that pack file only contains duplicate blobs. This allows prune to select the already rewritten pack files.
`MasterIndex.Rewrite` implements a streaming rewrite of the index that excludes the given packs. For this it loads all index files from the repository and only modifies those that require changes. This will reduce the index churn when running prune. Rewrite does not require the in-memory index and thus can drop it to significantly reduce the memory usage.
However, `prune --unsafe-recovery` cannot use this strategy and requires a separate method to save the whole in-memory index. This is now handled using `MasterIndex.SaveFallback`.
Was the change previously discussed in an issue or on the forum?
Fixes #3806
The streaming rewrite is part of the roadmap for restic 0.17.0
Checklist
- [ ] I have added documentation for relevant changes (in the manual).
- There is a new file in `changelog/unreleased/` that describes the changes for our users (see template).
- I have run `gofmt` on the code in all commits.