index: speed up masterindex merging#5713
Merged
Merged
Conversation
`b.Loop()` drastically shortens benchmark execution times for tests with an expensive initialization phase as it only has to happen once now.
…blobs Only in use on 64-bit systems. Use the upper 28bits of the id of an index entry as bloom filter. This allows skipping the index entry traversal most of the time if an id is not stored in the hashmap. The bloom filter embedded in the index entry id is check each time before following a reference to an index entry. This further reduces the risk of false positives. The bloom filter itself is basically for free on modern CPUs. The main performance cost of checking for unknown blobs in the index are the essentially random RAM accesses for the initial bucket lookup as well as following the next pointer in the index entries. With the bloom filter most of the time only the initial bucket lookup is necessary. This speeds up checking for unknown blobs by a factor 5 (!), while having no effect on the lookup of known blobs: $ benchstat no-bloom with-bloom name old time/op new time/op delta IndexHasUnknown-16 49.0ms ± 2% 9.9ms ± 7% -79.70% (p=0.000 n=10+10) IndexHasKnown-16 48.0ms ± 3% 47.9ms ± 3% ~ (p=0.968 n=10+9) This bloom filter parameters m=28 k=1 were derived empirically, while also leaving sufficient room for very large repositories. Before this commit, the final merge index step took roughly 1 second per million index entries. With the chosen bloom filter parameters, it would currently take 19 hours to just merge such an index. It is safe to assume that such large repositories don't exist. Comparison with other parameter sets: $ m=28 k=1 versus m=32 k=1 name old time/op new time/op delta IndexHasUnknown-16 49.0ms ± 2% 9.7ms ±16% -80.17% (p=0.000 n=10+10) IndexHasKnown-16 48.0ms ± 3% 48.4ms ± 3% ~ (p=0.436 n=10+10) $ m=28 k=1 versus m=24 k=1 name old time/op new time/op delta IndexHasUnknown-16 49.0ms ± 2% 10.8ms ±13% -77.90% (p=0.000 n=10+10) IndexHasKnown-16 48.0ms ± 3% 47.9ms ± 3% ~ (p=0.684 n=10+10) $ m=28 k=1 versus m=28 k=2 name old time/op new time/op delta IndexHasUnknown-16 49.0ms ± 2% 24.9ms ± 5% -49.27% (p=0.000 n=10+10) IndexHasKnown-16 48.0ms ± 3% 48.0ms ± 4% ~ (p=1.000 n=10+10) `k=2` outright wrecks the performance. This is most likely the case as it performs worse on longer index entry chains, which also happen to be the expensive ones to process. `m=32` yields diminishing returns, while getting within an order of magnitude of the largest known restic repositories. Design alternatives: In principle it would be possible to add a single large bloom filter instead of embedding them in the index entry ids. However, this bloom filter would necessarily incur additional random memory accesses and thus slow things down overall.
3c6c426 to
adce279
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR change? What problem does it solve?
After loading the individual indexes, restic merges them all into a single large index to improve performance. This latter step however can require a notable amount of time for large repositories.
This PR optimizes this process in two major ways
For the
MasterIndexMergebenachmark this yields the following overall improvement:The actual speedup of the index merging itself is larger as the test case also counts the creation of the test indexes.
The individual improvements yield the following:
Further performance optimizations would require a way to prefetch data in the indexmap. However, Go does not expose the corresponding hardware instructions.
Was the change previously discussed in an issue or on the forum?
Performance issues were reported in https://forum.restic.net/t/restic-mount-takes-5-minutes/10645/4
Checklist
[ ] I have added documentation for relevant changes (in the manual).changelog/unreleased/that describes the changes for our users (see template).