index: speed up masterindex merging by MichaelEischer · Pull Request #5713 · restic/restic

MichaelEischer · 2026-02-15T13:56:42Z

What does this PR change? What problem does it solve?

After loading the individual indexes, restic merges them all into a single large index to improve performance. This latter step however can require a notable amount of time for large repositories.

This PR optimizes this process in two major ways

To prevent duplicate entries in the index, it is necessary to check for duplicates. In most cases none are found though. Introduce a bloom filter into the index to optimize for this case. Note that the bloom filter only exists on 64-bit systems. See the commit message for details.
Preallocate the final index to avoid multiple expensive resizes.

For the MasterIndexMerge benachmark this yields the following overall improvement:

benchstat before all-optimizations
name                 old time/op    new time/op    delta
MasterIndexMerge-16     7.25s ± 2%     2.63s ± 2%  -63.69%  (p=0.000 n=10+10)

name                 old alloc/op   new alloc/op   delta
MasterIndexMerge-16    3.73GB ± 0%    2.96GB ± 0%  -20.50%  (p=0.000 n=7+10)

name                 old allocs/op  new allocs/op  delta
MasterIndexMerge-16     5.73M ± 0%     5.60M ± 0%   -2.24%  (p=0.000 n=7+10)

The actual speedup of the index merging itself is larger as the test case also counts the creation of the test indexes.

The individual improvements yield the following:

benchstat before with-bloom-filter
name                 old time/op    new time/op    delta
MasterIndexMerge-16     7.25s ± 2%     4.51s ± 2%  -37.89%  (p=0.000 n=10+9)

name                 old alloc/op   new alloc/op   delta
MasterIndexMerge-16    3.73GB ± 0%    3.73GB ± 0%     ~     (p=0.295 n=7+9)

name                 old allocs/op  new allocs/op  delta
MasterIndexMerge-16     5.73M ± 0%     5.73M ± 0%     ~     (p=0.295 n=7+9)

benchstat with-bloom-filter with-index-bucket-preallocation
name                 old time/op    new time/op    delta
MasterIndexMerge-16     4.51s ± 2%     2.71s ± 1%  -39.96%  (p=0.000 n=9+8)

name                 old alloc/op   new alloc/op   delta
MasterIndexMerge-16    3.73GB ± 0%    3.66GB ± 0%   -1.75%  (p=0.000 n=9+10)

name                 old allocs/op  new allocs/op  delta
MasterIndexMerge-16     5.73M ± 0%     5.72M ± 0%   -0.09%  (p=0.000 n=9+10)

benchstat with-index-bucket-preallocation with-index-bucket-and-backing-array-preallocation
name                 old time/op    new time/op    delta
MasterIndexMerge-16     2.71s ± 1%     2.63s ± 2%   -2.63%  (p=0.000 n=8+10)

name                 old alloc/op   new alloc/op   delta
MasterIndexMerge-16    3.66GB ± 0%    2.96GB ± 0%  -19.09%  (p=0.000 n=10+10)

name                 old allocs/op  new allocs/op  delta
MasterIndexMerge-16     5.72M ± 0%     5.60M ± 0%   -2.15%  (p=0.000 n=10+10)

Further performance optimizations would require a way to prefetch data in the indexmap. However, Go does not expose the corresponding hardware instructions.

Was the change previously discussed in an issue or on the forum?

Performance issues were reported in https://forum.restic.net/t/restic-mount-takes-5-minutes/10645/4

Checklist

I have added tests for all code changes.
~~[ ] I have added documentation for relevant changes (in the manual).~~
There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
I'm done! This pull request is ready for review.

`b.Loop()` drastically shortens benchmark execution times for tests with an expensive initialization phase as it only has to happen once now.

…blobs Only in use on 64-bit systems. Use the upper 28bits of the id of an index entry as bloom filter. This allows skipping the index entry traversal most of the time if an id is not stored in the hashmap. The bloom filter embedded in the index entry id is check each time before following a reference to an index entry. This further reduces the risk of false positives. The bloom filter itself is basically for free on modern CPUs. The main performance cost of checking for unknown blobs in the index are the essentially random RAM accesses for the initial bucket lookup as well as following the next pointer in the index entries. With the bloom filter most of the time only the initial bucket lookup is necessary. This speeds up checking for unknown blobs by a factor 5 (!), while having no effect on the lookup of known blobs: $ benchstat no-bloom with-bloom name old time/op new time/op delta IndexHasUnknown-16 49.0ms ± 2% 9.9ms ± 7% -79.70% (p=0.000 n=10+10) IndexHasKnown-16 48.0ms ± 3% 47.9ms ± 3% ~ (p=0.968 n=10+9) This bloom filter parameters m=28 k=1 were derived empirically, while also leaving sufficient room for very large repositories. Before this commit, the final merge index step took roughly 1 second per million index entries. With the chosen bloom filter parameters, it would currently take 19 hours to just merge such an index. It is safe to assume that such large repositories don't exist. Comparison with other parameter sets: $ m=28 k=1 versus m=32 k=1 name old time/op new time/op delta IndexHasUnknown-16 49.0ms ± 2% 9.7ms ±16% -80.17% (p=0.000 n=10+10) IndexHasKnown-16 48.0ms ± 3% 48.4ms ± 3% ~ (p=0.436 n=10+10) $ m=28 k=1 versus m=24 k=1 name old time/op new time/op delta IndexHasUnknown-16 49.0ms ± 2% 10.8ms ±13% -77.90% (p=0.000 n=10+10) IndexHasKnown-16 48.0ms ± 3% 47.9ms ± 3% ~ (p=0.684 n=10+10) $ m=28 k=1 versus m=28 k=2 name old time/op new time/op delta IndexHasUnknown-16 49.0ms ± 2% 24.9ms ± 5% -49.27% (p=0.000 n=10+10) IndexHasKnown-16 48.0ms ± 3% 48.0ms ± 4% ~ (p=1.000 n=10+10) `k=2` outright wrecks the performance. This is most likely the case as it performs worse on longer index entry chains, which also happen to be the expensive ones to process. `m=32` yields diminishing returns, while getting within an order of magnitude of the largest known restic repositories. Design alternatives: In principle it would be possible to add a single large bloom filter instead of embedding them in the index entry ids. However, this bloom filter would necessarily incur additional random memory accesses and thus slow things down overall.

MichaelEischer

LGTM

MichaelEischer added 6 commits May 10, 2026 00:35

index: make tests more representative

e33ed5d

index: modernize masterindex tests

320f709

`b.Loop()` drastically shortens benchmark execution times for tests with an expensive initialization phase as it only has to happen once now.

index: support index preallocation

934c615

index: also preallocate hashed array tree

5c935e7

add changelog

adce279

MichaelEischer force-pushed the index-optimization branch from 3c6c426 to adce279 Compare May 9, 2026 22:38

MichaelEischer commented May 9, 2026

View reviewed changes

MichaelEischer merged commit 5b7df81 into restic:master May 9, 2026
11 checks passed

MichaelEischer deleted the index-optimization branch May 9, 2026 23:09

amrkmn mentioned this pull request Jun 9, 2026

restic 0.19.0 amrkmn/homebrew-tap#803

Closed

BrewTestBot mentioned this pull request Jun 9, 2026

restic 0.19.0 Homebrew/homebrew-core#287125

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

index: speed up masterindex merging#5713

index: speed up masterindex merging#5713
MichaelEischer merged 6 commits into
restic:masterfrom
MichaelEischer:index-optimization

MichaelEischer commented Feb 15, 2026 •

edited

Loading

Uh oh!

MichaelEischer left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MichaelEischer commented Feb 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR change? What problem does it solve?

Was the change previously discussed in an issue or on the forum?

Checklist

Uh oh!

MichaelEischer left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

MichaelEischer commented Feb 15, 2026 •

edited

Loading