Skip to content

index: speed up masterindex merging#5713

Merged
MichaelEischer merged 6 commits into
restic:masterfrom
MichaelEischer:index-optimization
May 9, 2026
Merged

index: speed up masterindex merging#5713
MichaelEischer merged 6 commits into
restic:masterfrom
MichaelEischer:index-optimization

Conversation

@MichaelEischer

@MichaelEischer MichaelEischer commented Feb 15, 2026

Copy link
Copy Markdown
Member

What does this PR change? What problem does it solve?

After loading the individual indexes, restic merges them all into a single large index to improve performance. This latter step however can require a notable amount of time for large repositories.

This PR optimizes this process in two major ways

  • To prevent duplicate entries in the index, it is necessary to check for duplicates. In most cases none are found though. Introduce a bloom filter into the index to optimize for this case. Note that the bloom filter only exists on 64-bit systems. See the commit message for details.
  • Preallocate the final index to avoid multiple expensive resizes.

For the MasterIndexMerge benachmark this yields the following overall improvement:

benchstat before all-optimizations
name                 old time/op    new time/op    delta
MasterIndexMerge-16     7.25s ± 2%     2.63s ± 2%  -63.69%  (p=0.000 n=10+10)

name                 old alloc/op   new alloc/op   delta
MasterIndexMerge-16    3.73GB ± 0%    2.96GB ± 0%  -20.50%  (p=0.000 n=7+10)

name                 old allocs/op  new allocs/op  delta
MasterIndexMerge-16     5.73M ± 0%     5.60M ± 0%   -2.24%  (p=0.000 n=7+10)

The actual speedup of the index merging itself is larger as the test case also counts the creation of the test indexes.

The individual improvements yield the following:

benchstat before with-bloom-filter
name                 old time/op    new time/op    delta
MasterIndexMerge-16     7.25s ± 2%     4.51s ± 2%  -37.89%  (p=0.000 n=10+9)

name                 old alloc/op   new alloc/op   delta
MasterIndexMerge-16    3.73GB ± 0%    3.73GB ± 0%     ~     (p=0.295 n=7+9)

name                 old allocs/op  new allocs/op  delta
MasterIndexMerge-16     5.73M ± 0%     5.73M ± 0%     ~     (p=0.295 n=7+9)
benchstat with-bloom-filter with-index-bucket-preallocation
name                 old time/op    new time/op    delta
MasterIndexMerge-16     4.51s ± 2%     2.71s ± 1%  -39.96%  (p=0.000 n=9+8)

name                 old alloc/op   new alloc/op   delta
MasterIndexMerge-16    3.73GB ± 0%    3.66GB ± 0%   -1.75%  (p=0.000 n=9+10)

name                 old allocs/op  new allocs/op  delta
MasterIndexMerge-16     5.73M ± 0%     5.72M ± 0%   -0.09%  (p=0.000 n=9+10)
benchstat with-index-bucket-preallocation with-index-bucket-and-backing-array-preallocation
name                 old time/op    new time/op    delta
MasterIndexMerge-16     2.71s ± 1%     2.63s ± 2%   -2.63%  (p=0.000 n=8+10)

name                 old alloc/op   new alloc/op   delta
MasterIndexMerge-16    3.66GB ± 0%    2.96GB ± 0%  -19.09%  (p=0.000 n=10+10)

name                 old allocs/op  new allocs/op  delta
MasterIndexMerge-16     5.72M ± 0%     5.60M ± 0%   -2.15%  (p=0.000 n=10+10)

Further performance optimizations would require a way to prefetch data in the indexmap. However, Go does not expose the corresponding hardware instructions.

Was the change previously discussed in an issue or on the forum?

Performance issues were reported in https://forum.restic.net/t/restic-mount-takes-5-minutes/10645/4

Checklist

  • I have added tests for all code changes.
  • [ ] I have added documentation for relevant changes (in the manual).
  • There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
  • I'm done! This pull request is ready for review.

`b.Loop()` drastically shortens benchmark execution times for tests with
an expensive initialization phase as it only has to happen once now.
…blobs

Only in use on 64-bit systems. Use the upper 28bits of the id of an
index entry as bloom filter. This allows skipping the index entry
traversal most of the time if an id is not stored in the hashmap.

The bloom filter embedded in the index entry id is check each time
before following a reference to an index entry. This further reduces
the risk of false positives. The bloom filter itself is basically for
free on modern CPUs.

The main performance cost of checking for unknown blobs in the index are
the essentially random RAM accesses for the initial bucket lookup as
well as following the next pointer in the index entries. With the bloom
filter most of the time only the initial bucket lookup is necessary.

This speeds up checking for unknown blobs by a factor 5 (!), while
having no effect on the lookup of known blobs:

$ benchstat no-bloom with-bloom
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%   9.9ms ± 7%  -79.70%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  47.9ms ± 3%     ~     (p=0.968 n=10+9)

This bloom filter parameters m=28 k=1 were derived empirically, while
also leaving sufficient room for very large repositories. Before this
commit, the final merge index step took roughly 1 second per million
index entries. With the chosen bloom filter parameters, it would
currently take 19 hours to just merge such an index. It is safe to
assume that such large repositories don't exist.

Comparison with other parameter sets:

$ m=28 k=1 versus m=32 k=1
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%   9.7ms ±16%  -80.17%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  48.4ms ± 3%     ~     (p=0.436 n=10+10)

$ m=28 k=1 versus m=24 k=1
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%  10.8ms ±13%  -77.90%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  47.9ms ± 3%     ~     (p=0.684 n=10+10)

$ m=28 k=1 versus m=28 k=2
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%  24.9ms ± 5%  -49.27%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  48.0ms ± 4%     ~     (p=1.000 n=10+10)

`k=2` outright wrecks the performance. This is most likely the case as
it performs worse on longer index entry chains, which also happen to be
the expensive ones to process.

`m=32` yields diminishing returns, while getting within an order of
magnitude of the largest known restic repositories.

Design alternatives:

In principle it would be possible to add a single large bloom filter
instead of embedding them in the index entry ids. However, this bloom
filter would necessarily incur additional random memory accesses and
thus slow things down overall.

@MichaelEischer MichaelEischer left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MichaelEischer MichaelEischer merged commit 5b7df81 into restic:master May 9, 2026
11 checks passed
@MichaelEischer MichaelEischer deleted the index-optimization branch May 9, 2026 23:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant