index: optimize go GC performance#4352
Conversation
Allocates an index and repeatedly triggers the GC.
The indexEntry objects are now allocated in a separate array. References to an indexEntry are now stored as array indices. This has the benefit of allowing the garbage collector to ignore the indexEntry objects as these do not contain pointers and are part of a single large allocation.
This data structure reduces the wasted memory to O(sqrt(n)). The top-layer of the hashed array tree (HAT) also has a size of O(sqrt(n)), which makes it cache efficient. The top-layer should be small enough to easily fit into the CPU cache and thus only adds little overhead compared to directly accessing an index entry via a pointer.
Iterating through the indexmap according to the bucket order has the problem that all indexEntries are accessed in random order which is rather cache inefficient. As we already keep a list of all allocated blocks, just iterate through it. This allows iterating through a batch of indexEntries without random memory accesses. In addition, the packID will likely remain similar across multiple blobs as all blobs of a pack file are added as a single batch.
To complement the above measurements, here are the numbers for a full system backup dry-run with 140GB in 1 million files (5 million blobs total in the repository):

master - 106,02s user 12,96s system 265% cpu 44,855 total

That is, with this PR the backup is faster and requires 17% less CPU! For larger repositories, the improvement is likely even larger. The corresponding output of setting master - The gc mark and sweep phase duration drops from
This reverts commit f1c388c. For an uninitialized indexmap, the returned size was `-1`, which is unexpected and could cause problems.
LGTM.
What does this PR change? What problem does it solve?
The current index implementation, which uses a custom indexmap, is not particularly friendly to the garbage collector. As each indexEntry and the bucket list contain pointers, the garbage collector has to chase pointers to random locations throughout the index.
This PR moves the indexEntries into an array and stores the array index instead of a pointer. As a result, the indexEntries and the bucket list no longer contain pointers and are not scanned by the garbage collector. The array containing all indexEntries is implemented as a hashed array tree (https://en.wikipedia.org/wiki/Hashed_array_tree) to minimize the memory overhead.
The top-layer of the hashed array tree (HAT) also has a size of O(sqrt(n)), which should make it cache efficient: it should be small enough to easily fit into the CPU cache and thus add only little overhead compared to directly accessing an index entry via a pointer.
As a bonus, the indexmap.Each/Grow functions are now able to iterate over the indexEntries array instead of following random pointers. This significantly speeds up both functions.
Results
The indirection through the HAT increases the index lookup duration by 20-40%. The average is probably between 20-30%, but the results are quite noisy due to the varying map entry layout caused by the maphash randomization; the MasterIndex-based tests show between 0 and 35% overhead.
Iterating over the index is now up to 8 times faster, which also translates to slightly faster index encoding.
Garbage collection of the index is now nearly free.
grow() is also pretty interesting: the performance changes are likely a result of the different order of the chained indexEntries.
Commits:
- index: implement indexmap.grow() without random access
- index: let indexmap.Each iterate in allocation order
- index: remove redundant storage of indexmap size
- index: Allow inlining of HAT
- index: store indexEntries in hashed array tree
- index: Remove pointers from within indexentrys
- index: add garbage collection benchmark

benchstat base remove-pointers
benchstat base hat-base
benchstat base hat-inline
benchstat base hat-opt
benchstat hat-opt fast-each
benchstat fast-each fast-grow
benchstat base fast-grow
Summary
All in all, the drastically reduced GC overhead makes the slightly slower index lookups a worthwhile tradeoff. In particular, this should allow us to set GOGC=50 by default without regressing performance too much, as most of the data kept in memory by restic can simply be skipped by the garbage collector.
Was the change previously discussed in an issue or on the forum?
No.
Checklist
- [ ] I have added documentation for relevant changes (in the manual).
- [ ] There's a changelog file in `changelog/unreleased/` that describes the changes for our users (see template).
- [ ] I have run `gofmt` on the code in all commits.