Skip to content

Merge index (based on chaining index implementation)#2818

Merged
MichaelEischer merged 3 commits intorestic:masterfrom
aawsome:merge-index-chaining
Jul 22, 2020
Merged

Merge index (based on chaining index implementation)#2818
MichaelEischer merged 3 commits intorestic:masterfrom
aawsome:merge-index-chaining

Conversation

@aawsome
Copy link
Copy Markdown
Contributor

@aawsome aawsome commented Jul 4, 2020

What is the purpose of this change? What does it change?

Merge in-memory indexes when they are finalized.
This increases performance especially for large repositories with many index files.

Was the change discussed in an issue or in the forum before?

The idea comes from #2523. This is a follow-up of #2799 as the discussion in there mainly moved to index-implementation related things.

This PR is based on #2812.

closes #944
closes #1550
closes #2799
closes #2523

Checklist

  • I have read the Contribution Guidelines
  • I have enabled maintainer edits for this PR
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

@aawsome aawsome force-pushed the merge-index-chaining branch from 219feff to 931048a Compare July 4, 2020 05:42
@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Jul 4, 2020

Here are benchmark results, where old is based on commit bb65143.

name                                         old time/op    new time/op    delta
PackerManager-4                                 265ms ± 2%     267ms ± 3%     ~     (p=0.113 n=9+10)
DecodeIndex-4                                  22.3µs ± 4%    22.4µs ± 2%     ~     (p=0.353 n=10+10)
IndexHasUnknown-4                              74.4ns ±17%    77.2ns ±13%     ~     (p=0.342 n=10+10)
IndexHasKnown-4                                72.0ns ± 4%    73.9ns ±11%     ~     (p=0.200 n=9+10)
IndexAlloc-4                                    900ms ± 2%     891ms ± 5%     ~     (p=0.218 n=10+10)
MasterIndexLookupSingleIndex-4                  229ns ± 6%     259ns ± 3%  +13.14%  (p=0.000 n=10+10)
MasterIndexLookupMultipleIndex-4                511ns ± 6%     252ns ± 7%  -50.60%  (p=0.000 n=9+10)
MasterIndexLookupSingleIndexUnknown-4          97.8ns ±10%   125.2ns ±13%  +28.07%  (p=0.000 n=10+10)
MasterIndexLookupMultipleIndexUnknown-4         420ns ±11%     101ns ± 9%  -75.92%  (p=0.000 n=10+10)
MasterIndexLookupMultipleIndexReal-4           10.5µs ± 6%     0.3µs ± 7%  -97.52%  (p=0.000 n=10+9)
MasterIndexLookupMultipleIndexRealUnknown-4    10.2µs ± 4%     0.1µs ±11%  -99.02%  (p=0.000 n=10+10)
SaveAndEncrypt-4                                136ns ± 2%     137ns ± 2%     ~     (p=0.230 n=9+10)
LoadTree-4                                      442µs ± 2%     467µs ± 1%   +5.78%  (p=0.000 n=10+10)
LoadBlob-4                                     8.23ms ± 0%    8.34ms ± 1%   +1.36%  (p=0.000 n=9+10)
LoadAndDecrypt-4                               10.2ms ± 2%    10.2ms ± 2%     ~     (p=0.436 n=9+9)
LoadIndex-4                                    43.8ms ± 3%    43.5ms ± 4%     ~     (p=0.631 n=10+10)

name                                         old speed      new speed      delta
PackerManager-4                               199MB/s ± 2%   197MB/s ± 3%     ~     (p=0.108 n=9+10)
SaveAndEncrypt-4                             30.9TB/s ± 2%  30.7TB/s ± 2%     ~     (p=0.243 n=9+10)
LoadBlob-4                                    121MB/s ± 0%   120MB/s ± 1%   -1.34%  (p=0.000 n=9+10)
LoadAndDecrypt-4                             98.5MB/s ± 2%  98.1MB/s ± 2%     ~     (p=0.449 n=9+9)

name                                         old alloc/op   new alloc/op   delta
PackerManager-4                                91.0kB ± 0%    91.0kB ± 0%     ~     (p=0.510 n=9+10)
IndexAlloc-4                                    400MB ± 0%     400MB ± 0%     ~     (p=0.579 n=10+10)
SaveAndEncrypt-4                                2.00B ± 0%     2.00B ± 0%     ~     (all equal)

name                                         old allocs/op  new allocs/op  delta
PackerManager-4                                 1.41k ± 0%     1.41k ± 0%     ~     (p=1.000 n=10+10)
IndexAlloc-4                                    1.12M ± 0%     1.12M ± 0%     ~     (p=0.516 n=10+10)
SaveAndEncrypt-4                                 0.00           0.00          ~     (all equal)

@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Jul 4, 2020

Also checked test coverage for the changes due to this PR: only the grow() in moveto is not called in tests, but is called in the benchmarks. All other changes are covered by tests.

@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Jul 17, 2020

I rebased to the current version of #2812 and fixed the issues @greatroar found.

Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree that the masterIndex has to merge indexes to achieve good performance. However, the merge procedure should not break final indexes that got merged into the main index. As there's no way to ensure to nobody still has a reference to a merged index, this can lead to all sorts of subtle race conditions. So please remove all modifications of the source index during a merge operation.

@aawsome aawsome force-pushed the merge-index-chaining branch from 29ba8b6 to 7867a59 Compare July 18, 2020 19:24
@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Jul 18, 2020

Now that this PR only calls indexMap.add() directly, the benchmarks need an update - coming soon.

Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good on a quick glance at the code. I'll wait with the final review until #2812 is merged.

@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Jul 19, 2020

The benchmarks got killed on my 8GB laptop due to unsufficient memory; however, the important benchmarks did run. Here is the result comparing old = 6159eab with new = 1f07c59 :

name                                             old time/op    new time/op    delta
IndexMapHash-4                                     5.88µs ± 2%    5.89µs ± 1%     ~     (p=0.827 n=10+9)
PackerManager-4                                     270ms ± 1%     270ms ± 1%     ~     (p=0.666 n=9+9)
DecodeIndex-4                                       5.45s ± 1%     5.45s ± 3%     ~     (p=0.529 n=10+10)
DecodeIndexParallel-4                               5.43s ± 0%     5.43s ± 1%     ~     (p=0.905 n=9+10)
IndexHasUnknown-4                                  76.1ns ±12%    76.5ns ±14%     ~     (p=0.896 n=10+10)
IndexHasKnown-4                                    76.4ns ±10%    74.7ns ±11%     ~     (p=0.591 n=10+10)
IndexAlloc-4                                        848ms ± 1%     852ms ± 2%     ~     (p=0.203 n=8+10)
IndexAllocParallel-4                                513ms ± 3%     514ms ± 2%     ~     (p=0.666 n=9+9)
MasterIndexLookupSingleIndex-4                      198ns ± 6%     198ns ± 6%     ~     (p=0.984 n=9+10)
MasterIndexLookupMultipleIndex-4                   10.3µs ± 3%     0.2µs ± 9%  -97.67%  (p=0.000 n=10+10)
MasterIndexLookupSingleIndexUnknown-4               104ns ±16%     106ns ±22%     ~     (p=0.699 n=10+10)
MasterIndexLookupMultipleIndexUnknown-4            10.0µs ± 3%     0.1µs ± 8%  -99.05%  (p=0.000 n=10+10)
MasterIndexLookupParallel/known,indices=100-4      4.89µs ± 1%    0.50µs ± 1%  -89.85%  (p=0.000 n=10+10)
MasterIndexLookupParallel/unknown,indices=100-4    4.55µs ± 1%    0.31µs ± 7%  -93.25%  (p=0.000 n=8+10)

name                                             old speed      new speed      delta
IndexMapHash-4                                    697MB/s ± 2%   693MB/s ± 3%     ~     (p=0.579 n=10+10)
PackerManager-4                                   195MB/s ± 1%   195MB/s ± 1%     ~     (p=0.666 n=9+9)

name                                             old alloc/op   new alloc/op   delta
IndexMapHash-4                                      0.00B          0.00B          ~     (all equal)
PackerManager-4                                    91.0kB ± 0%    91.0kB ± 0%     ~     (p=1.000 n=10+10)
IndexAlloc-4                                        401MB ± 0%     401MB ± 0%     ~     (all equal)
IndexAllocParallel-4                                400MB ± 0%     400MB ± 0%     ~     (all equal)

name                                             old allocs/op  new allocs/op  delta
IndexMapHash-4                                       0.00           0.00          ~     (all equal)
PackerManager-4                                     1.41k ± 0%     1.41k ± 0%     ~     (p=1.000 n=10+10)
IndexAlloc-4                                         907k ± 0%      907k ± 0%     ~     (all equal)
IndexAllocParallel-4                                 907k ± 0%      907k ± 0%     ~     (all equal)

@aawsome aawsome force-pushed the merge-index-chaining branch from 0daff59 to 1f07c59 Compare July 19, 2020 18:32
@MichaelEischer
Copy link
Copy Markdown
Member

The benchmarks got killed on my 8GB laptop due to unsufficient memory; however, the important benchmarks did run.

Insufficient memory on a laptop with 8GB RAM sounds a bit worrying, as in that case either the tests use too large indexes or have a memory leak. Which test failed exactly? FWIW the benchmarks completed successfully on my laptop and that also has "just" 8GB RAM.

+ reduce test size of BenchmarkMasterIndexLookupParallel
@aawsome aawsome force-pushed the merge-index-chaining branch from 1f07c59 to d18c65f Compare July 21, 2020 07:31
@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Jul 21, 2020

The benchmarks got killed on my 8GB laptop due to unsufficient memory; however, the important benchmarks did run.

Insufficient memory on a laptop with 8GB RAM sounds a bit worrying, as in that case either the tests use too large indexes or have a memory leak. Which test failed exactly? FWIW the benchmarks completed successfully on my laptop and that also has "just" 8GB RAM.

I had some other programs in memory which also took their share...

The memory-hungry benchmark was MasterIndexLookupParallel which simulates 1M, 2M and 4M pack files (the other tests had a maximum of ~1M pack files). I changed it such that it simulates 250k, 500k and 1M pack files.

Here are the benchmark results:

name                                             old time/op    new time/op    delta
IndexMapHash-4                                     5.77µs ± 2%    5.90µs ± 2%   +2.38%  (p=0.000 n=10+10)
PackerManager-4                                     267ms ± 2%     267ms ± 1%     ~     (p=0.720 n=10+9)
DecodeIndex-4                                       5.39s ± 0%     5.43s ± 1%   +0.75%  (p=0.015 n=8+9)
DecodeIndexParallel-4                               5.39s ± 1%     5.41s ± 1%   +0.40%  (p=0.019 n=10+10)
IndexHasUnknown-4                                  79.0ns ±26%    73.3ns ±12%     ~     (p=0.243 n=10+9)
IndexHasKnown-4                                    75.3ns ±10%    75.6ns ±15%     ~     (p=0.897 n=10+10)
IndexAlloc-4                                        837ms ± 1%     890ms ± 5%   +6.32%  (p=0.000 n=9+10)
IndexAllocParallel-4                                450ms ± 3%     511ms ± 1%  +13.54%  (p=0.000 n=9+9)
MasterIndexLookupSingleIndex-4                      203ns ± 5%     196ns ± 5%   -3.60%  (p=0.030 n=10+10)
MasterIndexLookupMultipleIndex-4                   10.1µs ± 5%     0.2µs ±10%  -97.54%  (p=0.000 n=10+10)
MasterIndexLookupSingleIndexUnknown-4               100ns ±11%     104ns ± 9%     ~     (p=0.092 n=10+10)
MasterIndexLookupMultipleIndexUnknown-4            9.94µs ± 5%    0.09µs ± 7%  -99.06%  (p=0.000 n=10+9)
MasterIndexLookupParallel/known,indices=25-4       1.50µs ± 2%    0.44µs ± 2%  -70.43%  (p=0.000 n=10+10)
MasterIndexLookupParallel/unknown,indices=25-4     1.06µs ± 1%    0.30µs ± 1%  -71.17%  (p=0.000 n=10+9)
MasterIndexLookupParallel/known,indices=50-4       2.45µs ± 1%    0.47µs ± 3%  -80.73%  (p=0.000 n=9+9)
MasterIndexLookupParallel/unknown,indices=50-4     2.14µs ± 1%    0.30µs ± 6%  -85.90%  (p=0.000 n=10+10)
MasterIndexLookupParallel/known,indices=100-4      4.92µs ± 0%    0.46µs ± 5%  -90.57%  (p=0.000 n=8+10)
MasterIndexLookupParallel/unknown,indices=100-4    4.71µs ± 1%    0.31µs ± 2%  -93.39%  (p=0.000 n=9+10)
SaveAndEncrypt-4                                   42.7ms ± 1%    43.6ms ± 1%   +2.06%  (p=0.000 n=9+10)
LoadTree-4                                          437µs ± 3%     441µs ± 2%     ~     (p=0.065 n=9+10)
LoadBlob-4                                         8.22ms ± 0%    8.43ms ± 1%   +2.54%  (p=0.000 n=8+9)
LoadAndDecrypt-4                                   8.58ms ± 1%    8.58ms ± 0%     ~     (p=0.356 n=10+9)
LoadIndex-4                                        32.7ms ± 1%    33.6ms ± 1%   +2.93%  (p=0.000 n=10+10)

name                                             old speed      new speed      delta
IndexMapHash-4                                    710MB/s ± 2%   694MB/s ± 2%   -2.33%  (p=0.000 n=10+10)
PackerManager-4                                   197MB/s ± 2%   198MB/s ± 1%     ~     (p=0.720 n=10+9)
SaveAndEncrypt-4                                 98.2MB/s ± 1%  96.3MB/s ± 1%   -2.02%  (p=0.000 n=9+10)
LoadBlob-4                                        122MB/s ± 0%   119MB/s ± 1%   -2.48%  (p=0.000 n=8+9)
LoadAndDecrypt-4                                  117MB/s ± 1%   117MB/s ± 0%     ~     (p=0.344 n=10+9)

name                                             old alloc/op   new alloc/op   delta
IndexMapHash-4                                      0.00B          0.00B          ~     (all equal)
PackerManager-4                                    91.1kB ± 0%    91.1kB ± 0%     ~     (all equal)
IndexAlloc-4                                        401MB ± 0%     401MB ± 0%     ~     (all equal)
IndexAllocParallel-4                                400MB ± 0%     400MB ± 0%   +0.00%  (p=0.000 n=9+10)
SaveAndEncrypt-4                                   21.0MB ± 0%    21.0MB ± 0%   +0.00%  (p=0.000 n=10+9)

name                                             old allocs/op  new allocs/op  delta
IndexMapHash-4                                       0.00           0.00          ~     (all equal)
PackerManager-4                                     1.41k ± 0%     1.41k ± 0%     ~     (all equal)
IndexAlloc-4                                         907k ± 0%      907k ± 0%     ~     (all equal)
IndexAllocParallel-4                                 907k ± 0%      907k ± 0%   +0.00%  (p=0.000 n=9+10)
SaveAndEncrypt-4                                      118 ± 0%       118 ± 0%     ~     (all equal)

(wouldn't count too much on the ~2% differences, as said this is my laptop and there are other things like a firefox running...)

@MichaelEischer
Copy link
Copy Markdown
Member

I've run check for a large repository which ran for 3 hours in total and the index lookup only took only 1.3% of the overall runtime. The garbage collector took 8.9% of the overall runtime which seems to be similar to before.

Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Two are better than one ^^ .

@MichaelEischer MichaelEischer merged commit bcd47ec into restic:master Jul 22, 2020
@aawsome aawsome mentioned this pull request Jul 25, 2020
7 tasks
@aawsome aawsome deleted the merge-index-chaining branch July 25, 2020 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants