Conversation
d8572c4 to
e314fd9
Compare
|
Is now rebased and contains three commits for each individual change. @MichaelEischer / @rawtaz Thanks for giving the hint about |
e314fd9 to
65c091b
Compare
8f8f9cf to
a7df6bf
Compare
|
There was another issue when saving pointers to the packID array: Every |
MichaelEischer
left a comment
There was a problem hiding this comment.
I actually had implemented a proof-of-concept of rather similar changes in https://github.com/MichaelEischer/restic/commits/opt-hash-idx . However, I never got around to clean-up the code. Feel free to "steal" useful parts.
For deduplicating packIDs to a packIndex, I've also used an additional hashmap during index construction to ensure that each packID is actually only added once. With the StorePack method from #2773 that's probably no longer necessary. See d2086a1 .
The duplicates overflow map is similar to MichaelEischer@8d2fbe5 . You might want to take the BenchmarkIndexAlloc from there. The commit message also contains a rather detailed reasoning how the changes work and why they were made. I can tell from my own experience with e.g. some config file changes from roughly a decade ago which had no explanation at all (and no, the original author didn't remember the why anymore), that it is extremely useful to include a detailed commit message when the reason for the change is not completely obvious.
The conversion to uint32 index/length fields is much less messy than my variant. 👍
a7df6bf to
f6ea096
Compare
Oh, I wasn't aware of this branch - I did steal some parts, thank you 😄
I added this when using
I added the benchmark (thanks for the hint!) together with a detailed comment of the current state in the first commit. I hope all comments are understandable; happy to get feedback or suggestions for improvements! |
MichaelEischer
left a comment
There was a problem hiding this comment.
I have a few more remarks on the comments, but besides that the PR seems finished.
A side remark to the definition of Index.blob: Another possibility would have been to use: blob map[restic.BlobHandle]*indexEntry This would have led to the following sizes: key: 32 + 1 = 33 bytes value: 8 bytes indexEntry: 8 + 4 + 4 = 16 bytes each packID: 32 bytes To save N index entries, we would therefore have needed: N * OF * (33 + 8) bytes + N * 16 + N * 32 bytes / BP = N * 82 bytes More precicely, using a pointer instead of a direct entry is the better memory choice if: OF * 8 bytes + entrysize < OF * entrysize <=> entrysize > 8 bytes * OF/(OF-1) Under the assumption of OF=1.5, this means using pointers would have been the better choice if sizeof(indexEntry) > 24 bytes.
b6d687e to
1361341
Compare
|
Here are the benchmark results on my laptop: before the changes (only including commit 7419844): with changes: |
MichaelEischer
left a comment
There was a problem hiding this comment.
LGTM. Thanks!
Grrr, it seems like Github is unable to just show the raw difference introduced by a force push. However, git diff b6d687e2..1361341c or git range-diff b6d687e2...1361341c (compares the patch series) come to the rescue.
|
Confirmed that this works: restic mount of a 1TB repo used 1.714GiB peak RAM in 1d66bb9, now 1.325GiB, a 22.7% reduction. The discrepancy with the predicted 41% is largely due to the blob size cache. |
|
I see I should bring #2587 into a good shape again 😉 - will rebase it to master soon... |
What is the purpose of this change? What does it change?
Reduce the index memory by only applying small local code changes.
The discussion in #2523 seems to be stalled and I realize that core developers actually do not have time to discuss that many changes to improve index memory.
So I cherry-picked the ideas which will give the best improvement while needing the smallest code modifications.
Actually restic stores index entries in a
mapwhere the key length is 33 bytes and the entry length is 24 bytes (for the slice) + needs 48 bytes per really saved index entry.Assuming a map load factor of 2/3 this gives 134 bytes per index entry.
This PR has three improvements:
uint32instead ofuint. This saves 8 bytes per index entry on 64bit architectures. By this change the pack size is effectively limited to 4GB and restic will now panic if used with larger packfiles.indexEntry. Assuming at least 8 blobs per pack in average this saves at least 20 bytes per index entry. The big advantage of this change is that repos with small files actually suffer a lot from the index memory consumption and this case will benefit much more from this improvement.indexEntry, I changed this.Together with the two other improvements this IMO leads to saving additionally 28 bytes if there exists no duplicates (which should be the usual case).
So using this PR, the memory usage is reduced by a total of 56 bytes per index entry (again under the assumptions) leading to a total reduction of ~41% of the memory used by the index.
Was the change discussed in an issue or in the forum before?
see #2523
May close some of the memory related issues.
Checklist
changelog/unreleased/that describes the changes for our users (template here)gofmton the code in all commits