Use low memory map for in-memory index - about 45% less memory for index #2811
aawsome wants to merge 4 commits into restic:master from …
Conversation
Interesting. For comparison, I've been trying custom hash tables as well, but started with a simple chaining one. That also seemed to solve the memory use issue, but may come with some subtle performance issues: sequential performance was down between 8% and 31% on the benchmarks, and I could see a second core running the GC at 100%, so concurrent performance might be worse.
I put up my chaining version as an alternative in #2812 because its allocation rate is comparable, but its lookup speed is a lot better than this version's. It also admits merging more easily, I think, if we still want to try that some day.
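For readers following along, here is a minimal sketch of such a chaining hash table. The names, the toy hash, and the fixed bucket count are all hypothetical; the actual #2812 code also uses a free list and grows the table:

```go
package main

import "fmt"

const nBuckets = 1 << 16

type key [32]byte // a blob ID is a 32-byte SHA-256 hash

type chainNode struct {
	k    key
	v    uint32
	next *chainNode // chains entries whose keys hash to the same bucket
}

type chainMap struct {
	buckets [nBuckets]*chainNode
}

func (m *chainMap) hash(k key) uint32 {
	// Toy hash: blob IDs are SHA-256 outputs, so their leading bytes
	// are already uniformly distributed.
	return (uint32(k[0]) | uint32(k[1])<<8) % nBuckets
}

func (m *chainMap) insert(k key, v uint32) {
	h := m.hash(k)
	m.buckets[h] = &chainNode{k: k, v: v, next: m.buckets[h]}
}

func (m *chainMap) lookup(k key) (uint32, bool) {
	for e := m.buckets[m.hash(k)]; e != nil; e = e.next {
		if e.k == k {
			return e.v, true
		}
	}
	return 0, false
}

func main() {
	var m chainMap
	var k key
	k[0] = 7
	m.insert(k, 42)
	v, ok := m.lookup(k)
	fmt.Println(v, ok) // 42 true
}
```

Note that duplicate blob IDs fall out naturally in this scheme: inserting the same key again just prepends another node to the same chain, so no separate duplicates slice is needed.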
Your linked list looks quite promising (and IMO memory consumption could be decreased much further by not using a separate "duplicates" list). I'm a bit puzzled, however: my approach should lead to at most around 5 bytes of overhead per index entry, while your approach (with the separate duplicates) should yield 32 extra bytes. Or did I miss some issue in my implementation?
allocs/op is the number of calls to the memory allocator. alloc/op is the total amount of memory requested, not the peak memory usage. I'm currently trying to benchmark both PRs against master on some realistic tasks, and I'm getting conflicting results: sometimes mine wins, sometimes yours. Throughput isn't much different. Do you know what restic command is most likely to allocate indexes and not much else? You're right about the duplicates handling, that's the part I couldn't quite get right.
Also, I don't really get your 5 vs. 32 bytes comment, care to elaborate?
I would say: IMO you should just include them in your chained list (aka linked list) and implement a …
The entry size for this PR is 32 bytes + 16 bytes = 48 bytes (49 bytes for your approach, as you also save the BlobType). For this PR: on average 1/2 byte of overhead per index entry, plus 1/2 unused page per 48 total pages. 1/2 page is 16 * 48 bytes, and 48 pages will hold around 32 * 48 entries; this is another 2 bytes of overhead. So I actually expect 2.5 bytes of overhead per entry, but the worst case will be about double that amount. For your approach I just took into account the extra pointer (8 bytes) and the duplicates slice (24 bytes). That is already 32 bytes per entry. In addition, there is the overhead of managing the buckets, which is 8 bytes per bucket and on average 2/3 buckets per entry. So the total overhead should be 37 bytes on average. If you remove the duplicates slice, it will still be 13 bytes per entry on average and a bit more in the worst case.
I think you're missing some bytes of malloc overhead. The Go malloc, IIRC, allocates in 64-byte multiples, so your buckets are bigger than you think. My use of a free list takes care of some of that. Anyway, I found a way to store the duplicates in a separate list and get the indexEntry down to 64 bytes. Debugging as we speak :)
I think it is either 8-byte or 16-byte multiples up to 288 bytes, or 32-byte multiples up to 512 bytes. EDIT: 16-byte multiples up to 256 bytes 😉
Judging from the malloc code, you're right about the tiny sizes. Good to know. |
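One way to observe this rounding empirically is to compare the allocator's cumulative byte count before and after a single allocation. A sketch, not restic code; the exact numbers depend on the Go version and platform:

```go
package main

import (
	"fmt"
	"runtime"
)

var sink []byte // global, so the allocation below cannot be optimized away

// allocSize reports how many bytes the Go allocator actually hands out
// for a single n-byte slice allocation, by diffing TotalAlloc.
func allocSize(n int) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	sink = make([]byte, n)
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

func main() {
	// Requests get rounded up to the allocator's size classes,
	// e.g. a 33-byte request lands in a larger class.
	for _, n := range []int{33, 48, 49, 65} {
		fmt.Printf("requested %2d bytes, got %2d\n", n, allocSize(n))
	}
}
```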
internal/repository/index.go
Outdated
```go
	packs     restic.IDs
	treePacks restic.IDs

	m      sync.Mutex
	byType map[restic.BlobType]*lowMemMap
```
This little map may well be a performance killer on the benchmark. In #2812, I inline two lazily-initialized header structures in the Index.
It didn't contribute much to the performance loss. However, using `byType []*lowMemMap` is a far better option: it can be indexed directly with `byType[tpe]`, where `tpe` is a `restic.BlobType`.
I changed the implementation, so you can use this as a reference for modifying your PR (I also made a comment there).
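A minimal sketch of that pattern, with stand-in types: `lowMemMap` here is just a placeholder struct, the blob-type constants are hypothetical, and a fixed-size array is used instead of a slice:

```go
package main

import "fmt"

type BlobType uint8

const (
	DataBlob BlobType = iota
	TreeBlob
	numBlobTypes
)

type lowMemMap struct{ entries int } // stand-in for the real structure

type index struct {
	// Indexed directly by BlobType: no map hashing on every access.
	byType [numBlobTypes]*lowMemMap
}

func (idx *index) get(t BlobType) *lowMemMap {
	if idx.byType[t] == nil {
		idx.byType[t] = &lowMemMap{} // lazily initialized
	}
	return idx.byType[t]
}

func main() {
	var idx index
	idx.get(TreeBlob).entries++
	fmt.Println(idx.get(TreeBlob).entries) // 1
}
```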
After also adding a linked-list variant (the approach of @greatroar), IMO that approach should be used. I'm changing this to a draft for reference and will close it as soon as #2812 is merged.
Closed, as #2812 is merged.
UPDATE: This PR is only left for reference; IMO #2812 should be followed, which gives slightly worse memory results but much better handling and performance.
What is the purpose of this change? What does it change?
Use a low memory map for the in-memory index.
In fact there are a few changes which come with this PR:
For more details about the low-memory map, see internal/repository/index_inmem.go.

Here is the comparison to master (where #2781 is already merged!) of the repository benchmarks:

As expected, some timings get worse, but these are internal timings and I didn't see any impact on restic commands (though I didn't measure those, either). Note the -45.28% on memory usage.

Also note that with this PR, huge index speed improvements are possible when merging the indexes; see #2799.
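As a rough illustration of the paging idea discussed above (hypothetical field names and layout; the real implementation lives in internal/repository/index_inmem.go), entries can be stored in fixed-size pages so that allocator overhead is paid per page rather than per entry:

```go
package main

import (
	"fmt"
	"unsafe"
)

const pageEntries = 32 // entries per page, matching the numbers in the discussion

// entry is a hypothetical 48-byte index entry: a 32-byte blob ID plus
// 16 bytes of location metadata (field names are illustrative).
type entry struct {
	id     [32]byte
	offset uint64
	packID uint32
	length uint32
}

type page [pageEntries]entry

// lowMemMap sketches the paging idea: entries live in fixed-size pages,
// so the only per-entry slack is the unused tail of the last page.
type lowMemMap struct {
	pages []*page
	n     int // total number of entries
}

func (m *lowMemMap) add(e entry) {
	if m.n%pageEntries == 0 {
		m.pages = append(m.pages, new(page)) // start a fresh page
	}
	m.pages[m.n/pageEntries][m.n%pageEntries] = e
	m.n++
}

func main() {
	fmt.Println(unsafe.Sizeof(entry{})) // 48
	var m lowMemMap
	for i := 0; i < 100; i++ {
		m.add(entry{length: uint32(i)})
	}
	fmt.Println(m.n, len(m.pages)) // 100 entries fit in 4 pages
}
```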
Was the change discussed in an issue or in the forum before?
The ideas come from #2523 and #2799.
Checklist
- There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
- I have run gofmt on the code in all commits