make Lookup() return all blobs#2789
Conversation
|
Great! I was wanting to do this for #2788, but settled on a more local patch. |
MichaelEischer
left a comment
There was a problem hiding this comment.
I've thought a bit more about the effects of adding Lookup and LookupAll. The main difference is probably that Repository.LoadBlob now has to traverse the full index, instead of only half on average. This is probably most noticeable for the check command which issues a massive amount of LoadBlob calls (via LoadTree).
Just to get an idea of how many Lookup calls are issued by check: I've used PR #2328 (check optimization), which already dramatically reduces the number of LoadBlob calls, on my large test repository which has a 7GB index with 729 files. LoadBlob spends roughly 40% of its runtime in the Lookup call (which amounts to more than half an hour CPU time). Using LookupAll would increase that to nearly 60%. That probably sounds worse than it actually is as the index lookup only amounts to 6% of the overall runtime, so that the slowdown should be tolerable.
For another repository using the standard check implementation the Lookup/LookupAll call also only requires a small part of the overall runtime.
Regarding LookupBlobSize I would be interested to see a small performance benchmark similar to the BenchmarkMasterIndexLookupMultipleIndex benchmark that already exists for the master index.
I think this PR also needs a test that Lookup and LookupAll really work as expected. The adapted tests probably don't completely cover the new semantics, especially of LookupAll.
|
@MichaelEischer Maybe we should first talk about #2799 which should solve the "many index files" issue before continuing to work on this? However, I'll take care of the points you mentioned! |
|
Closes #2701. |
I added this benchmark and tests. All code in |
|
LookupBlobSize isn't covered, though. The restorer tests don't hit it either (low coverage), nor do the fuse ones (they use the blobSizeCache). As an aside, it's funny how the MasterIndex is now the only type that implements restic.Index and everything still works. |
1563e74 to
50366da
Compare
I added a bunch of easy-to-add tests to improve test coverage.
😃 Yes - in fact, |
MichaelEischer
left a comment
There was a problem hiding this comment.
I just took another look at this PR. Besides tests there's just a single user of Lookup and that is inside cmd_cat.go which doesn't matter for performance as it's just a single call. All other uses which just need a single result blob seem to be covered by LookupSize.
So lets keep this simple: please drop the Lookup method and rename LookupAll to Lookup.
You are right and especially after #2818 is merged, there shouldn't be much performance issues any more. I rebased and made your suggested changes. Seems, we have a CI issue ATM... |
MichaelEischer
left a comment
There was a problem hiding this comment.
I found a few further small issues. Once these are fixed, the PR should be ready for merging.
+ simplify syntax
The interface is now only implemented by repository.MasterIndex.
e753498 to
c847aac
Compare
MichaelEischer
left a comment
There was a problem hiding this comment.
LGTM. Thanks for the cleanup!
|
@MichaelEischer Wow you are really working through the PRs today! Thanks a lot for all your work! I really appreciate a lot what you are supplying here! Also know about the burden of "the one who has to make the decission"... I really like the progress restic is making. Wondering whether all the improvements done in the last months would justify a new release.. 😉 |
|
LOL, you don't think we have thought about a new release? :P Thing is, it's not just you guys' PRs and work that should go into a new release, but most of the developer time (primarily Michael's) is going to that instead of older PRs and stuff ;) I too think the last few months have given a lot of improvement to restic, in part thanks to your contributions! |
Sure.. is there a roadmap what points you maintainers see as important milestones or priorities towards a next release? (If there is, I just didn't find it so far). Please give me a hint if I can help with something. |
|
Thank you @aawsome, but I was mostly thinking about having mentioned an upcoming release in a few places by now. But it's been like that for a while. We mostly have discussions amongst ourselves as part of the ongoing work to get a new release out, there's no roadmap I can point you to. It's done when it's done, and that shouldn't be too far into the future :) Thanks again! |
What is the purpose of this change? What does it change?
Change
repository.Index.Lookup()to return all results.Currently,
Lookupreturns all results including duplicates within the first index file with first match; it however does not return all duplicates. This PR "corrects"Lookupto return all duplicates.Was the change discussed in an issue or in the forum before?
see #2523
Checklist
changelog/unreleased/that describes the changes for our users (template here)gofmton the code in all commits