Skip to content

make Lookup() return all blobs#2789

Merged
MichaelEischer merged 2 commits intorestic:masterfrom
aawsome:fix-lookup
Jul 25, 2020
Merged

make Lookup() return all blobs#2789
MichaelEischer merged 2 commits intorestic:masterfrom
aawsome:fix-lookup

Conversation

@aawsome
Copy link
Copy Markdown
Contributor

@aawsome aawsome commented Jun 14, 2020

What is the purpose of this change? What does it change?

Change repository.Index.Lookup() to return all results.

Currently, Lookup returns all results including duplicates within the first index file with first match; it however does not return all duplicates. This PR "corrects" Lookup to return all duplicates.

Was the change discussed in an issue or in the forum before?

see #2523

Checklist

  • I have read the Contribution Guidelines
  • I have enabled maintainer edits for this PR
  • I have added tests for all changes in this PR (and some more)
  • I have not added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

@greatroar
Copy link
Copy Markdown
Contributor

Great! I was wanting to do this for #2788, but settled on a more local patch.

Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought a bit more about the effects of adding Lookup and LookupAll. The main difference is probably that Repository.LoadBlob now has to traverse the full index, instead of only half on average. This is probably most noticeable for the check command which issues a massive amount of LoadBlob calls (via LoadTree).

Just to get an idea of how many Lookup calls are issued by check: I've used PR #2328 (check optimization), which already dramatically reduces the number of LoadBlob calls, on my large test repository which has a 7GB index with 729 files. LoadBlob spends roughly 40% of its runtime in the Lookup call (which amounts to more than half an hour CPU time). Using LookupAll would increase that to nearly 60%. That probably sounds worse than it actually is as the index lookup only amounts to 6% of the overall runtime, so that the slowdown should be tolerable.

For another repository using the standard check implementation the Lookup/LookupAll call also only requires a small part of the overall runtime.

Regarding LookupBlobSize I would be interested to see a small performance benchmark similar to the BenchmarkMasterIndexLookupMultipleIndex benchmark that already exists for the master index.

I think this PR also needs a test that Lookup and LookupAll really work as expected. The adapted tests probably don't completely cover the new semantics, especially of LookupAll.

@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Jun 18, 2020

@MichaelEischer Maybe we should first talk about #2799 which should solve the "many index files" issue before continuing to work on this?

However, I'll take care of the points you mentioned!

@greatroar
Copy link
Copy Markdown
Contributor

Closes #2701.

@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Jul 6, 2020

Regarding LookupBlobSize I would be interested to see a small performance benchmark similar to the BenchmarkMasterIndexLookupMultipleIndex benchmark that already exists for the master index.

I think this PR also needs a test that Lookup and LookupAll really work as expected. The adapted tests probably don't completely cover the new semantics, especially of LookupAll.

I added this benchmark and tests. All code in Lookup and LookupAll (both in index.go and master_index.go) is now covered by the tests.

@greatroar
Copy link
Copy Markdown
Contributor

LookupBlobSize isn't covered, though. The restorer tests don't hit it either (low coverage), nor do the fuse ones (they use the blobSizeCache).

As an aside, it's funny how the MasterIndex is now the only type that implements restic.Index and everything still works.

@aawsome aawsome force-pushed the fix-lookup branch 2 times, most recently from 1563e74 to 50366da Compare July 6, 2020 20:22
@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Jul 6, 2020

LookupBlobSize isn't covered, though. The restorer tests don't hit it either (low coverage), nor do the fuse ones (they use the blobSizeCache).

I added a bunch of easy-to-add tests to improve test coverage.

As an aside, it's funny how the MasterIndex is now the only type that implements restic.Index and everything still works.

😃 Yes - in fact, internal/repository/index.Index doesn't need to implement restic.Index.

Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just took another look at this PR. Besides tests there's just a single user of Lookup and that is inside cmd_cat.go which doesn't matter for performance as it's just a single call. All other uses which just need a single result blob seem to be covered by LookupSize.

So lets keep this simple: please drop the Lookup method and rename LookupAll to Lookup.

@aawsome aawsome changed the title make Lookup() really return just one blob, add LookupAll() make Lookup() return all blobs Jul 25, 2020
@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Jul 25, 2020

I just took another look at this PR. Besides tests there's just a single user of Lookup and that is inside cmd_cat.go which doesn't matter for performance as it's just a single call. All other uses which just need a single result blob seem to be covered by LookupSize.

So lets keep this simple: please drop the Lookup method and rename LookupAll to Lookup.

You are right and especially after #2818 is merged, there shouldn't be much performance issues any more.
So this PR is now merely a code clean.

I rebased and made your suggested changes.

Seems, we have a CI issue ATM...

Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a few further small issues. Once these are fixed, the PR should be ready for merging.

aawsome and others added 2 commits July 25, 2020 21:18
The interface is now only implemented by repository.MasterIndex.
Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the cleanup!

@MichaelEischer MichaelEischer merged commit 573a2fb into restic:master Jul 25, 2020
@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Jul 25, 2020

@MichaelEischer Wow you are really working through the PRs today! Thanks a lot for all your work! I really appreciate a lot what you are supplying here! Also know about the burden of "the one who has to make the decission"...
👍

I really like the progress restic is making. Wondering whether all the improvements done in the last months would justify a new release.. 😉

@rawtaz
Copy link
Copy Markdown
Contributor

rawtaz commented Jul 25, 2020

LOL, you don't think we have thought about a new release? :P

Thing is, it's not just you guys' PRs and work that should go into a new release, but most of the developer time (primarily Michael's) is going to that instead of older PRs and stuff ;)

I too think the last few months have given a lot of improvement to restic, in part thanks to your contributions!

@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Jul 26, 2020

LOL, you don't think we have thought about a new release? :P

Thing is, it's not just you guys' PRs and work that should go into a new release, but most of the developer time (primarily Michael's) is going to that instead of older PRs and stuff ;)

Sure.. is there a roadmap what points you maintainers see as important milestones or priorities towards a next release? (If there is, I just didn't find it so far). Please give me a hint if I can help with something.

@rawtaz
Copy link
Copy Markdown
Contributor

rawtaz commented Jul 26, 2020

Thank you @aawsome, but I was mostly thinking about having mentioned an upcoming release in a few places by now. But it's been like that for a while. We mostly have discussions amongst ourselves as part of the ongoing work to get a new release out, there's no roadmap I can point you to. It's done when it's done, and that shouldn't be too far into the future :) Thanks again!

@aawsome aawsome deleted the fix-lookup branch December 7, 2020 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants