Fix non-intuitive repo behavior#2773
Conversation
MichaelEischer
left a comment
There was a problem hiding this comment.
I like the cleanups of this PR. For the method names I've suggested a few alternatives, which I'd prefer over the current ones. CheckSetKnown and the existing SaveFullIndex need some fixes for race conditions before this can land, but besides these (and some minor cleanups) I don't see any other blockers.
Inlining the intermediate index upload into the packerManager is actually much simpler than what I had thought about, but it nevertheless looks like a good solution.
7b298a9 to
14bd06f
Compare
@MichaelEischer Thank you very much for your suggestions - I especially like the naming which is IMO now much clearer (and also allowed for some clearer code). I changed everything except that I still kept |
14bd06f to
671537e
Compare
|
I didn't check for race conditions, but at a glance this PR looks very clean. Nice work! |
671537e to
4509c58
Compare
Thank you for your feedback! I changed all mentioned issues. |
MichaelEischer
left a comment
There was a problem hiding this comment.
The PR is nearly ready to go, I've just found a few further minor issue. There's also a left-over call to SaveIndex in cmd_recover.go which should be removed.
Please add a note to the commit message that the commit removes the verbose "uploaded intermediate index" messages. Also a short explanation of the "why" of the commit might be useful, in case someone stumbles across the change later on. The hint at the race conditions should be a bit more specific, just saying that there are some race condition in this 400 lines diff makes it hard to find the problematic spots.
For future PRs: Please split your changes into smaller steps, a single commit that changes at least 5 different things (merged flush, knownBlobs, StorePacks, replacement of the index uploader, the finalization race condition) at once is too large (it should still be a single PR, just with some intermediate steps). It's way easier to review 5 commits with 80 lines diff than a single one with 400 lines.
|
We probably also want a short changelog entry to highlight the memory usage improvements of the change to |
4509c58 to
1a7f3c0
Compare
|
@MichaelEischer Thanks for reviewing and thank you very much for your valuable hints. I also realized that the changes were larger than originally expected. In fact I just wanted to change the "pending blobs" and automatic saving of intermediate index files. When doing the changes I realized that changing |
|
@aawsome Please add a changelog entry that we can check as well :) |
- The SaveBlob method now checks for duplicates. - Moves handling of pending blobs to MasterIndex. -> also cleans up pending index entries when they are saved in the index -> when using SaveBlob no need to care about index any longer - Always check for full index and save it when storing packs. -> removes the need of an index uploader -> also removes the verbose "uploaded intermediate index" messages - The Flush method now also saves the index - Fix race condition when checking and saving full/non-finalized indexes
1a7f3c0 to
9190691
Compare
ups - sorry, forgot to add this in the commit. Is now in the PR. |
MichaelEischer
left a comment
There was a problem hiding this comment.
I noticed that I did overlook a side-effect on the prune command: repository.Repack previously did not write index files, which has changed now. For the current prune implementation this means that several index files will be uploaded and afterwards deleted again during the final index rebuild. The cleanest simple solution I see at the moment is to add a DisableAutomaticIndexUpload method to the Repository, which would pass that flag on to the packer manager. The disable call would then happen somewhere in the prune command before the call to Repack. I explicitly want to avoid any structural changes of the prune command.
For creating my git commits, I usually selectively add changed lines to rather small git commits right from the start. Rearranging and combining the commits is then possible with git rebase -i. Using the latter command it's also possible to split a commit into multiple parts, but that feels rather cumbersome to me. For learning a lot about git I can recommend the book that's freely available on the official git scm website.
There's also
I love this command. |
+ modifications of changelog
I also realized that this would happen and came to the conclusion that it shouldn't harm: at the end Thanks @MichaelEischer and @rawtaz for your suggestions about git. I'm actually quite new to working with git and again learned something new 😄 So far I did not try out the full functionality of |
MichaelEischer
left a comment
There was a problem hiding this comment.
LGTM. Thanks a lot for the cleanups and optimizations!
What is the purpose of this change? What does it change?
The current
restic.Repositoryimplementation has some non-intuitive behavior. While you can usethe rather high-level method
SaveBlobto save a blob it does not check for already saved blobs by default. If you do the check yourself by usingrepo.Index().Has()you encounter the problem that saved blobs in unfinished packs are not yet present in the index and thereforeHas()usually returns false for just saved blobs.Moreover full indexes are not saved by default but need to be manually saved by regularily calling
SaveFullIndex().All these issues are at the moment solved separately in the
internal/archivercodebase but will not work by default for new implementations.This PR moves the needed logic to
repository(and the index implementation therein)Moreover it implements a new method
SavePackinrepository/master_index.gowhich saves all index entry for a whole pack.As a side-effect it reduces the code base and simplifies the
archiverlogic.Note that most code changes in this PR are needed modifications in tests, e.g. to make tests generating duplicate blobs or counting the saved blobs still work.
Was the change discussed in an issue or in the forum before?
See #2523 where the ideas are explained.
This PR is a prerequisite for #2606 (see discussion there)
Checklist
changelog/unreleased/that describes the changes for our users (template here)gofmton the code in all commits