Conversation
|
Open points are:
|
|
I did not have time yet for a close look at the code, but I'd suggest to leave handling of broken repos (which probably requires some tricky code changes) for a later, separate PR. Adding/Changing a few hundred lines of critical code (i.e. one that could permanently damage repositories) is enough for one PR. The current PR should just check that a repository seems intact before making any changes, like the existing prune implementation. The added feature set compared to the current |
MichaelEischer
left a comment
There was a problem hiding this comment.
As a short summary of the below comments:
- I'd suggest to use a set of command line options like
max-unused-percentage=20,repack-small=true,repack-mixed=trueandrepack-trees-only=false(the assignment denotes the default values) - The handling of duplicate blobs is inconsistent
- For correctness/robustness a few additional sanity check are necessary (I hope that I've found them all), but the most important checks are already in place
- I think that rebuilding the index in memory is the way to go, but that probably requires changes to the archiver/indexUploader which in the end should be split into a separate PR
I'd suggest to defer further parallelization related changes of existing code to a later PR as it's probably a good idea to salvage #2340 .
What is the reason why the command is called cleanup instead of prune?
|
@MichaelEischer Thanks a lot for your very valuable comments!
This has been changed. I used I also added
I hope this is fixed now, see
Most are added. I just realized that I forgot one, this will come soon.
I agree, see above.
👍
It is a historical reason that I had first two, then three commands. I still used a new command to make comparisons of the old |
|
For anyone who wants to try out this PR (please using a copy of your repo!):
Hoping to get feedback from many users! |
|
Thanks for this PR @aawsome
I'm now running some tests on one of my trashy repos, which is ~750gb and has a lot of non-pruned chunks/snapshots in it. Will compare the memory/timing values with dry running prune. Since there are a lot of features under this, I have a small question in the meantime:
Can't clearly see the option for this, is it I am planning to run faster stuff (e.g. index/tree cleanup) more often and keep the IO-heavy stuff (e.g. clear completely unused chunks) later. Of course I am assuming "completely unused" stuff should not require exclusive lock, since they won't be referred anywhere after other clean actions, does this sound right? |
I was talking about
ATM there are no lock-free action. The problem is that the repository must be locked between looking for used blobs (by scanning snapshots and the according trees) and the actual deletion of files. A two-phase deletion can be implemented independently of this PR. I also do not think that your expectations about "faster" and "slower" will be true. If you are able to tolerate some unused space in your repository I expect that a "standard" prune run will be fast enough such that you will not need to think about two runs with different parameters. Of course I am very curious about the results of your tests! Make sure to use |
|
Thanks for the clarification. I guess that'd be helpful, since I can spare some disk and 1.5 seems to be a good compromise. Some notes:
Logs: |
Ah, of course - I forgot to mention that the index rebuilding only takes place if not called with
The reason is that you have a lot completely unused pack files. Those will be always deleted. You just have 229 partly used packs and keeping them all will result in 0.16% wasted size. To see a difference you would have to specify a value smaller than 0.16. Then some packs will be repacked.
Thanks a lot for the log. It is a bit surprising to me that most of the time is used for "collect packs for deletion and repacking" - maybe I should add a progress bar there... Can you make a copy of your repo to run without |
Sorry for late answer, I'll report back in a day (there is some extra-IO on the host currently, which will compromise the tests). But I've copied the repo & have a small question: What would be a good comparison?
|
A good comparison for your repo would be: EDIT: Playing around with the other flags would not affect the result too much, as you have already noticed (will always delete all completely unused pack files). |
Oh, I didn't know that. I hope there is some difference between the commands though: Sometimes I got prune failures with messages like Anyway sounds like you need 3-way comparison so I'll copy the repo once more :) |
|
@seqizz The current
|
|
Starting with commit 9bef5ee I renamed the command to |
|
Alright I've used the last commit, a.k.a. "new prune" versus old one. Both time and memory-wise it looks very good. Attaching the logs. aawsome-prune.txt |
|
@seqizz Thanks for your tests! I suspect that your storage has quite a pretty high bandwidth and low latency. The effect should be even much greater on real "remote" storages. Did you run a If you wish to do further tests, you could do:
|
|
I'm quite confident that the actual state of this PR is ready for a review and can be a valuable improvement to restic! Thanks a lot @MichaelEischer for all your ideas, discussions and comments on early versions! Also thanks @seqizz for doing the performance tests! Now, who do we need to review this? I'm happy to get remarks or ideas what to change from code reviewers! |
|
I made a review myself after leaving the topic for a couple of days. I realized that the |
MichaelEischer
left a comment
There was a problem hiding this comment.
I just took a quite look at the index rebuilding changes, so this is at best a partial review.
18778c0 to
1ca60bc
Compare
|
I'd like to leave one other thing here (while you're all subscribed to this issue): You all did a tremendous job! Not only writing the code, but also trying it out, giving feedback, improving the code, adding documentation, and even thinking about the user interface and which options and knobs to make available to users! While reading through the comments, at one point I thought "oh, this is not a good idea to expose directly to end users, it's way too technical and nobody will understand without reading the code what the option does", and a few comments later there was a discussion and you came to the same conclusion. Unfortunately, I have less spare time to work on restic than I used to have. I'm sure it'll return eventually, but for now I need to select the issues and PRs that I look at in order to advance the project. Just to give you an idea: It literally took me the whole day to read all the comments and dig into the code. I was very thorough because it touches the one place within restic which removes data. I have no idea how long that would have taken if it were not for you all doing the bulk of the work! I'm humbled to be part of such an awesome community! Since @MichaelEischer has requested changes (and now I'm a co-author of this PR), we'll wait for him to look at the new code and approve it, then it can be merged. I forgot that earlier :) |
MichaelEischer
left a comment
There was a problem hiding this comment.
I'm happy with the current code and the latest changes. Thanks a lot everyone for the discussions and @aawsome for the dozens of code iterations.
|
Is there something open here? Just asking as this PR is not merged yet and I'm kind of waiting to check if everything is fine with the dependent PRs after this one is merged... |
|
@aawsome No, it's fine. Our Gophers are currently relocating themselves in order to summon the merge gods, it'll be done in not too long (although perhaps not today). Stay tuned, and thanks from me too! 🚀 |
|
I've just released a new version of restic (so we have a release with the VSS code from #2274 included). I'll now merge this code to master, then we can extensively test it and resolve all remaining bugs over the next weeks :) |
|
Horrah! Thanks so much for spending the time on this, @aawsome and @fd0 and @MichaelEischer.
With the increased time for testing, it might sense to get @aawsome's other PRs in there now-ish too, so these can go out together. |
|
Hey everyone, @aawsome @fd0 @MichaelEischer I see the change log says: "By default, the prune command no longer removes all unused data." As an AWS S3 user, I pay per space used in S3, and the idea of pruning old snapshots is to (obviously) save costs. Thank you very much! |
|
@nunoperalta It's referring to the things you can read about at https://restic.readthedocs.io/en/stable/060_forget.html#customize-pruning , in particular the By default it allows for leaving 5% of unused data instead of repacking that last part, but you can change it to 0 if you want. But please note that it's not just storage costs you have to pay, you also pay for transactions with your S3, right? |
Yep, exactly. Transactions & Data Transfer is what I found to be the most expensive about Prune. So you find that leaving at 5% is "ideal" for a balance/trade-off? Thank you very much. |
Smarter people than me came to that conclusion, and I think it seems reasonable. |
Both is dramatically reduced by this PR compared to "old" prune. But the best value for |
|
Hey, I was quietly following this PR and been using it for a good while without any major issues. Glad to see 0.12 was released recently and contains this PR. In a nice coincidence, I also got a notification that my employer's opensource program has accepted some of the people working on it for the Open Source Peer Bonus Program. You should have gotten an e-mail about this - it is real and not a fraud. :) You can read more about the program it here. |
|
@vrusinov Thanks a lot for nominating me - I'm feeling very honored! I just received the bonus. I will donate a part of it to the Free Software Foundation Europe - and buy some nice gifts for my wife and children 😉 |
What is the purpose of this change? What does it change?
Reimplement
prunesuch that the main problems are solved:pruneis slow, especially for remote repositoriespruneuses too much memorypruneis not configurableFeatures are:
Output looks like this:
Moreover
forgetis enhanced:prune, of courseprunesimulation withrestic forget --dry-run --pruneThis PR has the following prerequisites:
(#2839 is not a must any longer; #2842 is now independent; #2843 is included here)
Was the change discussed in an issue or in the forum before?
see #2547
Prune issues have been widely discussed, e.g. #1140 #1599 #1723 #1985 #2162 #2227 #2305
There are also other PR trying to improve the situation, see #1994 #2340 #2507.
closes #1140
closes #1723
closes #1985
closes #2112
closes #2227
closes #2305
Checklist
changelog/unreleased/that describes the changes for our users (template here)gofmton the code in all commits