Prune issues: add new commands 'cleanup-index', 'cleanup-packs' and 'repack-index'#2513

Closed
aawsome wants to merge 9 commits into restic:master from aawsome:new-cleanup-commands

Conversation

@aawsome
Contributor

@aawsome aawsome commented Dec 13, 2019

What is the purpose of this change? What does it change?

There are many issues with the prune command and it does a lot of things:

  • rebuild the index from pack files (is this pruning?)
  • remove blobs that are not used from the index
  • repack packs that contain unused blobs (and to do so again rebuild the index)
  • remove packs that are not used

In this PR three new commands are added:

  • cleanup-index removes all blobs not used in snapshots from the index
  • cleanup-packs removes all packs that are not referenced by the index
  • repack-index repacks the index to get rid of small index files

With these three commands, the prune functionality can be achieved for the usual repository state (i.e. a non-broken repo).

All three commands are supposed to be fast and to consume no more memory than 'backup' or 'check'.
Maybe in the future a rewrite of 'prune' can use these commands. They just use the index implementation from either internal/repository or internal/index and only read index files and metadata from the repository (which should already be in the cache).

In the meantime, the new commands can mitigate the situation and allow cleaning up repositories that currently cannot be pruned, especially large remote ones.
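At its core, the cleanup-index step is a mark-and-sweep over blob IDs: walk all snapshots to collect the set of blobs still in use, then drop every index entry outside that set. A minimal sketch in Go (the function and type names here are illustrative, not restic's actual internal API):

```go
package main

import "fmt"

// sweepIndex keeps only those indexed blob IDs that are marked as used.
// This mirrors the mark-and-sweep idea behind cleanup-index; the real
// command first walks all snapshot trees to build the "used" set.
func sweepIndex(indexed []string, used map[string]bool) []string {
	kept := make([]string, 0, len(indexed))
	for _, id := range indexed {
		if used[id] {
			kept = append(kept, id)
		}
	}
	return kept
}

func main() {
	indexed := []string{"blob-a", "blob-b", "blob-c"}
	used := map[string]bool{"blob-a": true, "blob-c": true}
	fmt.Println(sweepIndex(indexed, used)) // prints [blob-a blob-c]
}
```

Because only snapshot metadata and index files are read (both typically cached), this stays far cheaper than scanning pack files.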

Was the change discussed in an issue or in the forum before?

Prune issues have been widely discussed, e.g. #1140 #1599 #1723 #1985 #2162 #2227 #2305
There are also other PRs trying to improve the situation, see #1994 #2340 #2507.

Maybe this pull request can be merged pretty fast as there is no change to existing functionality.
I'm looking forward to getting feedback from code-reviewers 😄

closes #1599
closes #1985
closes #2162
closes #2227
closes #2305

Checklist

  • I have read the Contribution Guidelines
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

Add new commands:
'cleanup-index' removes all blobs not used in snapshots from index
'cleanup-packs' removes all packs that are not referenced by the index
Changelog was added
@aawsome
Contributor Author

aawsome commented Dec 15, 2019

I do not know why the build on macOS failed after adding the changelog (commit 7fcd225)...
Can anybody help?

Alexander Weiss added 2 commits December 22, 2019 12:08
- optimize cleanup-index
- add repacking of packs to cleanup-packs (WIP!)
- cleanup-packs can now be used to repack packs
- cleanup-index also checks for used blobs not in index
@irasnyd

irasnyd commented Jan 2, 2020

I can report that I used the master restic + this PR built on December 19th, 2019 to prune a very large (12M object / ~55TB) AWS S3 backed repository. It was very useful, and worked perfectly as far as I can tell.

@aawsome
Contributor Author

aawsome commented Jan 2, 2020

@irasnyd Thank you for your feedback. I'm very pleased that this PR could help you!

I guess you used the version where only completely unused packs are deleted?
In cleanup-packs I added an option to allow repacking packs that are partly used or small. This could save some more space, but it is slow as it needs to download all relevant packs and upload them again.

I just realized that the command is not really verbose - I'll change this so that you can use --dry-run to play around with the new parameters --unused-percent, --unused-space and --used-space.

@irasnyd

irasnyd commented Jan 2, 2020

@irasnyd Thank you for your feedback. I'm very pleased that this PR could help you!

I guess you used the version where only completely unused packs are deleted?

Yes, that is correct. I merged this PR up to commit 7fcd225.

In cleanup-packs I added an option to allow repacking of packs that are partly used or small. This could save some more space, but is slow as it needs to download all relevant packs and re-upload them again.

I just realized that the command is not really verbose - I'll change this so that you can use --dry-run to play around with the new parameters --unused-percent, --unused-space and --used-space.

Thanks for the additions. I think they are valuable improvements to this new functionality.

I won't be able to test them anytime very soon. I lifecycle my data into the AWS S3 Infrequent Access tier (a "colder storage" tier) to save costs, and I don't want to risk increased charges by repacking. It isn't worth it to me at the current time.

@seqizz

seqizz commented Jan 27, 2020

Tried these commands and they seem to work pretty well for a 500 GB repository (5 min for the index cleanup, 35 min for packs, although the default verbosity was a bit much on packs). Good job 👍

Added the command `repack-index`. With this command the index files are
repacked so that small index files are put together into larger ones.
@aawsome
Contributor Author

aawsome commented Feb 5, 2020

I've added a new command repack-index. With this command small index files can be rebuilt together, resulting in larger index files.

With the commands cleanup-index, cleanup-packs and repack-index, all prune functionality in terms of deleting unused things and repacking small resulting files is now present. These new commands use less memory and are faster than the current prune implementation.

Only recovery actions like rebuilding the index from pack files are not covered by the commands in this PR. A next step could be to combine the parts into a single new prune or forget functionality.

@aawsome aawsome changed the title Prune issues: add new commands 'cleanup-index' and 'cleanup-packs' Prune issues: add new commands 'cleanup-index', 'cleanup-packs' and 'repack-index' Feb 5, 2020
@aawsome
Contributor Author

aawsome commented Feb 19, 2020

@tscs37 about your comment (in #2473)

Keep in mind that prune took almost 17 days to complete for me (>400 hours), so unless the optimizations in the linked issue can improve prune runtimes by two orders of magnitude, it's not going to solve the underlying issue for the amount of data I back up

I would like to see the timing results of using this PR, if you can try it out. I do assume you will get a HUGE improvement (and also reduced B2 costs) as standard prune downloads and scans your whole repository while these commands mainly work only on the index.

@tscs37

tscs37 commented Feb 20, 2020

Both cleanup commands were indeed a lot faster and ran in about 8 hours; however, the repack-index command crashed fairly early into processing, so I'm currently rebuilding the index (restic check on a subset of the backup gave a green light) and can't say anything conclusive about its performance (though it ran fairly fast, at least as far as it got).

Stacktrace:


repository 2dfcbd9a opened successfully, password is correct                              
load all index files                                                                      
[0:22] 73.26%  400 / 546 index files loaded                                               
pack 94487184 already present in the index                                                
github.com/restic/restic/internal/index.(*Index).AddPack                                  
        internal/index/index.go:262                                                       
github.com/restic/restic/internal/index.Load.func1                                        
        internal/index/index.go:229                                                       
github.com/restic/restic/internal/repository.(*Repository).List.func1                     
        internal/repository/repository.go:643                                             
github.com/restic/restic/internal/backend.(*RetryBackend).List.func1.1                    
        internal/backend/backend_retry.go:133                                             
github.com/restic/restic/internal/backend/b2.(*b2Backend).List                            
        internal/backend/b2/b2.go:289                                                     
github.com/restic/restic/internal/backend.(*RetryBackend).List.func1                      
        internal/backend/backend_retry.go:127                                             
github.com/cenkalti/backoff.RetryNotify                                                   
        vendor/github.com/cenkalti/backoff/retry.go:37                                    
github.com/restic/restic/internal/backend.(*RetryBackend).retry                           
        internal/backend/backend_retry.go:36
github.com/restic/restic/internal/backend.(*RetryBackend).List
        internal/backend/backend_retry.go:126 
github.com/restic/restic/internal/repository.(*Repository).List
        internal/repository/repository.go:637 
github.com/restic/restic/internal/index.Load
        internal/index/index.go:201
main.RepackIndex
        cmd/restic/cmd_repack_index.go:68
main.runRepackIndex
        cmd/restic/cmd_repack_index.go:48
main.glob..func23
        cmd/restic/cmd_repack_index.go:18
github.com/spf13/cobra.(*Command).execute
        vendor/github.com/spf13/cobra/command.go:762
github.com/spf13/cobra.(*Command).ExecuteC
        vendor/github.com/spf13/cobra/command.go:852
github.com/spf13/cobra.(*Command).Execute
        vendor/github.com/spf13/cobra/command.go:800
main.main
        cmd/restic/main.go:86
runtime.main
        /usr/lib/go/src/runtime/proc.go:203
runtime.goexit
        /usr/lib/go/src/runtime/asm_amd64.s:1357

@aawsome
Contributor Author

aawsome commented Feb 20, 2020

@tscs37 Thank you for testing and reporting the error.

repack-index failed while reading the index and did not change anything. cleanup-index and cleanup-packs both leave the repository in a sane state, so if you only use those two commands you can perfectly clean up your repository; it may just happen that a lot of index files accumulate over time (which does not affect functionality but may lead to performance issues).

I'll look into this issue in repack-index...

- Add handling to AddPack when packs are present in more than one
  index file
@aawsome
Contributor Author

aawsome commented Feb 23, 2020

@tscs37 The issue you reported should now be fixed with the last commit.

- update Packs after merging
@tscs37

tscs37 commented Feb 23, 2020

Thanks, after rebuilding the index and a check, everything seems to be running fine with all three commands. They do seem to clean up quite a bit of data, though from the looks of it a proper prune can still reclaim a tiny bit more space overall.

@aawsome
Contributor Author

aawsome commented Feb 23, 2020

@tscs37: by default cleanup-packs only removes packs that are completely unused. This can be done without reading any pack from the repo and hence is very fast. I have good experiences with my repos and the "overhead" (packs that are partly used) is in my case less than 1% of the repo size.

If you want to also repack partly used packs (as prune does), you can use the optional flags --unused-percent, --unused-size, --used-size.
Using --unused-size=1 will basically repack all partly used packs. This, however, means that all packs to repack have to be read from the repo, which can be quite slow for remote repos (but is still faster than prune, which reads all packs).
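The pack-selection logic described here can be sketched as a simple predicate over the sizes the index reports for each pack. The threshold names follow the flags mentioned in this thread, but their exact semantics here are an assumption for illustration, not the PR's actual code:

```go
package main

import "fmt"

// classifyPack decides what cleanup-packs would do with a pack, given the
// total pack size and the bytes still used according to the index.
// minUnused is a hypothetical reading of --unused-size (minimum unused
// bytes that trigger a repack); smallPack is a "small pack" cutoff.
func classifyPack(packSize, usedSize, minUnused, smallPack uint64) string {
	switch {
	case usedSize == 0:
		return "remove" // completely unused: delete without reading it
	case packSize-usedSize >= minUnused:
		return "repack" // enough unused space to be worth rewriting
	case packSize < smallPack:
		return "repack" // small pack: merge into a bigger one
	default:
		return "keep"
	}
}

func main() {
	fmt.Println(classifyPack(4096, 0, 1, 1024))    // remove
	fmt.Println(classifyPack(4096, 2048, 1, 1024)) // repack
	fmt.Println(classifyPack(4096, 4096, 1, 1024)) // keep
}
```

Under this reading, --unused-size=1 makes every partly used pack qualify for repacking, matching the "repack all partly used packs" behaviour described above, while the "remove" case never needs to download the pack at all.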

@mathiasnagler

I ran all three new commands on my 29TB repository (with the latest changes) and it worked just fine. Running a restic check afterwards also shows no errors and the repository can be mounted just fine.
Thanks a lot for all your effort!

@aawsome
Contributor Author

aawsome commented Feb 28, 2020

With the last commit I made the commands more parallel, changed the verbosity (as mentioned by @seqizz) and added the possibility to fine-tune repacking by specifying separate parameters for tree and data blobs. As tree blobs are usually cached and are more performance-critical if spread over too many small packs, I added some defaults to do some repacking there.

If anyone wants to test out the changes, I'm happy to get feedback!
Also feedback is appreciated where performance still seems to be an issue (e.g. @tscs37: Do you still get 8h runtime and which of the three commands is still taking a long time?)

@rawtaz
Contributor

rawtaz commented Feb 28, 2020

Why does this have to be three commands? Would you want to run any of them separately without running the others?

@aawsome
Contributor Author

aawsome commented Feb 28, 2020

Why does this have to be three commands? Would you want to run any of them separately without running the others?

cleanup-index makes the index smaller and thus helps if you have performance problems or issues with memory requirements.

cleanup-packs only removes and repacks packs and therefore just helps to save space in your repository. If you do not clean up packs, it does not affect the performance of restic at all. (EDIT: Ok, a little bit, because listing pack files might be a bit slower...)

repack-index repacks many index files into larger ones, tackling performance issues with the index (occurring mainly with the current index implementation) or load latency with remote repos. This is actually similar to cleanup-index. I separated this command from cleanup-index as it uses another index implementation (from internal/index), so it is actually more of a technical separation. Combining these two would be possible, but then we should be careful not to have huge memory consumption by accidentally holding the index twice in memory. However, I was thinking about adding the "repack" functionality to cleanup-index once there is a new index implementation.
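The repack-index idea, many small index files merged into fewer large ones, essentially amounts to gathering all entries and rewriting them in bigger batches. A rough sketch under assumed names (restic's real index files are JSON structures listing packs and their blobs, not plain string lists):

```go
package main

import "fmt"

// mergeIndexFiles combines the entries of many small index files into
// batches of at most batchSize entries, each batch standing in for one
// new, larger index file. The old files would then be deleted.
func mergeIndexFiles(files [][]string, batchSize int) [][]string {
	var all []string
	for _, f := range files {
		all = append(all, f...)
	}
	var merged [][]string
	for len(all) > 0 {
		n := batchSize
		if n > len(all) {
			n = len(all)
		}
		merged = append(merged, all[:n])
		all = all[n:]
	}
	return merged
}

func main() {
	small := [][]string{{"e1"}, {"e2"}, {"e3", "e4"}, {"e5"}}
	fmt.Println(len(mergeIndexFiles(small, 3))) // prints 2
}
```

The memory concern mentioned above follows directly from this shape: merging requires holding all entries at once, so combining it with another command that also loads the full index could double peak memory use.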

I can see cases where you would like to regularly clean up and repack your index files because of performance and memory issues, but clean up packs only from time to time (and maybe using different repack parameters for, let's say, monthly and yearly runs) if you do not care too much about space in your repository.

EDIT: I'm open to suggestions on how to best integrate these functionalities into forget or prune. Maybe the cleanup-index and repack-index functionality should be (optionally?) added to forget (which then removes snapshots and the index entries related to deleted snapshots) and cleanup-packs should stay independent?

@seqizz

seqizz commented Mar 1, 2020

Hi again,
Not sure if that's related with these changes, but wanted to try this branch on a ~3TB repository and got some weirdness:

  • Ran cleanup-index and repack-index, both finished pretty fast
  • Waited a day; within that day the client cron apparently tried to take a backup and was OOM-killed due to memory pressure while doing it (backup runs with a non-patched release binary)
  • Ran cleanup-packs, took ~1h50m (prune takes 2h10m, so as expected) without issues. It also seems some data from a non-completed backup was cleared:
repository 6e71ba60 opened successfully, password is correct
load indexes
find packs in index and calculate used size...
repack and collect packs for deletion
found 2459 unused packs
deleting unused packs will free about 12.112 GiB
found 0 partly used packs + 5 small packs = total 0 packs for repacking with 74.607 KiB
repacking packs will free about 16777216.000 TiB
cleanup-packs will totally free about 12.112 GiB
repacking packs...
updating index files...
check 189 index files and change if neccessary
index 982d54e1 was removed. new index: cde541c5
index d158a057 was removed. new index: 58b3f8a5
index b40412ee was removed. new index: bee057fd
index cff16b29 was removed. new index: b2a185e7
index aeffe534 was removed. new index: 18e0b20d
[0:12] 100.00%  189 / 189 index files processed
done
deleting packs (unused and repacked)
aeaff6cf6868c092706904a3fc13aae71e9c985a03aed308d36bb41499e752b5 was removed.
5e1f428340bb5dcc6d0c33d982c1e9221db7cd7a816441129030e8b7dbdfb115 was removed.
...

Afterwards I wanted to re-run the first two:

  • repack-index first:
repository 6e71ba60 opened successfully, password is correct
load all index files
[0:21] 100.00%  189 / 189 index files loaded
saving index
remove 189 old index files
[0:02] 100.00%  189 / 189 files deleted

(?removed all?)

  • Afterwards cleanup-index, failed with:
repository 6e71ba60 opened successfully, password is correct
get all snapshots
load indexes
find data that is still in use for 4 snapshots
tree d02f5a044a9efb08b76be2c823164f088098a59ca8d66eac03ddd0fb7afef75c not found in repository
github.com/restic/restic/internal/repository.(*Repository).LoadTree
        internal/repository/repository.go:712
github.com/restic/restic/internal/restic.FindUsedBlobs
        internal/restic/find.go:11
github.com/restic/restic/internal/restic.FindUsedBlobs
        internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
        internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
        internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
        internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
        internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
        internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
        internal/restic/find.go:31
main.getUsedBlobs
        cmd/restic/cmd_cleanup_index.go:216
main.runCleanupIndex
        cmd/restic/cmd_cleanup_index.go:67
main.glob..func7
        cmd/restic/cmd_cleanup_index.go:23
github.com/spf13/cobra.(*Command).execute
        vendor/github.com/spf13/cobra/command.go:762
github.com/spf13/cobra.(*Command).ExecuteC
        vendor/github.com/spf13/cobra/command.go:852
github.com/spf13/cobra.(*Command).Execute
        vendor/github.com/spf13/cobra/command.go:800
main.main
        cmd/restic/main.go:86
runtime.main
        /nix/store/61zfj9fyd2w555gx36mj3rgq1na6ss5k-go-1.12.16/share/go/src/runtime/proc.go:200
runtime.goexit
        /nix/store/61zfj9fyd2w555gx36mj3rgq1na6ss5k-go-1.12.16/share/go/src/runtime/asm_amd64.s:1337

I can still list the snapshots, tonight another backup will run and tomorrow planning to run prune, let's see how it goes.

@seqizz

seqizz commented Mar 2, 2020

Welp, looks like repo corruption:

  • Backup couldn't finish after hours (normally takes <20 mins), manually cancelled
  • Normal prune also failed (tree not found)
  • check result:
    check_result.txt

I'll stop littering the PR with comments, since I am still not sure what specifically caused this (and I can't match the removed parts from these commands to check output). But ping if you'd like to get more info / have any ideas. I'll be running a rebuild-index on this repository 🤞

@aawsome
Contributor Author

aawsome commented Mar 2, 2020

@seqizz: Sorry to hear this. Did you run check directly after the OOM? The error message only states that some referenced trees cannot be found in the repo. If this comes from the OOM, it might be a bug in the backup command.
However, if it comes from cleanup-packs, it is a bug in this PR which I did not see in my tests 😳

repacking packs will free about 16777216.000 TiB

This really puzzles me. It indicates that the used size of one or more packs, as calculated from the index entries, was larger than the actual pack size (the number is negative, but as it's a uint it wraps around to this ridiculously large value). It may indicate that the repo was already broken before cleanup-packs...
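That figure is consistent with an unsigned underflow: if the used size computed from the index exceeds the pack size by even one byte, packSize - usedSize on a uint64 wraps around to nearly 2^64 bytes, which is exactly 16777216 TiB. A small demonstration (the sizes are made up; this is not restic's actual code):

```go
package main

import "fmt"

// freedBytes naively computes how much space deleting/repacking would
// free. If usedSize > packSize (an inconsistent index), the unsigned
// subtraction wraps around instead of going negative.
func freedBytes(packSize, usedSize uint64) uint64 {
	return packSize - usedSize
}

func main() {
	// Hypothetical inconsistency: the index claims one byte more
	// used than the pack actually holds.
	f := freedBytes(4096, 4097)
	fmt.Printf("will free about %.3f TiB\n", float64(f)/(1<<40))
	// prints: will free about 16777216.000 TiB
}
```

Guarding the subtraction (or reporting an error when usedSize > packSize) would both avoid the absurd number and surface the inconsistent index early.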

Unfortunately, repack-index also won't help if prune did not help. The only remedy is to identify which snapshots are affected, remove them, and re-backup.

@seqizz

seqizz commented Mar 3, 2020

Did you run check directly after the OOM?

Sadly no. That was the reason I couldn't distinguish what was the issue.

But now that I have run rebuild-index on the repo, check reports everything is fine again. So my suspicion is that backup operations are not that cancel-safe. That also suggests repack-index didn't harm the repo; it was caused by cancelled backups. Sorry for the confusion. 👍

Member

@MichaelEischer MichaelEischer left a comment


The implementation should be rather safe to use for valid repositories, but in its current state I wouldn't recommend it for a damaged repository.

@aawsome Shouldn't cleanup-index reindex packs which are missing from the repository index? That would remove the need to run a rather slow rebuild-index run, when just a few packs are missing from the index.

@aawsome
Contributor Author

aawsome commented Apr 21, 2020

@MichaelEischer Thank you very much for your valuable comments and sorry for the late reply.
Unfortunately I'm suffering a lot from the current COVID-19 countermeasures, which leave me almost no spare time to work on this...

Summarizing, I think I should work on the following topics:

  • combine the cleanup-* commands into a single one where only parts may be run depending on flags set by the user.
  • make it more failure-resistant in the case of damaged repos (this of course works only for very specific kinds of damage)
  • work on performance

I'll try to carve out some time for this, but I cannot guarantee that this will be possible in the next weeks or maybe even months, depending on the COVID-19 measures of my government. So if anyone is willing to move on with the work in the meantime, feel free to work on the implementation!

@aawsome aawsome marked this pull request as draft April 21, 2020 20:14
@MichaelEischer
Member

No need to hurry, the prune command can wait if you're kept busy by life (and COVID-19 measures).

After merging the commands, the result will be closer to what prune is currently doing, so you might want to check whether you can recycle some code. Regarding the error handling in prune, you might also want to take a look at #2674 which tightens the current error checks in prune. (And maybe also #2685 which should improve handling of eventually consistent backends.)

Depending on whether you intend the cleanup commands to work for damaged repositories or just abort if an error is detected, it might be valuable to have all blobs of a packfile contained in the index (see #2547). If restic has the complete list of blobs contained in a pack file, then it can decide whether that pack file is still necessary or not, even for repositories with damaged data packs.

@aawsome
Contributor Author

aawsome commented May 1, 2020

I added another PR, #2718 where only one command cleanup is added.
If #2718 looks more promising, I'll close this PR.

@aawsome
Contributor Author

aawsome commented May 14, 2020

I'm closing this now as IMO #2718 is almost finished and mature enough to replace the current prune command.

@aawsome aawsome closed this May 14, 2020
@aawsome aawsome deleted the new-cleanup-commands branch November 7, 2020 08:25