Discussion: Future of prune and rebuild-index #2547
Description
I would like to discuss how to improve the current state of prune (and of rebuild-index, which is used within prune), as there are quite a few open issues about it.
So far I see the following issues:
- It uses too much memory (restic prune: out of memory #1723). This is partly due to general memory problems with the index in internal/repository (see reduce index memory usage #1988), but also because internal/index is used in addition to that first index structure. This is why most users start complaining about memory consumption when using prune.
- It is too slow, see Small changes, long prune time #1599, prune on b2: restic stopped reading files for new index after 36k files #2024, handle large prune much more efficent #2162. This is mainly because it reads all pack files (twice?), which makes prune effectively unusable for remote repositories with low bandwidth. It is also not parallelized (Parallelize prune pack rewrites #1470).
- It is not customizable. Repacking is slow and expensive, but it cannot be turned off or tuned (Add adjustable utilization threshold for re-writing packs during prune #1985, prune --no-repack option #2305).
- It does too many things and is too complicated (Prune should either disregard repository indexes, or not rebuild indexes twice #2227). It first rebuilds the index from the pack files, then walks all snapshots and all directories within them to collect the used blobs, then finds the packs that contain blobs which are no longer used and rewrites all of those packs. Finally, the index is re-created completely from the resulting pack files. Hence it not only removes data but also reorganizes the repository, and it is even able to recover from some kinds of repository damage. However, this is one source of the performance problems.
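To make the "walks all snapshots and all dirs within" step more concrete, here is a minimal sketch in Go of how the set of used blobs could be collected. The types ID and Tree and the function collectUsedBlobs are made up for this illustration and are not restic's actual API; trees are assumed to sit in an in-memory map instead of being loaded from the repository.

```go
package main

import "fmt"

// ID stands in for a content hash (hypothetical; restic's real type differs).
type ID string

// Tree is a simplified directory node: it references data blobs and subtrees.
type Tree struct {
	DataBlobs []ID
	Subtrees  []ID
}

// collectUsedBlobs walks every snapshot root and returns the set of
// tree and data blob IDs reachable from those roots.
func collectUsedBlobs(roots []ID, trees map[ID]Tree) map[ID]struct{} {
	used := make(map[ID]struct{})
	stack := append([]ID(nil), roots...)
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if _, seen := used[id]; seen {
			continue // subtrees shared between snapshots are visited only once
		}
		used[id] = struct{}{}
		tree, ok := trees[id]
		if !ok {
			continue // assumption: a real tool would report a missing tree as an error
		}
		for _, b := range tree.DataBlobs {
			used[b] = struct{}{}
		}
		stack = append(stack, tree.Subtrees...)
	}
	return used
}

func main() {
	trees := map[ID]Tree{
		"root1": {DataBlobs: []ID{"a"}, Subtrees: []ID{"dir"}},
		"root2": {DataBlobs: []ID{"b"}, Subtrees: []ID{"dir"}},
		"dir":   {DataBlobs: []ID{"c"}},
		"old":   {DataBlobs: []ID{"d"}}, // no snapshot references this tree
	}
	used := collectUsedBlobs([]ID{"root1", "root2"}, trees)
	fmt.Println(len(used)) // prints 6: root1, root2, dir, a, b, c
}
```

Everything that is in the index but not in the returned set is a candidate for removal; this walk only needs tree blobs and never has to read data blobs.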
There are already some proposals to fix specific issues, see #1994, #2340, #2507.
I also started a rework of the prune functionality in #2513. There I use only the index from internal/repository to clean unused blobs from the index, remove unused blobs and, if requested by the user, repack. IMO this solves points 1 to 3 above, and it has already been used successfully by me and others on production repositories.
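The index-only idea can be sketched roughly as follows. This is not the code from #2513: IndexEntry, Plan and planPrune are hypothetical names, and the rule that partly used packs are only touched when repacking is requested is a simplifying assumption. The point is that the decision needs only the index and the set of used blobs, without reading any pack file.

```go
package main

import (
	"fmt"
	"sort"
)

// ID stands in for a content hash (hypothetical; restic's real type differs).
type ID string

// IndexEntry records which pack holds a blob and how many bytes it occupies.
type IndexEntry struct {
	Blob ID
	Pack ID
	Size int
}

// Plan lists what a prune run would do with the packs.
type Plan struct {
	Remove []ID // packs that contain no used blob at all
	Repack []ID // partly used packs, only filled when repacking is requested
}

// planPrune classifies packs using nothing but index entries and the used set.
func planPrune(index []IndexEntry, used map[ID]struct{}, repack bool) Plan {
	type usage struct{ usedBytes, totalBytes int }
	perPack := make(map[ID]*usage)
	for _, e := range index {
		u := perPack[e.Pack]
		if u == nil {
			u = &usage{}
			perPack[e.Pack] = u
		}
		u.totalBytes += e.Size
		if _, ok := used[e.Blob]; ok {
			u.usedBytes += e.Size
		}
	}

	var plan Plan
	for pack, u := range perPack {
		switch {
		case u.usedBytes == 0:
			plan.Remove = append(plan.Remove, pack)
		case u.usedBytes < u.totalBytes && repack:
			plan.Repack = append(plan.Repack, pack)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Slice(plan.Remove, func(i, j int) bool { return plan.Remove[i] < plan.Remove[j] })
	sort.Slice(plan.Repack, func(i, j int) bool { return plan.Repack[i] < plan.Repack[j] })
	return plan
}

func main() {
	index := []IndexEntry{
		{"a", "p1", 100}, {"b", "p1", 100}, // p1 is fully used
		{"c", "p2", 100}, {"d", "p2", 100}, // p2 is half used
		{"e", "p3", 100}, // p3 is unused
	}
	used := map[ID]struct{}{"a": {}, "b": {}, "c": {}}
	plan := planPrune(index, used, true)
	fmt.Println(plan.Remove, plan.Repack) // prints [p3] [p2]
}
```

With repack set to false, only p3 would be removed and p2 would stay as it is, which is the cheap case for remote repositories with low bandwidth.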
Regarding point 1: With #2513 and the new commands, internal/index is no longer used and could also be removed if prune is replaced by these commands.
Regarding point 4: So far there is no command to "repack" the index files on request (to get rid of small index files), and there is no way to recover the index from the pack files (i.e. the functionality of rebuild-index).
I can offer to work on these issues in #2513 if that is wanted.
So my question is: Where should prune and rebuild-index be headed?
Would it be an option to complete #2513 and then use all of this functionality in a new prune command?
It would be great to get some direction here!