Conversation
This function returns the configured number of connections for the backend. The local backend uses a hard-coded connection count of 2, and the mem backend uses runtime.GOMAXPROCS(0).
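A minimal sketch of such a per-backend connection count, assuming a hypothetical `Connections` helper keyed by backend name (restic's real implementation defines this as a method on each backend type):

```go
package main

import (
	"fmt"
	"runtime"
)

// Connections reports how many concurrent connections a backend
// supports. The backend names and values mirror the description
// above; the function itself is an illustrative stand-in, not
// restic's actual API.
func Connections(backend string) int {
	switch backend {
	case "local":
		// The local backend uses a fixed connection count of 2.
		return 2
	case "mem":
		// The mem backend scales with the number of usable CPUs.
		return runtime.GOMAXPROCS(0)
	default:
		// Fallback for backends not covered by this sketch.
		return 1
	}
}

func main() {
	fmt.Println(Connections("local")) // 2
	fmt.Println(Connections("mem") >= 1)
}
```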
Previously, restic would build a new index for the repository at the beginning of the prune, do the prune, and then build another new index at the end. Building these indexes could take a long time for large repositories, especially if they are using cloud storage. Restic now loads the existing repository index, keeps track of the added and removed packs, and writes a new index without having to rebuild it from scratch. It also parallelizes as many operations as it can. There is a new --ignore-index option to the prune command which makes restic ignore the existing index and scan the repository to build a new index. This option is not available for the forget command with the --prune option; restic will always load the existing index when run in that manner.
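The incremental index update described above can be sketched as follows. This is a simplified model, not restic's actual index code: the index is reduced to a set of pack IDs, and `Apply` replays the packs recorded as added and removed during the prune instead of rescanning the repository:

```go
package main

import "fmt"

// packID identifies a pack file in the repository.
type packID string

// Index is a hypothetical, simplified stand-in for the repository
// index: just the set of known pack files.
type Index struct {
	packs map[packID]bool
}

// NewIndex builds an index from an initial set of pack IDs.
func NewIndex(ids ...packID) *Index {
	idx := &Index{packs: make(map[packID]bool)}
	for _, id := range ids {
		idx.packs[id] = true
	}
	return idx
}

// Apply updates the index in place from the packs added and removed
// during a prune run, avoiding a rebuild from scratch.
func (idx *Index) Apply(added, removed []packID) {
	for _, id := range removed {
		delete(idx.packs, id)
	}
	for _, id := range added {
		idx.packs[id] = true
	}
}

func main() {
	idx := NewIndex("p1", "p2", "p3")
	// One rewritten pack added, one obsolete pack removed.
	idx.Apply([]packID{"p4"}, []packID{"p2"})
	fmt.Println(len(idx.packs)) // 3
}
```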
Codecov Report
@@ Coverage Diff @@
## master #2340 +/- ##
==========================================
- Coverage 51.09% 47.65% -3.45%
==========================================
Files 178 178
Lines 14546 14922 +376
==========================================
- Hits 7433 7111 -322
- Misses 6042 6791 +749
+ Partials 1071 1020 -51
Continue to review full report at Codecov.
Hello, with the master branch of restic, the prune operation on a local test repo takes about 4 days to complete, so I was looking for ways to improve the speed. I tried your branch on a local repo but unfortunately I get the following error:
Hi @thiell I think this is unrelated to this change. See https://forum.restic.net/t/fatal-number-of-used-blobs-is-larger-than-number-available-blobs/1143 for this. Or in general a search in the forum.
Ah, thanks @moritzdietz! I'll have a look.
Thank you very much for proposing this PR! I think an improvement of prune is very important and we need to take into account as many ideas as possible! Maybe your good ideas should be separated in different PRs? Adding parallel operations is of course a good thing but IMO really hard to review/debug and I don't know if the core developers have enough time for this issue ATM 😏
What is the purpose of this change? What does it change?
This makes pruning large repositories (especially when stored on a remote backend) much faster. It will use the existing index if it can (instead of building a new one from scratch), and it keeps track of changes to the repository to save a new index at the end. It also parallelizes slow operations, including scanning snapshots for used blobs, rewriting partially used packs, and deleting unused packs. As a side effect of the index-related changes, it also handles missing index files.
Was the change discussed in an issue or in the forum before?
closes #2162
closes #2227
Checklist
I have added a file to changelog/unreleased/ that describes the changes for our users (template here)
I have run gofmt on the code in all commits