Conversation
9fcb08a to
d4f4494
Compare
37f89c9 to
dab11f0
Compare
|
Could we please discuss first how such a command should look like? How does this relate to the rewrite command in #2731 ? What if the prune command could handle certain types of repository damages? Which types of repair commands for a repository do we need and how to avoid cluttering the CLI? That said, with the current flood of new pull requests which is growing much faster than we can handle them and the still huge pile of older ones waiting for a review, it will quite likely take some time until anyone of the maintainers has time to participate in that discussion. This PR does not fix the underlying issue of #2334, it merely removes the symptoms (the broken file). But it's still not possible to restore at least the remaining parts of a file. |
|
I thought there would be some changes appearing in near future on either So, basically the point is that restic is missing such kind of functionality and IMO this is critical as cases has already been reported where users are struggling with broken repos which can only be fixed by forgetting and pruning whole snapshots. This is why I proposed this PR. About the name and where to put such functionality we can and should of course discuss and I'm fine to have the discussion here. I agree that the functionality is pretty similar to the I must admit that there is another reason why I used a new command
If |
It's definitely useful to have a WIP PR to point people to to repair damaged repositories.
From an implementation perspective both commands are pretty similar so it would be useful to make the recursive tree rewriting part reusable for both commands. (Assuming we end up with two separate commands)
With "handle certain types of repository damages" I had in mind that it should be possible to let the prune command tolerate (= be able to remove unused blobs/packs) repositories which e.g. miss a few data blobs. Of course, the prune command shouldn't start to modify snapshots.
I've missed the |
You are right - in general missing data blobs are not a that big issue when thinking about the sanity of the repository. Of course, data might be lost which might be a bad thing for users. But starting with a repository where you have missing data blobs, probably the best is actually to remove them from the index but let the trees still reference them. If another backup run will stumble over the missing data, the repository will be healed. And if not, all functionality is working perfectly except if you need to access that specific data, i.e. a file which contains a missing blob. (and even restore and mount could work around this and include the functionality I propose here for missing blobs) Another issue is missing tree blobs. This severely damages the repository as most commands like
I recently thought about a check when loading the index whether the pack files referenced in the index still exist in the backend. This would allow users to just remove damaged packs (missing packs would be already detected) and then an information in the index could be saved whether the blob is present or only the length saved in the index may be used. I rejected to implement this as a regular check as it would mean that every 'LoadIndex' would need to 'List' the pack files which could be expensive. Maybe that might be a direction to think of? |
I don't think we want to list all pack files every time the repository index is loaded, it really sounds (needlessly) expensive. It would also only be a partial workaround, as having to rebuild the repository index would still loose the information. And we'd also get into a lot of trouble when mixing older and newer restic version which have differing opinions on which blobs actually exist and which don't. The only way to properly fix this is to handle this in a repository format v2 #628 . |
|
For the second time I have a corrupted repo - too bad. Probably the reason was that the backup disk was full. However, when I run 'make test' I get other tests are o.k. Tested on ARCH 5.8.7-arch1-1 |
|
Hi @pddp
Actually, this PR doesn't change anything on |
|
Hi @aawsome your guess was correct - this was uncorrelated. Also happens on master. |
pddp
left a comment
There was a problem hiding this comment.
Just a minor thing: '--delete' vs '--delete-snapshots'.
Options should be mentioned in the --help text.
Other than that: works as expected.
dab11f0 to
0353cc8
Compare
|
My repo seems to be corrupted after I ran |
6f475f4 to
6c15a4e
Compare
The old name still works, but is deprecated.
Simplify CLI options: * Rename "DeleteSnapshots" to "Forget" * Replace "AddTag" and "Append" with hardcoded values Change output and snapshot modifications to be more in line with the "rewrite" command.
The files in a tree must be sorted in lexical order. However, this cannot be guaranteed when appending a filename suffix. For two files file, file.rep where "file" is broken, this would result in file.repaired, file.rep which is no longer sorted. In addition, adding a filename suffix is also prone to filename collisions which would require a rather complex search for a collision-free name in order to work reliably.
Now the rewritten tree is always serialized which makes sure that we don't accidentally miss any relevant changes.
A broken directory might also not have a subtree.
The more generic RewriteNode callback replaces the SelectByName and PrintExclude functions. The main part of this change is a preparation to allow using the TreeRewriter for the `repair snapshots` command.
This adds support for caching already rewritten trees, handling of load errors and disabling the check that the serialization doesn't lead to data loss.
The previous approach of rewriting all snapshots first, then flushing the repository data and finally removing old snapshots has the downside that an interrupted command execution leaves behind broken snapshots as not all new data is already flushed.
If the node for a file is intact, this is a no-op.
6c15a4e to
0c3f272
Compare
0c3f272 to
e71367e
Compare
|
LGTM. |
|
Just wanted to say that this PR saved me a lot of work. I had a corrupted repository, I couldn't figure out the issue, and every snapshot seemed to be corrupt missing at least one blob. By rebuilding restic from source to include this PR, I was able to fix the repository very easily, and have now recovered about 95% of the original files. Thank you so much @aawsome and all others involved! This is evidently a very valuable feature for restic, it is greatly appreciated. |
What does this PR change? What problem does it solve?
Allow users to recover from broken repositories/snapshots while still salvaging the sane parts of the repository/snapshot.
For given snapshots (selection identical to, e.g.,
forget) the command tries to read all trees and checks if the needed blobs are contained in the index. If blobs are missing or trees cannot be read, it will create new trees and snapshots which only miss these "defect" parts.Those newly generated snapshots can be used to recover needed data.
Also, after removing the "defect" snapshots,
pruneis able to clean up the repo again.While this command is able to cause data loss, special care is taken such that the default flags won't
do any harm - in fact, users have to explicitly specify
--dry-run=false --deleteto loose data.Output looks like:
Depends on #2878 for the troubleshooting docu update.
Was the change discussed in an issue or in the forum before?
Closes #1759
Closes #1798
Closes #2334
Checklist
changelog/unreleased/that describes the changes for our users (template here)gofmton the code in all commits