Add --device-map flag to allow remapping DeviceIDs #3599
intentionally-left-nil wants to merge 1 commit into restic:master
When backing up from a snapshot (ZFS or btrfs, for example), the filesystem mounts each snapshot with a different DeviceID. This causes restic to upload the directory structure again for each new snapshot, even though the structure is identical. This commit adds a new flag to prevent this behavior and allow restic to reuse the previously uploaded directory structure.
The concept is simple: when calling restic backup, pass --device-map src:dest to make any node with a DeviceID of src behave as if its DeviceID were dest.
Luckily, nodeFromFileInfo is the only place the DeviceID is read in the backup code path, so it's straightforward to shim the DeviceID there.
Closes restic#3041
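The src:dest remapping described above can be sketched roughly as follows. This is an illustration only, not the code from the commit: the DeviceMap type, ParseDeviceMap, and Remap are hypothetical names, and the sketch assumes device IDs are handled as unsigned 64-bit integers.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// DeviceMap maps a real device ID to the ID it should be recorded as.
// Hypothetical type for illustration; not taken from the restic source.
type DeviceMap map[uint64]uint64

// ParseDeviceMap parses flag values of the form "src:dest", e.g. "46:1".
func ParseDeviceMap(specs []string) (DeviceMap, error) {
	m := make(DeviceMap, len(specs))
	for _, spec := range specs {
		src, dest, ok := strings.Cut(spec, ":")
		if !ok {
			return nil, fmt.Errorf("invalid device map %q: want src:dest", spec)
		}
		s, err := strconv.ParseUint(src, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid source device ID %q: %w", src, err)
		}
		d, err := strconv.ParseUint(dest, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid destination device ID %q: %w", dest, err)
		}
		// Reject two different destinations for the same source.
		if prev, dup := m[s]; dup && prev != d {
			return nil, fmt.Errorf("device ID %d is mapped to both %d and %d", s, prev, d)
		}
		m[s] = d
	}
	return m, nil
}

// Remap returns the mapped ID, or id unchanged when no mapping exists.
func (m DeviceMap) Remap(id uint64) uint64 {
	if dest, ok := m[id]; ok {
		return dest
	}
	return id
}

func main() {
	m, err := ParseDeviceMap([]string{"46:1"})
	if err != nil {
		panic(err)
	}
	// Device 46 is recorded as 1; device 47 has no mapping and is kept.
	fmt.Println(m.Remap(46), m.Remap(47))
}
```

In this sketch, the single place that reads the device ID would call Remap on the value before storing it in the node, so unmapped devices behave exactly as before.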
|
Why not let restic automatically assign pseudo-device IDs as suggested in #3041 (comment)? As rawtaz and I have commented (see e.g. #3041 (comment)) The PR currently does not address device ID collisions at all, which can be a problem once users start fiddling with the device IDs. |
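To illustrate the collision concern: if a user maps one device ID onto a value that another, unmapped device in the same backup really has, two distinct filesystems end up sharing one recorded ID. Below is a minimal sketch of a check for that case. FindCollisions and its inputs are hypothetical and are not part of this PR or of restic.

```go
package main

import (
	"fmt"
	"sort"
)

// FindCollisions reports every destination ID that a seen, remapped device
// would share with a different device that was seen but is not remapped.
func FindCollisions(deviceMap map[uint64]uint64, seen []uint64) []uint64 {
	seenSet := make(map[uint64]bool, len(seen))
	unmapped := make(map[uint64]bool)
	for _, id := range seen {
		seenSet[id] = true
		if _, mapped := deviceMap[id]; !mapped {
			unmapped[id] = true
		}
	}

	found := make(map[uint64]bool)
	for src, dest := range deviceMap {
		// Only mappings whose source actually occurred can collide.
		if seenSet[src] && unmapped[dest] {
			found[dest] = true
		}
	}

	var out []uint64
	for id := range found {
		out = append(out, id)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// Device 46 is mapped onto 18, but a real device 18 is also backed up.
	fmt.Println(FindCollisions(map[uint64]uint64{46: 18}, []uint64{18, 46}))
}
```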
|
Hi @MichaelEischer, thanks for taking a look. I'd be happy to implement an automatic mapping algorithm, but I'm not sure how to go about it.
Consider a scenario with 4 files, spread across two devices, and let's say that for the first backup, device1 has a DeviceID of 17, and device2 has a DeviceID of 18. The question becomes: how do we map device1 and device2 to pseudo-IDs which are consistent across runs?
But the problem with this approach is that it isn't stable across backups unless enumeration happens in the same order. For example, on the next backup, let's say that instead of backing up a.txt, the enumeration finds /device1/b.txt first (or maybe you deleted a.txt). Now the algorithm would see that there is no device mapping for ID 81, and so it would hash /device1/b.txt. But this is a different file/inode, so let's say it maps to pseudo-ID 11. Now we haven't solved anything, because the pseudo-IDs are not consistent across runs.
Some ideas I considered, but with obvious flaws:
The last approach seems most reasonable to me, but also fairly involved. I don't know enough about all the nuances of file systems (including network drives) to know what gotchas I'd run into when traversing up parent directories. The benefit of the current approach is that it's very iterative (code-wise) and results in a small diff with consistent, repeatable behavior. The last approach would involve a much larger diff and would be more likely to have a tail of "it's not stable in this configuration" reports. Happy to hack on ideas more once we have an idea of how the algorithm should work. |
|
Thinking about this some more, it looks like |
|
Is this issue specific to Linux? I'm wondering how many systems have the issue but don't have |
|
I am using ZFS snapshots with restic and am affected by issue 3041. With the |
The backup sorts filenames before traversing a folder. Thus the traversal order is stable.
Hardlinks cannot point across devices. Or is there some subvolume related trickery that avoids that limitation? The archiver component already traverses the filesystem starting from the root directory |
Although hardlinks cannot be made across devices, bind mounts can be made at different points in the filesystem. The bind mounts all share the same device ID. With the traversal order being stable, bind mounts don't seem like an issue though. The mount that is seen first would be the path used for the pseudo identifier for the device. If this path is hashed in some way to produce the identifier it wouldn't matter how many other devices were seen before this device. |
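The idea in this comment (stable traversal order plus hashing the first path seen on each device) could look roughly like the sketch below. It is a simplification under stated assumptions: Entry and AssignPseudoIDs are hypothetical names, FNV-1a is just one possible hash, and sorting full paths stands in for the archiver's real traversal order.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Entry is a hypothetical record of one visited path and its real device ID.
type Entry struct {
	Path string
	Dev  uint64
}

// pathHash derives a pseudo device ID from a path using FNV-1a.
func pathHash(p string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(p))
	return h.Sum64()
}

// AssignPseudoIDs visits entries in sorted path order and maps each real
// device ID to the hash of the first path seen on that device. A directory
// sorts before its children, so that first path is normally the mount point.
func AssignPseudoIDs(entries []Entry) map[uint64]uint64 {
	sorted := append([]Entry(nil), entries...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Path < sorted[j].Path })

	ids := make(map[uint64]uint64)
	for _, e := range sorted {
		if _, ok := ids[e.Dev]; !ok {
			ids[e.Dev] = pathHash(e.Path)
		}
	}
	return ids
}

func main() {
	// Two runs: the real device IDs and the file names differ, but the
	// mount points are the same.
	run1 := AssignPseudoIDs([]Entry{{"/device1", 17}, {"/device1/a.txt", 17}, {"/device2", 18}})
	run2 := AssignPseudoIDs([]Entry{{"/device1", 81}, {"/device1/b.txt", 81}, {"/device2", 82}})
	fmt.Println(run1[17] == run2[81], run1[18] == run2[82])
}
```

Because the pseudo-ID depends only on the first path seen for a device, it does not matter how many other devices were visited earlier or which files inside the mount were added or deleted.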
|
Hey, I've been successfully using this patch in unattended backups, though, on a filesystem without any bind mounts in it. Thanks! |
|
As discussed in #3041, we want to have a solution that just works, without having users manually map device IDs. That is, the PR is just there as a temporary workaround, but is unlikely to get merged. |
I'm going to close this PR out. My personal opinion is that in the search for a perfect solution, we've sacrificed a good enough one. This issue has been open for 3 years now. It's frustrating that a low-risk, opt-in fix wasn't able to make it over the line (and a better one hasn't materialized either). |
|
There's now #4006 . |
What does this PR change? What problem does it solve?
When creating backups from ZFS or btrfs snapshots, each snapshot will have a different DeviceID. This causes restic to upload the directory structure for each backup, even if nothing has changed. To solve this, this PR adds the --device-map flag, which allows users to map different device IDs to the same logical device ID.
The idea is that users can figure out what device ID their snapshot currently points to (e.g. with stat -c '%d') and then re-map that to a virtual device ID that is consistent across backups.
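For reference, the same lookup can be done from Go. The sketch below assumes a Unix-like system where FileInfo.Sys() returns a *syscall.Stat_t; DeviceID is a hypothetical helper name and not a restic function. It uses Lstat so that, like stat without -L, it does not follow symlinks.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// DeviceID returns the st_dev value for path, comparable to `stat -c '%d'`.
func DeviceID(path string) (uint64, error) {
	fi, err := os.Lstat(path)
	if err != nil {
		return 0, err
	}
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, fmt.Errorf("no stat_t information available for %s", path)
	}
	// The width of Dev differs between platforms, so convert explicitly.
	return uint64(st.Dev), nil
}

func main() {
	path := "."
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	id, err := DeviceID(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(id)
}
```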
For example, if there's a btrfs snapshot mounted at /home/.snapshots/123/snapshot/, then you could do something like this:
Was the change previously discussed in an issue or on the forum?
There is a decent amount of discussion on the linked issue. The larger issue here is that there is no guarantee that the DeviceID is stable, and it is unclear whether it should be relied upon at all. One proposed idea was to implement an -ignore-deviceid flag, similar to the other ignore flags. However, given that the DeviceID is used to determine if symlinks span across different filesystems, this ended up not being a trivial change to implement.
Closes #3041