Improve reliability of restore #4627

What should restic do differently? Which functionality do you think we should add?
----------------------------------------------------------------------------------

There are a few corner cases that can currently cause restore to fail. Judging from https://forum.restic.net/t/errors-restoring-with-restic-on-windows-server-s3/6943 and https://forum.restic.net/t/restic-restore-failing-on-large-data-from-s3-with-error-an-existing-connection-was-forcibly-closed-by-remote-host/7062, an individual blob that takes a long time to process can cause the network connection used by `StreamPack` to be closed unexpectedly.

The simplest "fix" would be to modify `StreamPack` so that it downloads the whole pack file first and only starts processing it afterwards. However, that would lead to memory usage problems when larger pack files are used. Thus, we have to resort to the following set of fixes:

- [x] https://github.com/restic/restic/pull/4624 already ensures that a retry in `StreamPack` does not reprocess already downloaded blobs, as that would just trigger the same problem again.
- [x] A comprehensive fix also requires implementing https://github.com/restic/restic/issues/4193 ~~and giving the retries more time than the currently used 15 minutes~~. The latter part is no longer relevant now that `StreamPack` only requests a size-limited chunk of the pack file and downloads that chunk completely right away.
- [x] Finally, https://github.com/restic/restic/pull/4605 changes `StreamPack` such that if streaming the whole pack file fails, it falls back to retrieving each requested blob individually. With the previous changes that's likely not necessary, but it can be useful nevertheless.
- [x] https://github.com/restic/restic/pull/4784: Retries should be able to conceal a network connection that's interrupted for a few minutes, ideally without endlessly delaying the shutdown of restic if the lock file cleanup fails.
- [x] https://github.com/restic/restic/pull/4626 mostly sidesteps the timeout problem by separately downloading frequently referenced blobs, which take a long time to write during restore. From a conceptual viewpoint this workaround has the problem that `StreamPack` fails to isolate its caller from the repository/backend implementation details.


