Improve reliability of large restores by MichaelEischer · Pull Request #4626 · restic/restic

MichaelEischer · 2024-01-07T13:12:21Z

What does this PR change? What problem does it solve?

Restoring large files can fail in some cases, for example in https://forum.restic.net/t/errors-restoring-with-restic-on-windows-server-s3/6943. This appears to be caused by blobs that are referenced a large number of times. When streaming a pack file, writing all instances of such a blob to disk takes a long time, which can cause the network connection to be closed in the meantime. If writing the blob takes longer than 15 minutes, no retries are currently performed, which leaves the restore incomplete.

#4624 has somewhat improved the situation by ensuring that each blob in a pack file is only processed once if streamPack has to retry the download. This helps if the blob processing delay is short enough that retries still happen, but does not help when the delay is so long that no retry is attempted.
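The effect of #4624 can be pictured as remembering which blobs an earlier attempt already handled. The Go sketch below is a hypothetical model, not the actual `streamPack` code: `blobID` and `streamWithRetries` are illustrative names, each inner slice lists the blobs one download attempt delivered before it failed or finished, and a set that survives across attempts ensures every blob is passed to `handle` only once.

```go
package main

import "fmt"

// blobID is a hypothetical stand-in for a restic blob ID.
type blobID string

// streamWithRetries models retried downloads of one pack file. Each entry
// in attempts lists the blobs that one attempt delivered before it failed
// or finished. The done set persists across attempts, so a blob handled by
// an earlier attempt is skipped instead of being written again.
func streamWithRetries(attempts [][]blobID, handle func(blobID)) {
	done := make(map[blobID]bool)
	for _, delivered := range attempts {
		for _, id := range delivered {
			if done[id] {
				continue // already processed during a previous attempt
			}
			handle(id)
			done[id] = true
		}
	}
}

func main() {
	var handled []blobID
	// In this model the first attempt delivers a and b before the
	// connection drops; the retry delivers a, b and c again.
	streamWithRetries([][]blobID{{"a", "b"}, {"a", "b", "c"}}, func(id blobID) {
		handled = append(handled, id)
	})
	fmt.Println(handled) // [a b c]
}
```

The sketch also shows the limitation described above: skipping already handled blobs only matters if a retry happens at all.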

This PR addresses the problem from a different angle. It modifies the filerestorer such that blobs that are frequently referenced are downloaded individually and thereby avoids the timeout problem.
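A minimal sketch of the idea, assuming each blob needed from a pack carries a count of how many file locations it must be written to. `blob`, `splitPack`, and `largeRefThreshold` are hypothetical; the actual cutoff and data structures in restic's filerestorer are not given in this PR description. Blobs above the threshold are taken out of the shared pack stream and fetched with their own request.

```go
package main

import "fmt"

// blob is a hypothetical description of a blob needed from one pack file:
// its ID and how many file locations the restore must write it to.
type blob struct {
	ID   string
	Refs int
}

// largeRefThreshold is a hypothetical cutoff chosen for illustration only.
const largeRefThreshold = 100

// splitPack partitions the blobs needed from a pack into a group that is
// streamed together in one request and a list of frequently referenced
// blobs, each to be fetched with its own request, so that writing their
// many copies does not leave the shared pack stream idle.
func splitPack(blobs []blob) (bulk []blob, separate []blob) {
	for _, b := range blobs {
		if b.Refs > largeRefThreshold {
			separate = append(separate, b)
		} else {
			bulk = append(bulk, b)
		}
	}
	return bulk, separate
}

func main() {
	bulk, separate := splitPack([]blob{{"blob1", 1}, {"blob2", 1000}, {"blob3", 2}})
	fmt.Println(len(bulk), len(separate)) // 2 1
	fmt.Println(separate[0].ID)           // blob2
}
```

With a high threshold most blobs are still streamed together, so the number of backend requests only grows for the few blobs that are written many times.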

Was the change previously discussed in an issue or on the forum?

Related to #4605
Fixes https://forum.restic.net/t/errors-restoring-with-restic-on-windows-server-s3/6943

Checklist

I have read the contribution guidelines.
I have enabled maintainer edits.
I have added tests for all code changes.
~~[ ] I have added documentation for relevant changes (in the manual).~~
There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
I have run gofmt on the code in all commits.
All commit messages are formatted in the same style as the other commits in the repo.
I'm done! This pull request is ready for review.


MichaelEischer

LGTM.

JsBergbau · 2024-01-15T14:27:10Z

> It modifies the filerestorer such that blobs that are frequently referenced are downloaded individually and thereby avoids the timeout problem.

Let's say there is a blob that is 1 MB in size and referenced 1000 times. Does this mean that 1000 MB are now downloaded, whereas before only 1 MB was downloaded?

MichaelEischer · 2024-01-15T18:29:06Z

@JsBergbau No. The amount of downloaded data hardly changes. The only difference is that frequently referenced blobs are now downloaded separately from other blobs, but still only once. For example, if restore needs three blobs blob1, blob2 and blob3 from a pack file and blob2 is frequently referenced, then blob2 is downloaded separately from blob1 and blob3. In a few cases this can mean that blob2 is downloaded twice, but that's about it.
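The answer can be checked with a small accounting model. In this hypothetical Go sketch, downloaded bytes count each blob once per request it appears in, and written bytes count one copy per reference. blob2 uses the 1 MB / 1000 references from the question, the sizes of blob1 and blob3 are made-up values, and the model ignores pack overhead, gaps between blobs in a ranged read, and the occasional second download of blob2 mentioned above.

```go
package main

import "fmt"

// neededBlob is a hypothetical record of one blob the restore needs.
type neededBlob struct {
	Name string
	Size int64 // bytes
	Refs int   // number of file locations the blob is written to
}

// totals sums the bytes downloaded (once per blob per request) and the
// bytes written to disk (once per reference) for a set of requests.
func totals(requests [][]neededBlob) (downloaded, written int64) {
	for _, req := range requests {
		for _, b := range req {
			downloaded += b.Size
			written += b.Size * int64(b.Refs)
		}
	}
	return downloaded, written
}

func main() {
	const mb = 1_000_000
	blob1 := neededBlob{"blob1", 2 * mb, 1}
	blob2 := neededBlob{"blob2", 1 * mb, 1000}
	blob3 := neededBlob{"blob3", 3 * mb, 1}

	before := [][]neededBlob{{blob1, blob2, blob3}} // one stream for the pack
	after := [][]neededBlob{{blob1, blob3}, {blob2}} // blob2 fetched on its own

	d1, w1 := totals(before)
	d2, w2 := totals(after)
	fmt.Println(d1, w1) // 6000000 1005000000
	fmt.Println(d2, w2) // 6000000 1005000000
}
```

Splitting the request changes when blob2 arrives, not how often it is downloaded; the 1000 copies only affect the bytes written to disk.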

MichaelEischer mentioned this pull request on Jan 7, 2024 in "Improve reliability of restore" #4627 (closed).

MichaelEischer added 5 commits January 8, 2024 20:52

restore: split downloadPack into smaller methods (9328f34)

restore: cleanup downloadPack (00d18b7)

restore: split error reporting from downloadPack (2267910)

restore: separately restore blobs that are frequently referenced (e78be75)
Writing these blobs to their files can take a long time and consequently cause the backend connection to time out. Avoid that by retrieving these blobs separately.

add changelog for reliable restores (4ea3796)

MichaelEischer force-pushed the reliable-large-restores branch from a1a2ba1 to 4ea3796 on January 8, 2024 20:03

MichaelEischer commented Jan 8, 2024 (comment hidden because it was marked as duplicate).

MichaelEischer merged commit c31e941 into restic:master Jan 9, 2024

MichaelEischer deleted the reliable-large-restores branch January 9, 2024 17:23

Porkepix mentioned this pull request on Jan 14, 2024 in "restic 0.16.3" Homebrew/homebrew-core#159935 (merged).

MichaelEischer merged 5 commits into restic:master from MichaelEischer:reliable-large-restores