Improve reliability of large restores #4626

Merged: MichaelEischer merged 5 commits into restic:master from MichaelEischer:reliable-large-restores on Jan 9, 2024

@MichaelEischer (Member) commented Jan 7, 2024

What does this PR change? What problem does it solve?

Restoring large files can fail in some cases (example: https://forum.restic.net/t/errors-restoring-with-restic-on-windows-server-s3/6943). This appears to be caused by blobs that are referenced a large number of times. When streaming a pack file, writing all instances of such a blob to disk can take a long time, and the network connection can be closed in the meantime. If writing the blob takes longer than 15 minutes, no retries are currently performed, which results in an incomplete restore.

#4624 has somewhat improved the situation by ensuring that each blob in a pack file is only processed once if streamPack has to retry the download. This helps if the blob processing delay is short enough that retries still happen, but won't fix other cases.

This PR addresses the problem from a different angle. It modifies the filerestorer so that frequently referenced blobs are downloaded individually, which avoids the timeout problem.
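As a rough illustration of the idea (not the actual filerestorer code), the split between frequently referenced blobs and the rest could be sketched as follows. The `blobRef` type, the `splitByFrequency` helper, and the limit of 100 occurrences are all assumptions made for this example.

```go
package main

import "fmt"

// blobRef is a hypothetical stand-in for a blob that the restore needs
// from one pack file, together with how often it is referenced.
type blobRef struct {
	ID          string
	Occurrences int
}

// splitByFrequency partitions the blobs of a pack into those referenced
// more than limit times (to be downloaded individually) and the rest
// (to be streamed together). The limit is an assumed tuning value.
func splitByFrequency(blobs []blobRef, limit int) (frequent, rest []blobRef) {
	for _, b := range blobs {
		if b.Occurrences > limit {
			frequent = append(frequent, b)
		} else {
			rest = append(rest, b)
		}
	}
	return frequent, rest
}

func main() {
	blobs := []blobRef{
		{ID: "blob1", Occurrences: 1},
		{ID: "blob2", Occurrences: 1000},
		{ID: "blob3", Occurrences: 2},
	}
	frequent, rest := splitByFrequency(blobs, 100)
	fmt.Println("download individually:", frequent)
	fmt.Println("stream together:", rest)
}
```

Handling the slow-to-write blobs in their own requests keeps each streamed download short, so a long write phase for one blob no longer holds the connection for the other blobs open.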

Was the change previously discussed in an issue or on the forum?

Related to #4605
Fixes https://forum.restic.net/t/errors-restoring-with-restic-on-windows-server-s3/6943

Checklist

  • I have read the contribution guidelines.
  • I have enabled maintainer edits.
  • I have added tests for all code changes.
  • [ ] I have added documentation for relevant changes (in the manual).
  • There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
  • I have run gofmt on the code in all commits.
  • All commit messages are formatted in the same style as the other commits in the repo.
  • I'm done! This pull request is ready for review.

@MichaelEischer force-pushed the reliable-large-restores branch from a1a2ba1 to 4ea3796 on January 8, 2024 20:03

@MichaelEischer (Member Author) left a comment

LGTM.

@MichaelEischer merged commit c31e941 into restic:master on Jan 9, 2024
@MichaelEischer deleted the reliable-large-restores branch on January 9, 2024 17:23

@JsBergbau (Contributor) commented:

> It modifies the filerestorer such that blobs that are frequently referenced are downloaded individually and thereby avoids the timeout problem.

Let's say a 1 MB blob is referenced 1000 times. Does this mean that 1000 MB are now downloaded, whereas previously only 1 MB was downloaded?

@MichaelEischer (Member Author) commented:

@JsBergbau No. The amount of downloaded data hardly changes. The only difference is that frequently referenced blobs are now downloaded separately from other blobs, but still only once. For example, if restore needs three blobs (blob1, blob2, blob3) from a pack file and blob2 is frequently referenced, then blob2 is downloaded separately from blob1 and blob3. In a few cases this can mean that blob2 is downloaded twice, but that's about it.
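To make the difference between downloaded and written data concrete, here is a small sketch using the numbers from the question. The `neededBlob` type, the `transferStats` helper, and the 1 MB sizes assumed for blob1 and blob3 are hypothetical and only serve the illustration.

```go
package main

import "fmt"

// neededBlob is a hypothetical record of a blob required by the restore:
// its size and the number of places it has to be written to.
type neededBlob struct {
	Name        string
	SizeBytes   int64
	Occurrences int64
}

// transferStats returns the bytes downloaded (each blob fetched once)
// and the bytes written to disk (each blob written once per reference).
func transferStats(blobs []neededBlob) (downloaded, written int64) {
	for _, b := range blobs {
		downloaded += b.SizeBytes
		written += b.SizeBytes * b.Occurrences
	}
	return downloaded, written
}

func main() {
	const mb = 1_000_000
	blobs := []neededBlob{
		{Name: "blob1", SizeBytes: 1 * mb, Occurrences: 1},
		{Name: "blob2", SizeBytes: 1 * mb, Occurrences: 1000},
		{Name: "blob3", SizeBytes: 1 * mb, Occurrences: 1},
	}
	d, w := transferStats(blobs)
	// prints "downloaded: 3 MB, written: 1002 MB"
	fmt.Printf("downloaded: %d MB, written: %d MB\n", d/mb, w/mb)
}
```

The sketch ignores the occasional case mentioned above in which blob2 is fetched a second time; even then the download would grow by one blob size, not by a factor of the reference count.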
