
Speed-up copy command #3513

Merged: fd0 merged 1 commit into restic:master from MichaelEischer:fast-copy on Mar 28, 2022


MichaelEischer (Member) commented Sep 15, 2021

What does this PR change? What problem does it solve?

The copy command currently proceeds blob by blob, which can be very slow if backend accesses have any noticeable latency.

This PR implements copy by reusing the repack operation from prune. The repack operation copies all selected blobs from a set of pack files into new pack files. For prune, the source and destination repositories are identical; to implement copy, we just use a different source and destination repository.
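A minimal Go sketch of this idea, assuming heavily simplified, hypothetical types (`Repo`, `Blob`, `SavePack` and `Repack` are illustrative names, not restic's internal API). The point is that the repack loop only talks to "a source" and "a destination", so prune and copy differ only in which repositories are passed in. For brevity, the sketch writes one output pack per input pack and does not delete old packs; a real implementation would handle both differently.

```go
package main

import "fmt"

// Hypothetical, simplified types for illustration only; restic's real
// repository, index and pack handling look different.

// BlobID identifies a blob (a content hash in a real repository).
type BlobID string

// Blob is one stored chunk of data.
type Blob struct {
	ID   BlobID
	Data []byte
}

// Repo maps pack file names to the blobs stored in them.
type Repo struct {
	packs map[string][]Blob
	next  int // counter used to name newly written packs
}

func NewRepo() *Repo { return &Repo{packs: map[string][]Blob{}} }

// SavePack writes blobs into a new pack file in r and returns its name.
func (r *Repo) SavePack(blobs []Blob) string {
	name := fmt.Sprintf("pack-%d", r.next)
	r.next++
	r.packs[name] = append([]Blob(nil), blobs...)
	return name
}

// Repack reads each pack in packNames from src, keeps only the blobs listed
// in keep and writes them into new packs in dst. Nothing here requires
// src != dst: prune passes the same repository twice, copy passes two.
func Repack(src, dst *Repo, packNames []string, keep map[BlobID]bool) []string {
	var written []string
	for _, name := range packNames {
		var selected []Blob
		for _, b := range src.packs[name] { // one pack-level read per source pack
			if keep[b.ID] {
				selected = append(selected, b)
			}
		}
		if len(selected) > 0 {
			written = append(written, dst.SavePack(selected))
		}
	}
	return written
}

func main() {
	src := NewRepo()
	p := src.SavePack([]Blob{{"a", []byte("A")}, {"b", []byte("B")}})
	keep := map[BlobID]bool{"a": true}

	// copy: different source and destination repositories
	dst := NewRepo()
	fmt.Println(Repack(src, dst, []string{p}, keep)) // [pack-0]

	// prune: the same repository on both sides (old pack deletion omitted)
	fmt.Println(Repack(src, src, []string{p}, keep)) // [pack-1]
}
```

Because the unit of work is a pack file rather than a single blob, the number of backend round trips scales with the number of packs touched instead of the number of blobs.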

This way, the copy command gains all performance improvements made to the prune command while also simplifying the implementation. The main change of this PR is the last commit; all other commits are part of #3484. Although this PR could be implemented standalone, I've decided to use #3484 as its base because #3484 only reads the relevant parts of pack files instead of always downloading the full pack file.
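To illustrate what reading only the relevant parts of a pack file can look like, here is a hypothetical Go sketch. `Entry`, `Range`, `readRanges` and the `maxGap` parameter are assumptions for illustration, not the code from #3484. It turns the positions of the wanted blobs into a few byte ranges, merging blobs that lie close together so that each range can be fetched with a single request.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry describes where a blob lives inside a pack file (hypothetical type).
type Entry struct {
	Offset, Length uint
}

// Range is a contiguous byte range [Start, End) to fetch from the backend.
type Range struct {
	Start, End uint
}

// readRanges returns the byte ranges needed to fetch the wanted blobs.
// Blobs separated by at most maxGap unused bytes are merged into one
// range, trading a little wasted download for fewer round trips.
func readRanges(wanted []Entry, maxGap uint) []Range {
	sorted := append([]Entry(nil), wanted...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Offset < sorted[j].Offset })

	var ranges []Range
	for _, e := range sorted {
		end := e.Offset + e.Length
		if n := len(ranges); n > 0 && e.Offset <= ranges[n-1].End+maxGap {
			// close enough to the previous range: extend it
			if end > ranges[n-1].End {
				ranges[n-1].End = end
			}
			continue
		}
		ranges = append(ranges, Range{e.Offset, end})
	}
	return ranges
}

func main() {
	wanted := []Entry{{Offset: 0, Length: 100}, {Offset: 110, Length: 50}, {Offset: 1000, Length: 20}}
	fmt.Println(readRanges(wanted, 16)) // [{0 160} {1000 1020}]
}
```

With `maxGap = 0`, only blobs that touch or overlap are merged; larger values reduce the number of requests at the cost of downloading some bytes that are not needed.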

The PR also adds a progress bar for the number of pack files copied for the current snapshot.
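A hypothetical Go sketch of such a per-snapshot counter. The `packProgress` type and its `"done / total packs"` output are illustrative only, not restic's progress-bar API. The counter is updated atomically in case several goroutines copy pack files at the same time.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// packProgress is a hypothetical per-snapshot counter of copied pack files.
type packProgress struct {
	done  uint64 // first field so it stays 64-bit aligned; updated atomically
	total uint64
}

// Done records that one more pack file has been copied.
// It is safe for concurrent use.
func (p *packProgress) Done() { atomic.AddUint64(&p.done, 1) }

// String renders the current state, e.g. "3 / 10 packs".
func (p *packProgress) String() string {
	return fmt.Sprintf("%d / %d packs", atomic.LoadUint64(&p.done), p.total)
}

func main() {
	p := &packProgress{total: 4}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.Done() // a worker finished copying one pack file
		}()
	}
	wg.Wait()
	fmt.Println(p) // 4 / 4 packs
}
```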

Was the change previously discussed in an issue or on the forum?

Fixes #2923.

Checklist

  • I have read the contribution guidelines.
  • I have enabled maintainer edits.
  • I have added tests for all code changes.
  • [ ] I have added documentation for relevant changes (in the manual).
  • There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
  • I have run gofmt on the code in all commits.
  • All commit messages are formatted in the same style as the other commits in the repo.
  • I'm done! This pull request is ready for review.

MichaelEischer force-pushed the fast-copy branch 3 times, most recently from abc5ab3 to a031814, on September 22, 2021 20:09
DarkKirb added a commit to DarkKirb/restic that referenced this pull request Dec 29, 2021
Currently restic copy will copy each blob from every snapshot serially,
which has performance implications on high-latency backends such as b2.

This commit introduces 8x parallelism for blob downloads/uploads which
can improve restic copy operations up to 8x for repositories with many
small blobs on b2.

This commit also addresses the TODO comment in the copyTree function.

Related work:

A more thorough improvement of the restic copy performance can be found
in PR restic#3513
MichaelEischer pushed a commit to greatroar/restic that referenced this pull request Dec 30, 2021
MichaelEischer mentioned this pull request Mar 6, 2022
fd0 (Member) commented Mar 26, 2022

I've taken the liberty of rebasing the branch on master after #3484 was merged.

fd0 (Member) left a comment

It's a very elegant solution, I like it a lot!

fd0 merged commit a08b95c into restic:master on Mar 28, 2022
MichaelEischer deleted the fast-copy branch on March 28, 2022 19:30
mfrischknecht pushed a commit to mfrischknecht/restic that referenced this pull request Jun 14, 2022

Development

Successfully merging this pull request may close these issues:

Better feedback when using restic copy