Skip to content

Fix statistics in prune for duplicates#3113

Merged
fd0 merged 1 commit intorestic:masterfrom
aawsome:fix-prune-stats-duplicates
Nov 19, 2020
Merged

Fix statistics in prune for duplicates#3113
fd0 merged 1 commit intorestic:masterfrom
aawsome:fix-prune-stats-duplicates

Conversation

@aawsome
Copy link
Copy Markdown
Contributor

@aawsome aawsome commented Nov 18, 2020

What does this PR change? What problem does it solve?

Fixes statistics calculation in prune if there are duplicates present in the repo.

Output of restic prune -n -v without this bug fix when using the repo in cmd/restic/testdata/repo-duplicates.tar.gz:

used:                 4 blobs / 1.718 KiB
duplicates:           2 blobs / 148 B
unused:               4 blobs / 1.726 KiB
total:               10 blobs / 3.588 KiB
unused size: 48.09% of total size

to repack:            2 blobs / 220 B
this removes          2 blobs / 148 B
to delete:            4 blobs / 1.726 KiB
total prune:          6 blobs / 1.870 KiB
remaining:            4 blobs / 1.718 KiB
unused size after prune: 16777216.000 TiB (100.00% of remaining size)

Output with this bug fix:

used:                 5 blobs / 1.790 KiB
duplicates:           1 blobs / 74 B
unused:               4 blobs / 1.726 KiB
total:               10 blobs / 3.588 KiB
unused size: 50.11% of total size

to repack:            2 blobs / 220 B
this removes          1 blobs / 74 B
to delete:            4 blobs / 1.726 KiB
total prune:          5 blobs / 1.798 KiB
remaining:            5 blobs / 1.790 KiB
unused size after prune: 0 B (0.00% of remaining size)

Note that this fix only solves the statistics problem, if all duplicates are marked for repacking.
If not all duplicates are marked for repacking, we lack the information what duplicates still exist after pruning.

The situation that not all duplicates are marked for repacking can occur when using the max-repack-size option.

Was the change discussed in an issue or in the forum before?

closes #3110

Checklist

  • I have read the Contribution Guidelines
  • I have enabled maintainer edits for this PR
  • I have not added tests for all changes in this PR
  • I have not added documentation for the changes (in the manual)
  • There's no new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

Note that this fix only solves the statistics problem, if
all duplicates are marked for repacking.

If not all duplicates are marked for repacking, we lack the
information which

The situation that not all duplicates are marked for repacking can occur
when using the `max-repack-size` option
@fd0
Copy link
Copy Markdown
Member

fd0 commented Nov 19, 2020

Thanks for the fast fix!

I think it'd be a great idea to extract all the calculation logic into a separate function/struct, so we can add proper tests for it (not necessarily in this PR). What do you think?

@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Nov 19, 2020

I think it'd be a great idea to extract all the calculation logic into a separate function/struct, so we can add proper tests for it (not necessarily in this PR). What do you think?

Fully agree! In fact there are two things we should test:

  • Assure that prune basically does the job, i.e. removes data and does not damage the repository. This is mainly about assuring that the repacking, deletion and index rebuilding is done correctly. But also to ensure that the algorithm doesn't choose a pack for deletion which is actually still needed.

These tests are IMHO pretty well implemented.

  • The main pruning algorithm which basically returns the list of packs to delete and to repack. As pointed out, here only the packs-to-delete part is mission critical, the decision which packs to repack is basically "only" performance-critical.

Here tests are missing...

@aawsome
Copy link
Copy Markdown
Contributor Author

aawsome commented Nov 19, 2020

@fd0: Forgot to mention: Yes I think we should postpone this testing issue to a follow-up PR.

Copy link
Copy Markdown
Member

@fd0 fd0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@fd0 fd0 merged commit 110a32a into restic:master Nov 19, 2020
@aawsome aawsome mentioned this pull request Nov 21, 2020
6 tasks
@aawsome aawsome deleted the fix-prune-stats-duplicates branch November 22, 2020 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prune stats calculation bug

2 participants