restic prune: selection of packs to repack based on size#5183
restic prune: selection of packs to repack based on size#5183MichaelEischer merged 4 commits intorestic:masterfrom
Conversation
MichaelEischer
left a comment
There was a problem hiding this comment.
The change in internal/repository/prune.go is more complex than it needs to be. It should be sufficient to just set the targetPackSize to the user specified value. The current approach feels like a duplication of the existing selection of small packs.
The PR also must be rebased onto the current master branch as we generally don't merge PRs that contain merge commits. (See the git rebase command / the git book).
I'd also like to have a small test for the new option, although I can also help with that. The test should probably use an approach similar to that of https://github.com/restic/restic/pull/5212/files .
|
PR has been rebased. integration test still outstanding. |
MichaelEischer
left a comment
There was a problem hiding this comment.
Thanks, I found a few more nits
|
I added logic to the case |
The plan for the test is as follows: |
The reason why I change the packsize is the simulation of going to larger packsizes. There might be other methods of forcing repacking of smaller packfiles, but that's what I did :) |
MichaelEischer
left a comment
There was a problem hiding this comment.
I found a few issues with the test, see below. We also have to discuss the method name, in the linked issue the suggestion was to use --repack-smaller-than.
if packSize >= int64(targetPackSize) && opts.SmallPackBytes == 0 && !opts.RepackSmall {
As mentioned in the comment, I have no clue what this is supposed to do.
internal/repository/prune.go
Outdated
| // only repack very small files by default | ||
| targetPackSize := repo.packSize() / 25 | ||
| if opts.RepackSmall { | ||
| if opts.SmallPackBytes > 0 && opts.RepackSmall { |
There was a problem hiding this comment.
Let's shorten this to if opts.SmallPackBytes > 0. That is, setting --small-pack-size implies --repack-small.
internal/repository/prune.go
Outdated
|
|
||
| case p.unusedBlobs == 0 && p.tpe != restic.InvalidBlob && !mustCompress: | ||
| if packSize >= int64(targetPackSize) { | ||
| if packSize >= int64(targetPackSize) && opts.SmallPackBytes == 0 && !opts.RepackSmall { |
There was a problem hiding this comment.
I don't have the slightest idea what problem this change is supposed to solve. Even the new test case works without this change. From what I can see this only breaks certain corner cases.
This changed condition causes all pack files to end up in the repackSmallCandidates list, making the if len(repackSmallCandidates) < 10 useless. When setting --repack-small --max-unused 0 this will repack the whole repository. In short, revert this change, it won't be merged.
There was a problem hiding this comment.
removed faulty if-clause.
internal/repository/repository.go
Outdated
| } | ||
|
|
||
| // SetPackSize - this is a hack! | ||
| func (r *Repository) SetPackSize(size uint) { |
There was a problem hiding this comment.
Remove that method again. I've spent a lot of time to ensure that the Repository struct exposes as little internals as possible. That stance won't change.
If absolutely necessary for repository tests, then the test code has to move to the repository package (compare repository_internal_test.go and repository_test.go). However, as commented in prune_test.go this isn't the case here.
internal/repository/prune_test.go
Outdated
| repo, _, be := repository.TestRepositoryWithVersion(t, 0) | ||
|
|
||
| // cheating! | ||
| repo.SetPackSize(uint(packSize)) |
There was a problem hiding this comment.
This hack is unnecessary. Just use a pack size of 4MB and the following constructor:
be := TestBackend(t)
repo, _ := repository.TestRepositoryWithBackend(t, be, 0, repository.Options{PackSize: repository.MinPackSize})
There was a problem hiding this comment.
Thanks for the tip!
internal/repository/prune_test.go
Outdated
| return nil | ||
| }, &progress.NoopPrinter{}) | ||
| rtest.OK(t, err) | ||
| _ = plan |
There was a problem hiding this comment.
This looks like a leftover.
There was a problem hiding this comment.
assignment is gone.
internal/repository/prune_test.go
Outdated
| //t.Log(fmt.Sprintf("stats.Size %+v", stats.Size)) | ||
| //t.Log(fmt.Sprintf("stats.Blobs %+v", stats.Blobs)) | ||
| //t.Log(fmt.Sprintf("stats.Packs %+v", stats.Packs)) | ||
| rtest.Equals(t, stats.Size.Used <= stats.Size.Repack, true, "all sizes ...") |
There was a problem hiding this comment.
When triggering an error, this currently results in the following error message:
prune_test.go:265:
all sizes ...
exp: false
got: true
I have no clue what that message tries to tell me. The call also looks like you wanted to use
rtest.Assert(t, stats.Size.Used <= stats.Size.Repack, "not all packs repacked, expected at least size %v got %v", stats.Size.Used, stats.Size.Repack)
There was a problem hiding this comment.
still working on the tests.
There was a problem hiding this comment.
tests are improved, The checks should be more easily to understand.
internal/repository/prune_test.go
Outdated
| //t.Log(fmt.Sprintf("stats.Blobs %+v", stats.Blobs)) | ||
| //t.Log(fmt.Sprintf("stats.Packs %+v", stats.Packs)) | ||
| rtest.Equals(t, stats.Size.Used <= stats.Size.Repack, true, "all sizes ...") | ||
| rtest.Equals(t, stats.Blobs.Used == stats.Blobs.Repack, true, "all blobs ...") |
There was a problem hiding this comment.
rtest.Equals(t, stats.Blobs.Used, stats.Blobs.Repack, "repacked wrong number of blobs")
Similar changes like this and the one above apply to the two rtest.Equals calls further below.
cmd/restic/cmd_prune.go
Outdated
| f.BoolVar(&pruneOptions.RepackCacheableOnly, "repack-cacheable-only", false, "only repack packs which are cacheable") | ||
| f.BoolVar(&pruneOptions.RepackSmall, "repack-small", false, "repack pack files below 80% of target pack size") | ||
| f.BoolVar(&pruneOptions.RepackUncompressed, "repack-uncompressed", false, "repack all uncompressed data") | ||
| f.StringVar(&pruneOptions.SmallPackSize, "small-pack-size", "", "pack `below-limit` packfiles smaller than (allowed suffixes: k/K, m/M, g/G, t/T)") |
There was a problem hiding this comment.
Please add a bit of documentation to doc/060_forget.rst (somewhere near --repack-cacheable-only)
cmd/restic/cmd_prune.go
Outdated
| f.BoolVar(&pruneOptions.RepackCacheableOnly, "repack-cacheable-only", false, "only repack packs which are cacheable") | ||
| f.BoolVar(&pruneOptions.RepackSmall, "repack-small", false, "repack pack files below 80% of target pack size") | ||
| f.BoolVar(&pruneOptions.RepackUncompressed, "repack-uncompressed", false, "repack all uncompressed data") | ||
| f.StringVar(&pruneOptions.SmallPackSize, "small-pack-size", "", "pack `below-limit` packfiles smaller than (allowed suffixes: k/K, m/M, g/G, t/T)") |
There was a problem hiding this comment.
I just noticed that the original issue suggested --repack-smaller-than as option. Is there a particular reason to deviate from that?
P.S. the description should be (pack -> repack, packfiles -> pack files)
"repack pack files smaller than `below-limit` (allowed suffixes: k/K, m/M, g/G, t/T)"
We could also drop the t/T suffix as it will always lead to an error. g/G currently is also always an error as the pack size is still limited to 128MB, but that could change in the future.
36122cb to
680b074
Compare
|
I've rebased the PR (and tried to fix the merge conflicts while rebasing) to remove the clutter from the "files changes" tab on Github and prepare for reviewing. |
|
Hi Michael,
Thank you for hard work on this PR. As I indicated earlier, I still don't
know much about git and github, and therefore make too many unforced
mistakes. I would appreciate a few hints, what to do, what to avoid.
Thanks again,
Winfried
…On Mon, 3 Feb 2025 at 21:32, Michael Eischer ***@***.***> wrote:
I've rebased the PR (and tried to fix the merge conflicts while rebasing)
to remove the clutter from the "files changes" tab on Github.
—
Reply to this email directly, view it on GitHub
<#5183 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AEO7MGKEXVY6ILWUNHJDHDD2N7N77AVCNFSM6AAAAABTWK65N2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMMZSGEZDEOJSHA>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
No problem, it was basically "just" a matter of rebasing the PR. However, due to the merge conflicts that required a bit of experience with
|
|
Judging from the "rebase" commits in #5185 you're holding |
|
I understand now why my |
8998704 to
680b074
Compare
|
Please stop pushing merge commits into your PRs. |
|
I messed up git again, so I did a hard reset and activated all additions and changes. |
|
Will be reworking the PR based on commit 8c12291 |
…ether cmd_prune.go: added option `--repack-smaller-than` prune.go: added field `SmallPackBytes` to `PruneOptions`, including checking and processing prune_test.go: added test `TestPruneSmall` doc/060_forget.rst: added description of enhancement changelog/unreleased/issue-5109: description of enhancement
changed option from `--small-pack-size` to `--repack-smaller-than`
There was a problem hiding this comment.
LGTM.
I've changed the --repack-smaller-than option to work independently of the --repack-small option. My goal is to eventually get rid of --repack-small so let's not tie those options together.
[Edit]See #5293 for more context on that[/Edit]
cmd_prune: Selection of packs to repack based on size. Add option --small-pack-size so prune can repack all packfiles smaller than small-pack-size bytes.
What does this PR change? What problem does it solve?
Add option --small-pack-size so prune can repack all packfiles smaller than small-pack-size bytes.
The new option is passed onto internal/repository/prune.go.
A new decision has been added to "decide what to do" to
move the intended packfiles onto the repackSmallCandidates queue.
Was the change previously discussed in an issue or on the forum?
#5109
closes #5109
Checklist
changelog/unreleased/that describes the changes for our users (see template).