restic check with snapshot filters (continuation of PR #5213)#5469
Conversation
There was a problem hiding this comment.
The changes from #5532 require untangling how the pack file selection works here, see the comments for details. Structure(...) should also adhere to the snapshot selection. That method can also generate the pack file list, making CheckWithSnapshots unnecessary and also avoiding the massive performance penalty of iterating a second time over the snapshot trees.
Edit:
Just integrate the snapshot selection into checker.LoadSnapshots. Then checker.Snapshots naturally also adheres to the snapshot filter. The checker.Checker can the also wrap the ReadPacks method such that it replaces the pack list passed to the filter callback with the one only containing the packs relevant for the snapshots.
|
Hi MichaelEischer, looking at the existing changes after #5532 and the proposed changed in #5550 I have decided to wait until #5550 has been commited. Then I will start from a |
9ba1bf4 to
3d78e0b
Compare
|
I have no idea what this Windows test ``TestDirAttributeCombinationsOverwrite` does, I have seen it before, it seems to be transient, and I have no idea of how to fix it. I need some solid Windows advice. Thanks! |
|
I have removed these failing windows tests from the this PR. I will have to wait anyway until this issue has been resolved. |
MichaelEischer
left a comment
There was a problem hiding this comment.
Thanks for the changes so far. This is definitely moving into the right direction, but needs a few further cleanups. The current ReadPacks extension is also far slower than it has to be (see comments).
MichaelEischer
left a comment
There was a problem hiding this comment.
Thanks for keeping up with all the review comments! I have a few further comments, although most of them are nits this time.
| return summary, ctx.Err() | ||
| } | ||
|
|
||
| opts.filteredStatus = chkr.FilterStatus() |
There was a problem hiding this comment.
Let's move it below the if opts.CheckUnused { block. Then the call is directly next to the place where it's used.
There was a problem hiding this comment.
moved the call to buildPacksFilter() directly before its use.
| } | ||
|
|
||
| opts.filteredStatus = chkr.FilterStatus() | ||
| readDataFilter, err := buildPacksFilter(opts, printer) |
There was a problem hiding this comment.
filteredStatus should be passed as parameter to buildPacksFilter(...). It doesn't really belong into the opts struct.
| c.snapshots, err = restic.MemorizeList(ctx, c.repo, restic.SnapshotFile) | ||
| c.snapshotFilter = snapshotFilter | ||
| c.args = args | ||
| c.trackUnused = true |
There was a problem hiding this comment.
That should only be set if the snapshotFilter is actually used. Aka. c.filterActive must also be set here. Or even better, get completely rid of filterActive and just use a helper function that determines whether there's an active filter.
There was a problem hiding this comment.
moved setting of c.trackUnused into loadActiveTrees().
| } | ||
|
|
||
| // convert used blobs into their encompassing packfiles | ||
| selectedPacks := restic.NewIDSet() |
There was a problem hiding this comment.
This should also only be computed within filteredPackfiles. The same applies for the variable definition of packsWithSize.
There was a problem hiding this comment.
Your suggestion encapsulates the code better :)
| } | ||
|
|
||
| // FilterStatus returns the filtering status | ||
| func (c *Checker) FilterStatus() bool { |
There was a problem hiding this comment.
Should move right below LoadSnapshots().
| return | ||
| } | ||
| c.filterActive = true | ||
| c.trees = trees |
There was a problem hiding this comment.
Should just return the trees instead of storing them in c.trees
| errs = []error{} | ||
|
|
||
| if len(c.args) > 0 || !c.snapshotFilter.Empty() { | ||
| err := (&c.snapshotFilter).FindAll(ctx, c.snapshots, c.repo, c.args, func(_ string, sn *data.Snapshot, er error) error { |
There was a problem hiding this comment.
| err := (&c.snapshotFilter).FindAll(ctx, c.snapshots, c.repo, c.args, func(_ string, sn *data.Snapshot, er error) error { | |
| err := (&c.snapshotFilter).FindAll(ctx, c.snapshots, c.repo, c.args, func(_ string, sn *data.Snapshot, err error) error { |
There's no problem with shadowing the outer err variable
| return buf.String(), err | ||
| } | ||
|
|
||
| func TestCheckFullOutput(t *testing.T) { |
There was a problem hiding this comment.
Please merge the tree tests into a table test (see for example TestPrepareCheckCache; you bascially create a struct that contains the expected inputs and output and then use a loop to iterate over all of them). Almost all of their code is shared.
There was a problem hiding this comment.
bult testCase table.
MichaelEischer
left a comment
There was a problem hiding this comment.
LGTM. Thanks!
I've added a few small cleanups, see the extra commits for details. In particular, only packsWithSize is built but not selectedPacks. As sets in go are under the hood also maps, we can just directly build the final map. FilterStatus() is now the authoritative source whether filtering happens and is less sensitive to whether it's called after certain other methods.
New implementation based on PR#5550 (refactor pack selection for read data): cmd/restic/cmd_check.go: had to move the filter creation code buildPacksFilter() to the point where the repo is open and the index has been loaded. Otherwise the new function selectPacks() would not work. Added the new function selectPack() which creates a packfile list based on current snapshot filters. selectPack() creates a list of used snapshots, uses data.FindUsedBlobs() to create a set of used blobs and uses this set to generate a map of used packfiles. In buildPacksFilter() code has been inserted for the print output to indicate snapshot filter selection. doc/077_troubleshooting.rst adds a paragraph about snapshot filtering. changelog/unreleased/issue-3326 announces the new enhancement.
linter caught me out!
cmd/restic/cmd_check.go: moved filter checking into internal/checker/checker.go as a consequence, checker.LoadSnapshots() gained thwo paramenters, which were just saved away. The actual processing of the snapshot filter got moved into checker.Structure(), where context and repository are available. function buildPacksFilter() gained the parameter 'chkr', so the selected packs and the active flag can be retrieved from the checker instance. buildPacksFilter() has been modified, so there is a choice between 'normal' processing and snapshot filter processing. The printed messages were adopted accordingly. cmd/restic/cmd_check_integration_test.go: three new tests verifying the filtering. internal/checker/checker.go: seee above internal/checker/checker.go and internal/checker/testing.go had to be modified for using the changed LoadSnapshots() interface.
linter caught me out with an incorrect spelling
add subtest in cmd_check_integration_test.go: show that exaxctly one snapshot has been selected in Structure()
I have no idea whu Windows chokes in a different place from here. Timing?
cmd/restic/cmd_check.go: moved a lot of code from the command to internal/checker/checker inserted one message line "Snapshot filtering is active" internal/checker/checker.go: split the code from cmd/restic/cmd_check.go into parts for LoadSnapshots(), Structure() and a new wrapper for ReadPacks() cmd/restic/cmd_check_integration_test.go: adopted the tests to the new environment
linter caught me out again
cmd/restic/cmd_check.go: passing printer parameter to chkr.Structure() cmd/restic/cmd_check_integration_test.go internal/checker/checker_test.go internal/checker/testing.go adopting changes to chker.Structure() internal/checker/checker.go: now printing error message to printer.E
linter caught me out
remove these pesky fialing windows tests for the time being
changelog/unreleased/issue-3326: changed wording doc/077_troubleshooting.rst: clarified usage of `restic check` with SnapshotFilter internal/restorer/restorer_windows_test.go: reinstated original windows test cmd/restic/cmd_check.go: added word 'filtered ' for output display in the various messages internal/checker/checker.go: reworked checker logic according to recommendations, removed extra parameter `printer`in `Structure()`. added helper function `loadActiveTrees()` modified `ReadPacks()` to pass filtered packfiles to `repository.ReadPacks()`. cmd/restic/cmd_check_integration_test.go: internal/checker/checker_test.go: internal/checker/testing.go: adjusted parameter lists for modified functions.
cmd/restic/cmd_check.go: `buildPacksFilter()` received parameter `filteredStatus`, rearranged code internal/checker/checker.go: worked on `ReadPacks()`, rearranged code cmd/restic/cmd_check_integration_test.go: build a table of tests to run, together with the expeced results internal/checker/checker_test.go: internal/checker/testing.go: adapt code changes in internal/checker/checker.go
ed06afc to
38a987d
Compare
|
And a trivial rebase + fix to work with the changed datatype for |
This is the continuation of PR #5213. I decided to use a git provided e-mail address instead of my private e-mail address. All changes up-to 2025-03-07 have been incorporated. Details to follow here:
changelog/unreleased/issue-3326:
Introduction of snapshot based filter for command
restic checkcmd/restic/cmd_check.go:
Integration of snapshot based filtering. Some messages had to be changed so that the text does not refer to 'all packfiles' but to 'selected packfiles'.
cmd/restic/cmd_check_integration_test.go:
A new test for filtering via snapshots has been added.
doc/077_troubleshooting.rst:
A short paragraph in the manual has been added about snapshot based filtering.
internal/checker/checker.go:
A new func CheckWithSnapshots() has been added which collects the needed packfiles via restic.FindUsedBlobs().
internal/checker/checker_test.go:
A new test TestCheckRepoSnapshot() has been added to check out the new functionality.
What does this PR change? What problem does it solve?
This PR addresses snapshot based filters in command
restic check.Was the change previously discussed in an issue or on the forum?
#3326
#3326
Checklist
changelog/unreleased/that describes the changes for our users (see template).