Skip to content

filter: ability to use negative patterns#2311

Merged
fd0 merged 7 commits intorestic:masterfrom
vincentbernat:feature/negative-pattern
Mar 20, 2022
Merged

filter: ability to use negative patterns#2311
fd0 merged 7 commits intorestic:masterfrom
vincentbernat:feature/negative-pattern

Conversation

@vincentbernat
Copy link
Copy Markdown
Contributor

@vincentbernat vincentbernat commented Jun 14, 2019

This is quite similar to gitignore. If a pattern is prefixed by an
exclamation mark and match a file that was previously matched by a
regular pattern, the match is cancelled. Notably, this can be used
with --exclude-file to cancel the exclusion of some files.

Like for gitignore, once a directory is excluded, it is not possible
to include files inside the directory. For example, a user wanting to
only keep *.c in some directory should not use:

~/work
!~/work/*.c

But:

~/work/*
!~/work/*.c

I didn't write documentation or changelog entry. I would like to get
feedback if this is the right approach for excluding/including files
at will for backups. I use something like this as an exclude file to
backup my home:

$HOME/**/*
!$HOME/Documents
!$HOME/code
!$HOME/.emacs.d
!$HOME/games
# [...]
node_modules
*~
*.o
*.lo
*.pyc
# [...]
$HOME/code/linux/*
!$HOME/code/linux/.git
# [...]

There are some limitations for this change:

  • Patterns are not mixed accross methods: patterns from file are
    handled first and if a file is excluded with this method, it's not
    possible to reinclude it with --exclude !something.

  • Patterns starting with ! are now interpreted as a negative
    pattern. I don't think anyone was relying on that.

  • The whole list of patterns is walked for each match. We may
    optimize later by exiting early if we know no pattern is starting
    with !.

Fix #233

Checklist

  • I have read the Contribution Guidelines
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

@danielsharvey
Copy link
Copy Markdown

Looks good, although at present I don't think it handles this case:

$HOME/**/*
!$HOME/local/bin

i.e. the current logic excludes $HOME/local and so the negate for $HOME/local/bin is missed.

This solution needs special handling of the childMayMatch combined with negate case.

I've worked some logic which appears to work and will add tomorrow. It needs to consider both exclude and include cases.

@vincentbernat
Copy link
Copy Markdown
Contributor Author

vincentbernat commented Jun 15, 2019

Yes, you are right, this is not handled correctly. It should be:

$HOME/**/*
!$HOME/local
$HOME/local/*
!$HOME/local/bin

Not very user-friendly. But on par with git (though for git, this kind of case is not very common).

@danielsharvey
Copy link
Copy Markdown

Understood, thanks, although I think highly preferable to handle this as per the two-line formulation. Thoughts?

Also, this PR does not currently handle restore presently (same filter code used in cmd_restore.go).

e.g. this should restore local/bin files starting with 'a' through 'm' but restores nothing

restic restore -v --include '/local/bin' --include '!/local/bin/[a-m]*' latest

@vincentbernat
Copy link
Copy Markdown
Contributor Author

vincentbernat commented Jun 16, 2019

For this example, yes. But what about:

$HOME/build
test.*
!test.c

Should we include $HOME/build/something/something/test.c (and then walk $HOME/build)? I mean, just having one unqualified/relative exception would mean we have to walk everything because it can be anywhere.

As for restore, I think you want --exclude '/local/bin/*' --exclude '!/local/bin/[a-m]*'. Except if you have already implemented your additional logic?

@danielsharvey
Copy link
Copy Markdown

Regarding general logic, I think you are right. The additional terms are better than the requirement to walk additional subtrees. Thanks.

In regards to the restore, I'm presuming you mean --include not --exclude. But in any case the modification of the term to /local/bin/* does not help.

I believe the issue is that the childMayMatch should not reset in the same way as the primary match term - it should retain the behaviour of triggering to true if any pattern indicates that a child may match.

So

childMayMatch = childMayMatch && !c || c && !negate

becomes

childMayMatch = childMayMatch || c && !negate

This works for my restore case. Thoughts?

@danielsharvey
Copy link
Copy Markdown

I also suggest another expansion of the filter tests is warranted - a few extra cases and capturing the childMayMatch return value in the tests. See danielsharvey@7cb1815

Thoughts?

@vincentbernat
Copy link
Copy Markdown
Contributor Author

You said e.g. this should restore local/bin files starting with 'a' through 'm' but restores nothing, but you have --include "!local/bin/[a-m]*", so it should not restore files starts with 'a' through 'm', right? So, I am a bit confused by the example.

But you are right, there is something different about restore. Just using a negative pattern is enough to make it fail. We should add tests for the this as there is currently none.

@danielsharvey
Copy link
Copy Markdown

You said e.g. this should restore local/bin files starting with 'a' through 'm' but restores nothing, but you have --include "!local/bin/[a-m]*", so it should not restore files starts with 'a' through 'm', right? So, I am a bit confused by the example.

Argh! Sorry - the test restore (that works with the change I suggest) is to exclude 'a' through 'm' and include 'n' through 'z'.

@danielsharvey
Copy link
Copy Markdown

But you are right, there is something different about restore. Just using a negative pattern is enough to make it fail. We should add tests for the this as there is currently none.

Yes. The tests I suggest in my commit danielsharvey@7cb1815 and danielsharvey@6a4938a that capture this and the other checks we've been discussing are:

{[]string{"/**/*", "!/foo", "/foo/*", "!/foo/bar"}, "/foo/other/test.go", true, true},
{[]string{"/**/*", "!/foo", "/foo/*", "!/foo/bar"}, "/foo/bar/test.go", false, false},
{[]string{"/**/*", "!/foo", "/foo/*", "!/foo/bar"}, "/foo/bar/test.go/child", false, false},
{[]string{"/**/*", "!/foo", "/foo/*", "!/foo/bar", "/foo/bar/test*"}, "/foo/bar/test.go/child", true, true},
{[]string{"/foo/bar/*", "!/foo/bar/[a-m]*"}, "/foo", false, true},
{[]string{"/foo/**/test.c"}, "/foo/bar/foo/bar/test.c", true, true},
{[]string{"/foo/*/test.c"}, "/foo/bar/foo/bar/test.c", false, false},

But, I still had not properly considered the childMayMatch case in the negation case. I think the logic needs to be:

matched = matched && !m || m && !negate

if negate {
  // a matching negation will 'cover' child matches
  if m && c {
    childMayMatch = false
  }
} else {
  childMayMatch = childMayMatch || c
}

@vincentbernat vincentbernat force-pushed the feature/negative-pattern branch from ec44001 to 5473bb5 Compare July 2, 2019 19:38
@vincentbernat
Copy link
Copy Markdown
Contributor Author

Hey!

Sorry for the late answer, I was a bit busy lately. So you were right. I have added a commit with tests for childMayMatch before the change and I have updated the commit for the change to use your logic (but I have formatted to make it look like the logic for updating match). I agree with the results of the tests. This seems to work now as expected.

@danielsharvey
Copy link
Copy Markdown

Thanks, looks good to me.

@vincentbernat
Copy link
Copy Markdown
Contributor Author

Thanks @danielsharvey for the review! I have added the documentation and the changelog entry. The PR should be ready for a formal review.

@codecov-io
Copy link
Copy Markdown

codecov-io commented Jul 3, 2019

Codecov Report

Merging #2311 into master will decrease coverage by 4.25%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2311      +/-   ##
==========================================
- Coverage   51.09%   46.84%   -4.26%     
==========================================
  Files         178      178              
  Lines       14546    14550       +4     
==========================================
- Hits         7433     6816     -617     
- Misses       6042     6720     +678     
+ Partials     1071     1014      -57
Impacted Files Coverage Δ
internal/filter/filter.go 76.47% <100%> (+0.56%) ⬆️
internal/backend/b2/b2.go 0% <0%> (-80.69%) ⬇️
internal/backend/swift/swift.go 0% <0%> (-78.83%) ⬇️
internal/backend/gs/gs.go 0% <0%> (-74%) ⬇️
internal/backend/azure/azure.go 0% <0%> (-69.46%) ⬇️
internal/backend/swift/config.go 34.69% <0%> (-57.15%) ⬇️
internal/archiver/blob_saver.go 95.06% <0%> (-4.94%) ⬇️
internal/archiver/file_saver.go 84.21% <0%> (-4.39%) ⬇️
internal/checker/checker.go 66.59% <0%> (-3.67%) ⬇️
cmd/restic/cmd_check.go 42.4% <0%> (-0.55%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5bd5db4...86f53ab. Read the comment docs.

@danielsharvey
Copy link
Copy Markdown

@fd0 Is there any process that we should follow to progress PRs like this one? I have been using this for some time and would love to see it merged. Thanks!

@rawtaz
Copy link
Copy Markdown
Contributor

rawtaz commented Jun 27, 2020

Following up on the comments by @MichaelEischer would be the process, but in the end it boils down to the core developers' time constraints. There are many other changes that are being worked on, patience is needed :)

@vincentbernat vincentbernat force-pushed the feature/negative-pattern branch from 86f53ab to 45a979f Compare June 27, 2020 11:57
@vincentbernat
Copy link
Copy Markdown
Contributor Author

I have rebased the commit and use the second approach, suggested by @MichaelEischer

@danielsharvey
Copy link
Copy Markdown

Thanks @rawtaz, I appreciate the challenge of volunteer efforts. And apologies, I had missed that @MichaelEischer's comments had not been fully responded to (now done).

@rawtaz
Copy link
Copy Markdown
Contributor

rawtaz commented Jun 27, 2020

To anyone else; Has anyone else tested this in production? Please let us know :)

@danielsharvey
Copy link
Copy Markdown

danielsharvey commented Jun 27, 2020

I'm not "anyone else", but just to add clarity to my use cases (in regular use by me):

  1. I've attached one of my standard backups, which runs off restic ... --exclude-file ~/restic-excludes-negative ~/

  2. I also have a continuous backup script which uses a file watcher to detect filesystem updates, and then run a differential backup every 15 minutes based upon a dynamically generated exclude file similiar to 1 above.

Attachment: restic-excludes-negative

@vincentbernat
Copy link
Copy Markdown
Contributor Author

If there is still hope to get this merged, I can try another rebase...

Copy link
Copy Markdown
Member

@fd0 fd0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for the long delay, I've talked with @MichaelEischer and we're inclined to include this PR. Can you please rebase again?

I've not looked at the code too deeply, but I've seen two mistakes in the documentation and changelog.

@vincentbernat
Copy link
Copy Markdown
Contributor Author

I did have a look, but there are too many changes to easily rebase. If anyone wants to give it a try, they are welcome to take over this PR.

@fd0
Copy link
Copy Markdown
Member

fd0 commented Aug 21, 2021

I understand, thanks for your work!

@MichaelEischer MichaelEischer force-pushed the feature/negative-pattern branch from 45a979f to 689979d Compare September 17, 2021 21:20
Copy link
Copy Markdown
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fd0 I've rebased the PR. The check whether a pattern is negated now occurs once while preparing the pattern. That required some refactoring, in separate commits, to change the Pattern type from a slice to a struct.

@x80486

This comment has been minimized.

@rawtaz

This comment has been minimized.

@danielsharvey
Copy link
Copy Markdown

@MichaelEischer @fd0 Thanks for all the work. Just checking in - any assistance required to finalise this?

@vincentbernat vincentbernat requested a review from fd0 January 10, 2022 05:39
vincentbernat and others added 6 commits March 20, 2022 13:33
This is quite similar to gitignore. If a pattern is suffixed by an
exclamation mark and match a file that was previously matched by a
regular pattern, the match is cancelled. Notably, this can be used
with `--exclude-file` to cancel the exclusion of some files.

Like for gitignore, once a directory is excluded, it is not possible
to include files inside the directory. For example, a user wanting to
only keep `*.c` in some directory should not use:

    ~/work
    !~/work/*.c

But:

    ~/work/*
    !~/work/*.c

I didn't write documentation or changelog entry. I would like to get
feedback if this is the right approach for excluding/including files
at will for backups. I use something like this as an exclude file to
backup my home:

    $HOME/**/*
    !$HOME/Documents
    !$HOME/code
    !$HOME/.emacs.d
    !$HOME/games
    # [...]
    node_modules
    *~
    *.o
    *.lo
    *.pyc
    # [...]
    $HOME/code/linux/*
    !$HOME/code/linux/.git
    # [...]

There are some limitations for this change:

 - Patterns are not mixed accross methods: patterns from file are
   handled first and if a file is excluded with this method, it's not
   possible to reinclude it with `--exclude !something`.

 - Patterns starting with `!` are now interpreted as a negative
   pattern. I don't think anyone was relying on that.

 - The whole list of patterns is walked for each match. We may
   optimize later by exiting early if we know no pattern is starting
   with `!`.

Fix restic#233
@fd0 fd0 force-pushed the feature/negative-pattern branch from 689979d to 53656f0 Compare March 20, 2022 12:34
Copy link
Copy Markdown
Member

@fd0 fd0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for your patience, I'm approving this PR!

@fd0 fd0 merged commit 3a285f9 into restic:master Mar 20, 2022
juanrgm added a commit to swordev/datatruck that referenced this pull request May 17, 2022
@cowwoc
Copy link
Copy Markdown

cowwoc commented Jun 21, 2022

Does the new exclamation mark functionality work for inclusions using --files-from? The documentation only mentions using it for exclusions, not inclusions.

@MichaelEischer
Copy link
Copy Markdown
Member

No, the negative patterns are only available for exclusions. A negative pattern in files-from would actually be an exclude, so it should just work to specify it as such?

@cowwoc
Copy link
Copy Markdown

cowwoc commented Jul 2, 2022

@MichaelEischer Are we able to represent any sort of ordering with this approach?

Meaning, can I specify the following?

  • Exclude /foo but
  • Include /foo/one, /foo/two

If I remember correctly, the exclusion of the parent directory will override any inclusion of the children directories.

@danielsharvey
Copy link
Copy Markdown

@cowwoc You absolutely can.

My backup selectively backs up using this mechanism:

restic <...> --exclude-file ~/backup/readme/restic-excludes-negative ~/

My excludes file looks like this:

/*
!/Users
/Users/*
!/Users/daniel

/$HOME/*

!$HOME/Desktop
!$HOME/Documents
!$HOME/Downloads

# ...

Note the pattern:

  1. Exclude everything
  2. "Unexclude" the folder you want to include
  3. Exclude everything (all contents - *) in that folder
  4. Unexclude the contents of the folder that you want to include
  5. Repeat

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow mixing include and exclude patterns for backup and restore

8 participants