Skip to content

fix: Refactor glob filters#488

Merged
kodiakhq[bot] merged 10 commits intomainfrom
fix/filter_glob
Dec 12, 2022
Merged

fix: Refactor glob filters#488
kodiakhq[bot] merged 10 commits intomainfrom
fix/filter_glob

Conversation

@yevgenypats
Copy link
Copy Markdown
Contributor

This simplifies our glob filtering and makes sure Tables are always stored in a tree like structured apart from when returned from FlattenTables. This solves a lot of potential bugs and places where we need to do things like if parent != nil.

I think this also closes #475 as there is no need for another data structure.

The reasoning for this is that our tables are tree like structured and it's easier imo to always keep it like this everywhere across the code and not have multiple data structures with similar methods and so on. This required a small update to our filtering logic which just works as a DFS and in-place filtering (for that to work I had to add a copy method which I think can be useful for other things in the future).

@github-actions
Copy link
Copy Markdown

github-actions bot commented Dec 11, 2022

⏱️ Benchmark results

  • DefaultConcurrency-2 resources/s: 11,995
  • Glob-2 ns/op: 146.6
  • TablesWithChildrenDefaultConcurrency-2 resources/s: 28,005
  • BufferedScanner-2 ns/op: 9.297
  • LogReader-2 ns/op: 30.57

@yevgenypats
Copy link
Copy Markdown
Contributor Author

I removed some error validation from here but I can bring it back it just should go into spec.Validate rather then the search logic.

Copy link
Copy Markdown
Contributor

@hermanschaaf hermanschaaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I think this should do the trick. 👍

I would just like to make sure we don't leave a gap in our validation and testing:

  • we should have at least some basic test for TablesForSpec
  • we should bring back some of the validation that listAndValidateTables was doing, and the tests for it. It's fine if we put that in Validate, but it was checking for things like invalid table names (or globs that match nothing), which I think is a pretty useful user-facing feature

Copy link
Copy Markdown
Member

@erezrokah erezrokah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach, but I think we're missing a use case

for _, r := range t.Relations {
filteredChild := r.filterDfs(tables, skipTables)
if filteredChild != nil {
matched = true
Copy link
Copy Markdown
Member

@erezrokah erezrokah Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this always return the parent table if the child matches, for example:

               {
			name:                    "should skip parent table",
			tables:                  []*Table{{Name: "main_table", Relations: []*Table{{Name: "sub_table", Parent: &Table{Name: "main_table"}}}}},
			configurationTables:     []string{"*"},
			configurationSkipTables: []string{"main_table"},
			want:                    []string{},
		},

The above test fails as it returns main_table and sub_table.
Maybe we should return early if the parent doesn't match is skipped

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. it depends on what we define on what has preference. if skip has preference then yes. I've added that logic.

Copy link
Copy Markdown
Member

@erezrokah erezrokah Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I think https://github.com/cloudquery/plugin-sdk/pull/488/files#diff-7e84c216b71c17b23bc66ee33042fe73fc9352c152dbc4fb4cd9a94d74e2bfb4R49 is the reason

So I think we need to split matched into matched and skipped if we want to support both Your fix works too

yevgenypats and others added 2 commits December 12, 2022 11:41
Co-authored-by: Erez Rokah <erezrokah@users.noreply.github.com>
Co-authored-by: Erez Rokah <erezrokah@users.noreply.github.com>
@erezrokah
Copy link
Copy Markdown
Member

Also @yevgenypats may I suggest we do this as a fix so it triggers a release, and also release it separately?
I mean:

  1. Release chore(main): Release v1.12.0 #482
  2. Release all plugins with chore(main): Release v1.12.0 #482
  3. Then release this PR

We've had a bunch of bugs related to this logic, so maybe safer to do a dedicated release

@yevgenypats yevgenypats changed the title chore: Refactor glob filters fix: Refactor glob filters Dec 12, 2022
}
for _, skipTable := range skipTables {
if glob.Glob(skipTable, t.Name) {
return nil
Copy link
Copy Markdown
Member

@erezrokah erezrokah Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this I think we can do the skipTables for loop first and the tables later

Comment on lines +184 to +195
var matched bool
for _, includeTable := range tables {
if glob.Glob(includeTable, t.Name) {
matched = true
break
}
}
for _, skipTable := range skipTables {
if glob.Glob(skipTable, t.Name) {
return nil
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var matched bool
for _, includeTable := range tables {
if glob.Glob(includeTable, t.Name) {
matched = true
break
}
}
for _, skipTable := range skipTables {
if glob.Glob(skipTable, t.Name) {
return nil
}
}
for _, skipTable := range skipTables {
if glob.Glob(skipTable, t.Name) {
return nil
}
}
var matched bool
for _, includeTable := range tables {
if glob.Glob(includeTable, t.Name) {
matched = true
break
}
}

Copy link
Copy Markdown
Member

@erezrokah erezrokah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good now, added another blocking comment as I think we can exit early on skip

Co-authored-by: Herman Schaaf <hermanschaaf@gmail.com>
@kodiakhq kodiakhq bot merged commit cb5f6bb into main Dec 12, 2022
@kodiakhq kodiakhq bot deleted the fix/filter_glob branch December 12, 2022 10:45
erezrokah pushed a commit that referenced this pull request Dec 13, 2022
🤖 I have created a release *beep* *boop*
---


##
[1.12.1](v1.12.0...v1.12.1)
(2022-12-13)


### Bug Fixes

* Don't panic on empty-string for timestamp
([#489](#489))
([83813de](83813de))
* Fix deadlock off-by-one
([#493](#493))
([4ea9ed8](4ea9ed8))
* Reduce default concurrency
([#491](#491))
([f995da9](f995da9))
* Refactor glob filters
([#488](#488))
([cb5f6bb](cb5f6bb))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants