feat(gazelle): Add "python_test_file_pattern" directive#1819
feat(gazelle): Add "python_test_file_pattern" directive#1819aignas merged 26 commits intobazel-contrib:mainfrom
Conversation
Add glob to generate.go Add glob dep
gazelle/python/generate.go
Outdated
| func makeCompiledGlobs(globs []string) []glob.Glob { | ||
| compiledGlobs := []glob.Glob{} | ||
| for _, value := range globs { | ||
| compiledGlob := glob.MustCompile(value) |
There was a problem hiding this comment.
I might prefer a more helpful error message if the glob fails. Could we add an integration test? Maybe it is good enough already? Right now this code would panic
There was a problem hiding this comment.
- Failure to compile: error message added, test added
I had a similar thought last night: we also need to handle the case where the user has # gazelle:python_test_file_pattern (with no value). Right now no files will be mapped to py_test if there is no pattern. If we want to keep that behavior, I'll need to update the docs.
dougthor42
left a comment
There was a problem hiding this comment.
I also added a test that subpackages can override the value.
gazelle/python/generate.go
Outdated
| func makeCompiledGlobs(globs []string) []glob.Glob { | ||
| compiledGlobs := []glob.Glob{} | ||
| for _, value := range globs { | ||
| compiledGlob := glob.MustCompile(value) |
There was a problem hiding this comment.
- Failure to compile: error message added, test added
I had a similar thought last night: we also need to handle the case where the user has # gazelle:python_test_file_pattern (with no value). Right now no files will be mapped to py_test if there is no pattern. If we want to keep that behavior, I'll need to update the docs.
| case pythonconfig.TestFilePattern: | ||
| globStrings := strings.Split(strings.TrimSpace(d.Value), ",") | ||
| config.SetTestFilePattern(globStrings) |
There was a problem hiding this comment.
In #1787 I added an option for resetting the directive to the rules_python default. Do we want to support the same for this directive?
// N.B.: pseudocode; haven't tested yet
case pythonconfig.TestFilePattern:
directiveArg := strings.TrimSpace(d.Value)
if directiveArg == "DEFAULT" {
directiveArg = pythonconfig.DefaultTestFilePatternString
}
globStrings := strings.Split(directiveArg, ",")
config.SetTestFilePattern(globStrings)There was a problem hiding this comment.
For consistency I would agree, but at the same time my brain is also telling me that we could have:
gazelle:python_test_file_pattern
to reset the config to defaults.
The empty pattern yielding no test targets is not super intuitive, IMHO.
What do you think?
There was a problem hiding this comment.
The empty pattern yielding no test targets is not super intuitive, IMHO.
Agreed.
That said, I would argue that the empty pattern resetting to defaults is also not super intuitive. Some readers will see that and think "no test targets" while others will think "resets to default".
I like explicit things 😁. There's no ambiguity and it's easier to grok when returning to something N years down the road.
If we wanted to support yielding no test targets1 it should be something like gazelle:python_test_file_pattern NONE or have a different directive altogether (gazelle:python_no_tests or something).
Footnotes
-
I can't think of a reason why someone would want that, but that doesn't mean there aren't reasons ↩
There was a problem hiding this comment.
Passerby opinion: yielding no test targets can (and imho should) be achieved by ignoring files with gazelle:python_ignore_files. That said, setting pattern to NONE should not be allowed (also, it's a bit confusing to set pattern to NONE, some might thing there is NONE file or smths). gazelle:python_test_file_pattern (blank) should also fail.
There was a problem hiding this comment.
yielding no test targets can (and imho should) be achieved by ignoring files with
gazelle:python_ignore_files
I disagree. There are definitely cases where you want to yield no test targets but do want to yield lib and binary targets.
project/
+ BUILD.bazel
+ MODULE.bazel
+ src/myfoo/
+ BUILD.bazel
+ foo.py
+ foo_test.py
+ subdir/
+ BUILD.bazel
+ run_test.py # py_binary
+ example_test.py # py_library
Using gazelle:python_ignore_files would mean that no targets are generated in subdir above, when really we'd want only py_library and py_binary targets made.
There's also the case where test files are outside of the package, as described by PyPA Tutorial. In that case, you can be sure that all targets in src/mypackage below are py_library and py_binary.
project/
+ BUILD.bazel
+ MODULE.bazel
+ src/mypackage/ # guaranteed to have no py_test targets
+ BUILD.bazel
+ run_electrical_test.py # py_binary
+ electrical_test.py # py_library
+ tests/ # any py_* targets
+ BUILD.bazel
+ foo_test.py # py_test
+ bar_test.py # py_test
+ test_utils.py # py_library
+ run_all_test.py # py_binary
also, it's a bit confusing to set pattern to
NONE, some might thing there isNONEfile
A valid point. Perhaps using something that is unlikely to be a filename would be better? Maybe one of these or similar?
# gazelle:python_test_file_pattern %NONE%
# gazelle:python_test_file_pattern !!NO_FILES
# gazelle:python_test_file_pattern ~!~NO_PATTERN~!~One issue is that linux filenames can be basically anything haha so it's not feasible to pick a string that can't be a filename.
If we do go with a more "special" string, I'd probably want to update the python_default_visibility directive special strings to match.
gazelle:python_test_file_pattern(blank) should also fail.
+1 to that.
There was a problem hiding this comment.
Agreed with arguments around gazelle:python_ignore_files.
How about...not having any magic names at all? I mean, by the time someone wants to reset the test file patterns, he/she already learned what default values are (while customising them in the first place), and if resetting the patterns is needed, it can be done with # gazelle:python_test_file_pattern *_test,test_*.
Yet another idea is to have dedicated no-args directive for resetting to default, something like:
# gazelle:default_python_test_file_pattern. However, it only makes sense if applied conventionally everywhere in the plugin imho. Otherwise, I'd vote for not having any special values and let users "reset" by explicitly setting python_test_file_pattern to default values.
There was a problem hiding this comment.
I'm good with no magic names at all and a "reset" by explicitly setting the value to defaults. gazelle:python_test_file_pattern *_test.py,test_*.py
That said, I'd still like to mention the two (minor) issues with no magic values:
- [can ignore] There's no way to defer to
rules_python's defaults. For example, ifrules_pythonchanges the default from*_test.py,test_*.pyto something else, users would have to update their configs manually if they wanted to always use therules_pythondefault.- TBH this is probably a good thing because it results in less unexpected behavior.
- We're back to the
# gazelle:python_test_file_pattern(no value) for preventing any files from mapping topy_test. As mentioned previously:- this is not very intuitive.
- we said that "no value" should fail.
For (2), perhaps just making sure that's documented and tested is sufficient? Unless we implement another directive gazelle:python_no_test_files or something (which I'm against - it seems to only complicate things).
have dedicated no-args directive for resetting to default ... it only makes sense if applied conventionally everywhere in the plugin
+1. I'm slightly unsure about having directives to reset to default (as you said, people can just set the default manually), but if we add them they definitely should be applied everywhere.
There was a problem hiding this comment.
As far as I can see, the plugin currently has no switches for enabling/disabling parts of the generation process. Setting pattern to none sounds to me like an abuse of the directive to disable tests and I'd rather have an explicit python_no_test_files so that there is a clear indication no tests are being processed for a) confusing fewer humans, and b) plugin itself, so that it can potentially deviate its processing depending on that fact (e.g. print debug/warning message etc).
This should be, however, out of context for this PR. I'd say, we should just disallow no-arg python_test_file_pattern, provide no magic values and wrap-up the PR.
Have I already mentioned this is great work and nice improvement 🙂? Thank you! Looking forward to it. I have exactly the place where I will use it already.
There was a problem hiding this comment.
Setting pattern to none sounds to me like an abuse of the directive
I can see that and it's a good point.
I'd rather have an explicit
python_no_test_filesso that there is a clear indication no tests are being processed ... we should just disallow no-argpython_test_file_pattern, provide no magic values and wrap-up the PR.
SGTM. I'd like to continue the discussion, so I've opened #1826.
this is great work and nice improvement
Thanks! I've been really enjoying being able to add more features.
There was a problem hiding this comment.
All this SGTM, thanks for the discussion. I've been enjoying the read! :)
dougthor42
left a comment
There was a problem hiding this comment.
I've added the test case for "no value causes error" and updated the docs for this directive. I think we're ready for re-review.
gazelle/go.mod
Outdated
| github.com/bmatcuk/doublestar v1.3.4 | ||
| github.com/emirpasic/gods v1.18.1 | ||
| github.com/ghodss/yaml v1.0.0 | ||
| github.com/gobwas/glob v0.2.3 |
There was a problem hiding this comment.
This seems to be the most popular glob package, though I don't see any github activity in 6 years. Other options were:
There was a problem hiding this comment.
How about doublestar.Match? It is used by bazel-gazelle and is already included in this go.mod:9 too.
import (
"fmt"
"github.com/bmatcuk/doublestar/v4"
)
func matchesAnyGlob(s string, globs []string) bool {
for _, g := range globs {
if ok, _ := doublestar.Match(g, s); ok {
return true
}
}
return false
}
func main() {
defaultGlob := []string{"*_test.py", "test_*.py"}
fmt.Println(matchesAnyGlob("test_my_target.py", defaultGlob))
fmt.Println(matchesAnyGlob("my_target_test.py", defaultGlob))
customGlob := []string{"*_pytest.py", "pytest_*.py"}
fmt.Println(matchesAnyGlob("pytest_upload.py", customGlob))
fmt.Println(matchesAnyGlob("download_pytest.py", customGlob))
}There was a problem hiding this comment.
+1 for not adding another dep! Updated.
dougthor42
left a comment
There was a problem hiding this comment.
removed gobwas/glob and migrated/upgraded to doublestar/v4.
gazelle/go.mod
Outdated
| github.com/bmatcuk/doublestar v1.3.4 | ||
| github.com/emirpasic/gods v1.18.1 | ||
| github.com/ghodss/yaml v1.0.0 | ||
| github.com/gobwas/glob v0.2.3 |
There was a problem hiding this comment.
+1 for not adding another dep! Updated.
wingsofovnia
left a comment
There was a problem hiding this comment.
Thank you for the great work!
gazelle/python/generate.go
Outdated
|
|
||
| func matchesAnyGlob(s string, globs []string) bool { | ||
| for _, g := range globs { | ||
| ok, err := doublestar.Match(g, s) |
There was a problem hiding this comment.
I am slightly concerned that we are compiling the glob here. Would it be possible to compile it elsewhere? When parsing the config would be another option.
There was a problem hiding this comment.
Given that this is used in larger repos I am worried that we perform a linear number of glob compilations as opposed to doing it once when loading the config.
There was a problem hiding this comment.
Agreed. We were doing that before1 with gobwas/glob. From what I can tell, doublestar doesn't have any kind of "compiled pattern" object that I can pass around instead.
We can validate patterns during config load, but we wouldn't be able to tell doublestart.Match to not validate patterns and thus save some cycles. (Unless there's a way to access matchWithSeparator? My understanding of go is that lowercase names aren't exported and are thus not accessible.)
I think we have two (three?) options:
- Go back to
gobwas/globand precompile patterns - Accept the perf hit.
- Some go ability that I'm not aware of?
Either way we should validate/compile patterns when loading the config so that we fail earlier. I've made that update. It means we validate one more time, but we're already validating (N_files * Patterns) times so (N*P)+1 is no big deal
Footnotes
-
Well, almost. It wasn't done when loading config, but at least it was outside of the
for _, f := range args.RegularFilesloop. ↩
There was a problem hiding this comment.
Looking at the match's code there is nothing to "compile" in glob really, like we typically do with regex. This is common for globs to be matched directly as they are simple, do not need complicated parsing, creating underlying tree used for matching etc.
This [rules_python] repo already does doublestart.Match in a loop here and so does bazel-gazelle here and it hasn't been an issue so far.
Validating earlier is a good improvement.
Unless there's a way to access matchWithSeparator? My understanding of go is that lowercase names aren't exported and are thus not accessible.
That is correct for go. I created a ticket bmatcuk/doublestar#92
…well, in addition to) during generate.GenerateRules
gazelle/python/generate.go
Outdated
|
|
||
| func matchesAnyGlob(s string, globs []string) bool { | ||
| for _, g := range globs { | ||
| ok, err := doublestar.Match(g, s) |
There was a problem hiding this comment.
Agreed. We were doing that before1 with gobwas/glob. From what I can tell, doublestar doesn't have any kind of "compiled pattern" object that I can pass around instead.
We can validate patterns during config load, but we wouldn't be able to tell doublestart.Match to not validate patterns and thus save some cycles. (Unless there's a way to access matchWithSeparator? My understanding of go is that lowercase names aren't exported and are thus not accessible.)
I think we have two (three?) options:
- Go back to
gobwas/globand precompile patterns - Accept the perf hit.
- Some go ability that I'm not aware of?
Either way we should validate/compile patterns when loading the config so that we fail earlier. I've made that update. It means we validate one more time, but we're already validating (N_files * Patterns) times so (N*P)+1 is no big deal
Footnotes
-
Well, almost. It wasn't done when loading config, but at least it was outside of the
for _, f := range args.RegularFilesloop. ↩
gazelle/pythonconfig/pythonconfig.go
Outdated
| func (c *Config) SetTestFilePattern(patterns []string) { | ||
| for _, p := range patterns { | ||
| if !doublestar.ValidatePattern(p) { | ||
| log.Fatalf("ERROR: Failed to compile glob '%v'. Error: syntax error in pattern\n", p) |
There was a problem hiding this comment.
globstar errors with path.ErrBadPAttern but also states "users should not count on the returned error, doublestar.ErrBadPattern, being equal to path.ErrBadPattern" so I've opted to inject the string manually.
There was a problem hiding this comment.
Actually... I could just do:
// We only care if the pattern is valid. Use `Match` instead of `ValidatePattern` (which
// just returns `bool`) so that we always use the error message provided by `doublestar`.
if _, err := doublestar.Match(p, ""); err != nil {
log.Fatalf("ERROR: Failed to compile glob '%v'. Error: %v\n", p, err)
}
...Any preference between
(a) using ValidatePattern and manually injecting the error string as I've already done or
(b) abusing Match?
There was a problem hiding this comment.
I guess the word "abusing" in your comment speaks for itself :) Looking at the Match's code, err.ErrBadPattern brings no more than just an error flag (no error message, no extra context etc.) so it is effectively does the same as having true/false. I'd keep it as is. Except maybe a nit regarding go error message style (see the comment in configure.go).
| log.Fatalf("ERROR: Failed to compile glob '%v'. Error: syntax error in pattern\n", p) | |
| log.Fatalf("invalid glob pattern '%s'", p) |
There was a problem hiding this comment.
👍 kept, aside from style nit
gazelle/pythonconfig/pythonconfig.go
Outdated
| // Will exit 1 if the pattern is invalid. | ||
| func (c *Config) SetTestFilePattern(patterns []string) { | ||
| for _, p := range patterns { | ||
| if !doublestar.ValidatePattern(p) { |
There was a problem hiding this comment.
Would this check be better in configure.go case pythonconfig.TestFilePattern:?
There was a problem hiding this comment.
I'd say it would be slightly more aligned with existing code to do to in a case indeed as there are already some invalid value errors and parsing there while these stay simple setters.
|
Ah, thanks for checking. Then I am happy to merge everything! :)
…On 4 April 2024 17:38:05 GMT+09:00, Illia Ovchynnikov ***@***.***> wrote:
@wingsofovnia commented on this pull request.
> @@ -54,6 +54,19 @@ func GetActualKindName(kind string, args language.GenerateArgs) string {
return kind
}
+func matchesAnyGlob(s string, globs []string) bool {
+ for _, g := range globs {
+ ok, err := doublestar.Match(g, s)
Looking at the match's [code](https://github.com/bmatcuk/doublestar/blob/180028ba525d3116cc917ac3179e79428acfd89c/match.go#L74) there is nothing to "compile" in glob really, like we typically do with regex. This repo already does `double start.Match` in the loop [here](https://github.com/bazelbuild/rules_python/blob/b9f39bf09760ff75aedd64ce99c9573c1a97068f/gazelle/python/generate.go#L188) and so does bazel-gazelle [here](https://github.com/bazelbuild/bazel-gazelle/blob/1b331b6d9cb2d981580a8d3e2dfc551185790737/walk/config.go#L145) and it hasn't been an issue so far.
--
Reply to this email directly or view it on GitHub:
#1819 (comment)
You are receiving this because you modified the open/close state.
Message ID: ***@***.***>
|
gazelle/python/configure.go
Outdated
| case pythonconfig.TestFilePattern: | ||
| value := strings.TrimSpace(d.Value) | ||
| if value == "" { | ||
| log.Fatal("ERROR: Directive 'python_test_file_pattern' requires a value.") |
There was a problem hiding this comment.
nit: there is no need to double severity in the message here as Fatal will be already in the log. Also, go's error messages are conventionally lowercased since errors can be wrapped/nested (e.g. something bad happened: directive 'python...). \n is also unnecessary.
| log.Fatal("ERROR: Directive 'python_test_file_pattern' requires a value.") | |
| log.Fatal("directive 'python_test_file_pattern' requires a value.") |
There was a problem hiding this comment.
TIL, thanks! Updated.
As an FYI, the ERROR: and \n were included to maintain consistency with the other log.Fatal messages:
$ rg 'log.Fatal.?\("'
manifest/generate/generate.go
90: log.Fatalf("ERROR: %v\n", err)
109: log.Fatalf("ERROR: %v\n", err)
pythonconfig/pythonconfig.go
444: log.Fatalf("ERROR: Failed to compile glob '%v'. Error: syntax error in pattern\n", p)
python/lifecycle.go
41: log.Fatalf("failed to write parser zip: %v", err)
47: log.Fatalf("cannot write %q: %v", helperPath, err)
python/generate.go
238: log.Fatalf("ERROR: %v\n", err)
319: log.Fatalf("ERROR: %v\n", err)
353: log.Fatalf("ERROR: %v\n", err)
384: log.Fatalf("ERROR: %v\n", err)
504: log.Fatalf("ERROR: %v\n", err)
python/configure.go
188: log.Fatal("ERROR: Directive 'python_test_file_pattern' requires a value.")Prior to this PR, 7 of 9 calls included both ERROR: and \n.
Add the
python_test_file_patterndirective. This directive allowsusers to configure what python files get mapped to the
py_testrule.
The default behavior is unchanged: both
test_*and*_test.pyfiles generate
py_testtargets if the directive is unset.The directive supports multiple glob patterns, separated by a comma.
Note: The original code used, effectively,
test_*for one of thepatterns. This code uses
test_*.pyinstead. These are equivalentbecause of the
.pyextension check prior to pattern matching.Fixes #1816.