Allow mixing set-based and regexp-based include and exclude#63325
Conversation
|
There is one thing I'm a bit unsure of, around the stream serialization, and which version to set. I set the next major version (8.0.0) as default but I guess if this is merged in a 7.x branch this would need changing. Also, let's say we wanted to backport this feature to our own build of ES v6, do you think it could be done? |
|
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
|
@elasticmachine test this please |
|
@elasticmachine, ok to test |
| @Override | ||
| public boolean accept(BytesRef value) { | ||
| return ((valids == null) || (valids.contains(value))) && ((invalids == null) || (!invalids.contains(value))); | ||
| if (valids != null && !valids.contains(value)) { |
There was a problem hiding this comment.
| if (valids != null && !valids.contains(value)) { | |
| if (valids != null && (valids.contains(value) == false)) { |
Elastic coding standard prefers this form for readability
| return false; | ||
| } | ||
|
|
||
| if (runAutomaton != null && !runAutomaton.run(value.bytes, value.offset, value.length)) { |
There was a problem hiding this comment.
| if (runAutomaton != null && !runAutomaton.run(value.bytes, value.offset, value.length)) { | |
| if (runAutomaton != null && (runAutomaton.run(value.bytes, value.offset, value.length) == false)) { |
| } | ||
|
|
||
| public IncludeExclude(RegExp include, RegExp exclude, SortedSet<BytesRef> includeValues, SortedSet<BytesRef> excludeValues) { | ||
| if (include == null && exclude == null && includeValues == null && excludeValues == null) { |
There was a problem hiding this comment.
I think the intention here is that at most one of (include, includeValues) and at most one of (exclude, excludeValues) will be non-null. In other words, while you can mix set-based includes and regex excludes (or vice versa), you can't have both set-based and regex-based includes. That seems like a requirement of the precedence rules, among other things.
I think we should enforce that rule here. I know the parser doesn't currently allow for specifying both a regex and a set at the same time, but it's ultimately this class's contract that both not be set, and this class should check it.
There was a problem hiding this comment.
I think that in IncludeExclude any combination of the 4 (include, includeValues, exclude, excludeValues) can work correctly. Meaning that we can have both kinds of includes and/or both kind of excludes and it would "do the right/logical thing", i.e. it would accept terms that are in any of the include(s) but not in any of the exclude(s). I don't think there is precedence between both kinds of includes, nor between both kinds of excludes, just between include(s) and exclude(s).
Also, I don't think restricting to a single include and a single exclude would be more efficient (e.g. allow optimizations in the accept functions).
That's why I didn't forbid having both types of includes or both types of excludes. But I can definitely add a check to forbid it.
There was a problem hiding this comment.
The case I am worried about, if you have something like include = "foo.*" and includeValues = ["bar", "quux"], it will incorrectly reject the term "foo" by returning false on line 211, I think. I don't think we need to support that case, but we do need to explicitly reject it, by not allowing both include and includeValues to be set.
There was a problem hiding this comment.
You're right, I was wrong in my previous comment. I think the accept from the ordinals filter can work with both kinds of includes (or excludes), but indeed not the string filter, where I assumed there was only one kind of each.
I've added a check to forbid this
| aggregationBuilder = new TermsAggregationBuilder("_name").userValueTypeHint(ValueType.STRING) | ||
| .executionHint(executionHint) | ||
| .includeExclude(new IncludeExclude("val00.+", null, null, | ||
| new String[]{"val001", "val002", "val003", "val004", "val005", "val006", "val007", "val008"})) |
There was a problem hiding this comment.
I appreciate that you added tests to make sure that the precedence is preserved with the new possible configurations, but I think we should test this on IncludeExcludeTests as well.
There was a problem hiding this comment.
I've refactored/added tests for all possible combinations of include/exclude in IncludeExcludeTests.
server/src/main/java/org/elasticsearch/search/aggregations/bucket/terms/IncludeExclude.java
Show resolved
Hide resolved
|
I think the BWC failure is related to us cutting a branch around the time this test ran, I didn't see anything that looked related to this change. When you make the requested changes, we'll run the whole suite again and if BWC passes then, that's fine. |
All right. Do you want me to merge master into my branch? Or rebase onto it? |
|
This looks great, thanks. Yeah, if you could merge master into the branch, that should fix the BWC tests. I'm OOO tomorrow, but hopefully we can get this merged Monday or Tuesday. |
|
I merged master. However, BWC is still failing on my machine, but actually it's even failing for the master branch. It looks like other recent PRs are also failing the same tests though. We'll see how it goes with the CI but I'm not very hopeful. Update: OK so BWC passed in the CI but another task failed. It seems to me that it's not related to my changes. I tried reproducing the two failing tests on my machine with the command lines given in the log outputs, and they both pass successfully. I'm not sure what to do now |
|
I agree that test failure doesn't look related. I'll look into it. |
|
@elasticmachine run elasticsearch-ci/1 |
|
One of those is a known issue: #63719. I'm rerunning the suite to see if a different random seed fixes it. I'll dig in a bit more on the second failure and open an issue if necessary. |
|
This should be all set. I'll get it merged and backported this week. Thanks for coding it up! |
|
Awesome! Thank you for the review! |
…63325) * Allow mixing set-based and regexp-based include and exclude * Coding style * Disallow having both set and regexp include (resp. exclude) * Test correctness of every combination of include/exclude
This PR adds the feature discussed in #62246
It allows mixing set-based and regexp-based "include" and "exclude" parameters in Terms (and SignificantTerms, RareTerms) aggregations. Both ways are supported: a set include with a regexp exclude, or a regexp include with a set exclude.