[ES|QL] Combine Disjunctive CIDRMatch#111501
Conversation
|
Hi @fang-xing-esql, I've created a changelog YAML for you. |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
bpintea
left a comment
There was a problem hiding this comment.
A bit on the fence about reusing/extending this rule, but it does make some sense as "generic" OR-combiner; and then also rename to drop the ...ToIn.
...src/test/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctionsToInTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/ip.csv-spec
Outdated
Show resolved
Hide resolved
...esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctionsToIn.java
Outdated
Show resolved
Hide resolved
...esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctionsToIn.java
Outdated
Show resolved
Hide resolved
astefan
left a comment
There was a problem hiding this comment.
I am ok re-using this rule to also cover CIDRs. The final result of these combinations (either == or IN or CIDR) have a common result - the use of a terms query.
But:
- the javadoc of the class must be updated to reflect the new reality
- other comments in the class itself need update or new comments added
- an additional test I think it's needed in
CombineDisjunctionsToInTeststhat cover more complex queries:WHERE host == "alpha" OR host == "gamma" OR CIDR_MATCH(ip1, "127.0.0.2/32") OR CIDR_MATCH(ip1, "127.0.0.3/32") OR card IN ("eth0", "eth1") OR card == "lo0" OR CIDR_MATCH(ip0, "127.0.0.1") OR CIDR_MATCH(ip0, "fe80::cae2:65ff:fece:feb9") OR host == "beta" - use the same test as above in the
csv-specfile - look into why the query above is translated into
{
"bool": {
"should": [
{
"esql_single_value": {
"field": "host",
"next": {
"terms": {
"host": [
"alpha",
"gamma",
"beta"
],
"boost": 1
}
},
"source": "host@1:20"
}
},
{
"esql_single_value": {
"field": "card",
"next": {
"terms": {
"card": [
"eth0",
"eth1",
"lo0"
],
"boost": 1
}
},
"source": "card@1:128"
}
},
{
"bool": {
"should": [
{
"esql_single_value": {
"field": "ip1",
"next": {
"terms": {
"ip1": [
"127.0.0.2/32",
"127.0.0.3/32"
],
"boost": 1
}
},
"source": "ip1@1:69"
}
},
{
"esql_single_value": {
"field": "ip0",
"next": {
"terms": {
"ip0": [
"127.0.0.1",
"fe80::cae2:65ff:fece:feb9"
],
"boost": 1
}
},
"source": "ip0@1:184"
}
}
],
"boost": 1
}
}
],
"boost": 1
}
}
There is an additional bool query inside but I don't think it should be there (there should be only one bool query with 4 should statements).
- this query has a strange behavior, maybe we can improve it:
WHERE host == "alpha" OR host == "gamma" OR ip1 IN ("127.0.0.2/32"::ip) OR CIDR_MATCH(ip1, "127.0.0.3/32") OR card IN ("eth0", "eth1") OR card == "lo0" OR ip0 IN ("127.0.0.1"::ip) OR CIDR_MATCH(ip0, "fe80::cae2:65ff:fece:feb9") OR host == "beta"
is translated into (ip0 and ip1 are not included in the same terms query, even though they could; yes, they are coming from ip0 IN .... and cidr_match(ip0...) but the resulting queries could be combined in a better way):
{
"bool": {
"should": [
{
"esql_single_value": {
"field": "card",
"next": {
"terms": {
"card": [
"eth0",
"eth1",
"lo0"
],
"boost": 1
}
},
"source": "card@1:124"
}
},
{
"esql_single_value": {
"field": "ip0",
"next": {
"term": {
"ip0": {
"value": "127.0.0.1"
}
}
},
"source": "ip0@1:169"
}
},
{
"esql_single_value": {
"field": "host",
"next": {
"terms": {
"host": [
"alpha",
"gamma",
"beta"
],
"boost": 1
}
},
"source": "host@1:20"
}
},
{
"bool": {
"should": [
{
"esql_single_value": {
"field": "ip1",
"next": {
"terms": {
"ip1": [
"127.0.0.3/32"
],
"boost": 1
}
},
"source": "ip1@1:100"
}
},
{
"esql_single_value": {
"field": "ip0",
"next": {
"terms": {
"ip0": [
"fe80::cae2:65ff:fece:feb9"
],
"boost": 1
}
},
"source": "ip0@1:208"
}
}
],
"boost": 1
}
}
],
"boost": 1
}
}
There was a problem hiding this comment.
Generally, this looks good to me. Bogdan pointed out some good additional edge cases that I also think we should add tests for.
One thing I think would be great is tests that show that the combined CIDR_MATCH now gets pushed down to Lucene; I added one in #105061, it'd be great to have another one like that that comes with different kinds of disjunctions (and conjunctions, for good measure) of CIDR_MATCHes.
I think previously, we were not able to push down multiple CIDR_MATCHes (or at least, not as a single terms query), but combined this should now be possible.
...esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctionsToIn.java
Outdated
Show resolved
Hide resolved
|
Thanks to @astefan @bpintea and @alex-spies for reviewing! These are really good test cases!! The disjunctive or(tree) we build is not a left or right deep tree, it is a balance tree in stead, The |
Thank you for catching these interesting problems! Both queries are added to either |
astefan
left a comment
There was a problem hiding this comment.
LGTM
Left only few minor remarks.
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/planner/QueryTranslatorTests.java
Show resolved
Hide resolved
...gin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctions.java
Outdated
Show resolved
Hide resolved
...gin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctions.java
Show resolved
Hide resolved
...gin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctions.java
Show resolved
Hide resolved
...gin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctions.java
Show resolved
Hide resolved
alex-spies
left a comment
There was a problem hiding this comment.
LGTM! I have a minor testing suggestion, but this is already good as-is IMO.
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/planner/QueryTranslatorTests.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/planner/QueryTranslatorTests.java
Show resolved
Hide resolved
...gin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/CombineDisjunctions.java
Outdated
Show resolved
Hide resolved
|
Thank you for the iterations @fang-xing-esql ! |
* upstream/main: (22 commits) Prune changelogs after 8.15.0 release Bump versions after 8.15.0 release EIS integration (elastic#111154) Skip LOOKUP/INLINESTATS cases unless on snapshot (elastic#111755) Always enforce strict role validation (elastic#111056) Mute org.elasticsearch.xpack.esql.analysis.VerifierTests testUnsupportedAndMultiTypedFields elastic#111753 [ML] Force time shift integration test (elastic#111620) ESQL: Add tests for sort, where with unsupported type (elastic#111737) [ML] Force time shift documentation (elastic#111668) Fix remote cluster credential secure settings reload (elastic#111535) ESQL: Fix for overzealous validation in case of invalid mapped fields (elastic#111475) Pass allow security manager flag in gradle test policy setup plugin (elastic#111725) Rename streamContent/Separator to bulkContent/Separator (elastic#111716) Mute org.elasticsearch.tdigest.ComparisonTests testSparseGaussianDistribution elastic#111721 Remove 8.14 from branches.json Only emit product origin in deprecation log if present (elastic#111683) Forward port release notes for v8.15.0 (elastic#111714) [ES|QL] Combine Disjunctive CIDRMatch (elastic#111501) ESQL: Remove qualifier from attrs (elastic#110581) Force using the last centroid during merging (elastic#111644) ... # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceNamedWriteablesProvider.java
* support CIDRMatch in CombineDisjunctions
* support CIDRMatch in CombineDisjunctions
Resolves #105143
This PR leverages
CombineDisjunctionsToInas it is very similar with combining multipleEqualss orIns intoIn, and efficient if do it in one pass.