ESQL: add =~ operator (case insensitive equality)#103656
ESQL: add =~ operator (case insensitive equality)#103656luigidellaquila merged 28 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/es-ql (Team:QL) |
|
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL) |
costin
left a comment
There was a problem hiding this comment.
No need to touch the QL side - see my comments.
Also the operator need to work on multi-value fields - please add unwrapping and tests.
|
|
||
| evalEquals#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13] | ||
| from employees | where emp_no == 10001 | ||
| | eval a = first_name =~ "georgi", b = first_name == "georgi", c = first_name =~ "GEORGI", d = first_name =~ "Geor" |
There was a problem hiding this comment.
try "GeoRgI" or something similar just in cae.
| TRUE : 'true'; | ||
|
|
||
| EQ : '=='; | ||
| EQ_IGNORE_CASE : '=~'; |
There was a problem hiding this comment.
I propose we use SEQ for "string equality" (just like in EQL) or IEQ for "Insensitive Equality".
Same for the backing class.
|
|
||
| import static org.elasticsearch.xpack.ql.expression.TypeResolutions.ParamOrdinal.DEFAULT; | ||
|
|
||
| public class EqualsIgnoreCase extends BinaryComparison { |
There was a problem hiding this comment.
StringEquals or InsensitiveEquals
| return i == null ? null : i.intValue() == 0; | ||
| } | ||
|
|
||
| public static Boolean eqIgnoreCase(Object l, Object r) { |
There was a problem hiding this comment.
Who's using this method?
If needed see StringComparisons inside EQL
There was a problem hiding this comment.
Removed (together with all the other changes in QL)
| * 2.0; you may not use this file except in compliance with the Elastic License | ||
| * 2.0. | ||
| */ | ||
| package org.elasticsearch.xpack.ql.expression.predicate.operator.comparison; |
There was a problem hiding this comment.
This should be in the ESQL package.
| return new RangeQuery(source, name, null, false, value, true, format, zoneId); | ||
| } | ||
| if (bc instanceof Equals || bc instanceof NullEquals || bc instanceof NotEquals) { | ||
| if (bc instanceof Equals || bc instanceof NullEquals || bc instanceof NotEquals || bc instanceof EqualsIgnoreCase) { |
There was a problem hiding this comment.
No need to touch QL - see InsensitiveBinaryComparisons inside EQL on how to extend the method (add a dedicated TranslatorHandler).
|
Thanks for the feedback @costin, I pushed some changes and moved all the logic in ESQL module (apart from one small fix in QL).
I think this aspect needs some clarifications, let's move the discussion off-line |
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
astefan
left a comment
There was a problem hiding this comment.
What's the expected behavior when doing something like first_name =~ last_name?
The original issue this PR covers says
Introduce =~ operator for performing case insensitive equality between a string field and a string literal
The field can be both single and multi value. The literal has to be single value.
Should I see an error, or warning or did the requirements change in the meantime and the original issue and/or the PR description needs an update?
On a similar note, if I do where last_name =~ 123 I get no results and no relevant warning. What is the expected behavior in this case?
Another similar behavior is with where last_name =~ languages where I get a CCE:
class org.elasticsearch.compute.data.IntArrayBlock cannot be cast to class org.elasticsearch.compute.data.BytesRefBlock
but if I do where last_name == languages I am getting a more meaningful error message:
"reason": "Found 1 problem\nline 1:24: first argument of [last_name == languages] is [keyword] so second argument must also be [keyword] but was [integer]",
"stack_trace": "org.elasticsearch.xpack.esql.analysis.VerificationException: Found 1 problem\nline 1:24: first argument of [last_name == languages] is [keyword] so second argument must also be [keyword] but was [integer]
|
This test also fails whereas |
I don't have a strong opinion on this. For sure having a string literal there will reduce validation risks, solving possible problems at parsing time.
I think it's a bug (lack of validation), it should throw an exception as == does.
same as above. I'll check and fix it.
this is interesting, I'll investigate Thanks! |
If this is supported on purpose, then there should be tests covering it. |
|
Thanks for you feedback @astefan |
|
@elasticmachine update branch |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
costin
left a comment
There was a problem hiding this comment.
Thanks Luigi - this looks pretty complete.
I've added a comment regarding testing and whether field attributes are allowed or not both sides.
There was a problem hiding this comment.
It's not clear from the tests if the right hand side allows only literals (the original requirement), folded expressions or generic expressions (which you added).
I think it's the first variant but I don't see any tests validating this.
So please add more test to either validate that literals/folded expressions are required (and fields are not allowed) or vice-versa - queries that have fields on both sides and more over expressions:
where concat(field, "constant") =~ concat(field, concat("con", "stant)) etc...
There was a problem hiding this comment.
Thanks @costin
Yes, the implementation supports any kind of expression, both on the left and on the right.
I added a few more tests for this.
|
|
||
|
|
||
| expressionsLeftRight#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13] | ||
| from employees | where substring(first_name, 1, 2) =~ substring(last_name, -2) | keep emp_no, first_name, last_name | sort emp_no; |
There was a problem hiding this comment.
I've thought about this some more and we need to conclude how to address the missing requirements (multi-value and wildcard matching) from
#103599 before enabling this functionality.
If =~ is asymmetric like EQL : than comparing two fields can't happen. in ESQL we might do thing differently but until we reach a conclusion, I prefer we don't ship these features to begin with; it's better to enable them later than have them disabled.
In other words, please keep the comparison only between a literal/folded expression and a field but not between two fields.
There was a problem hiding this comment.
in ESQL we might do thing differently but until we reach a conclusion, I prefer we don't ship these features to begin with
👍 I think it's a wise decision, I'll disable field vs field comparison at validation time (I'll allow only foldable expressions on the right) until we have a final decision on all the aspects.
|
@elasticmachine update branch |
costin
left a comment
There was a problem hiding this comment.
Thanks for addressing the feedback - let's get this in!
## Summary Keep in sync with new ES|QL builtin function addition: elastic/elasticsearch#103656 Do not merge until the related ER PR is still in review. ### Checklist - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
## Summary Keep in sync with new ES|QL builtin function addition: elastic/elasticsearch#103656 Do not merge until the related ER PR is still in review. ### Checklist - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
## Summary Keep in sync with new ES|QL builtin function addition: elastic/elasticsearch#103656 Do not merge until the related ER PR is still in review. ### Checklist - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
Add an operator to perform case insensitive equality comparison of strings.
The operator symbol is
=~, eg.The operator acts as a normal==for all non-string typesThe operator is defined only for string values
For multi-values the behavior is consistent with
==, ie.multi_value =~ "string"returnsnullRelated to #103599