SQL: Implement null handling for IN(v1, v2, ...)#34750
SQL: Implement null handling for IN(v1, v2, ...)#34750matriv merged 3 commits intoelastic:masterfrom
IN(v1, v2, ...)#34750Conversation
Implemented null handling for both the value tested but also for values inside the list of values tested against. The null handling is implemented for local processors, painless scripts and Lucene Terms queries making it available for `IN` expressions occuring in `SELECT`, `WHERE` and `HAVING` clauses. Closes: elastic#34582
|
Pinging @elastic/es-search-aggs |
costin
left a comment
There was a problem hiding this comment.
I think the NULL handling inside the values is incorrect since it should signify missing value.
However this can be a separate PR.
| if (compResult == null) { | ||
| result = null; | ||
| } | ||
| if (compResult != null && compResult) { |
There was a problem hiding this comment.
compResult is not null already so no need to check it;
if (compResult == null) {..} else if(compResult) { ..}
| return valuesOf(list, to, false); | ||
| } | ||
|
|
||
| public static <T> List<T> valuesOf(List<Expression> list, DataType to, boolean removeNulls) { |
There was a problem hiding this comment.
I'm not sure why nulls need to be removed - if the expression in the list is NULL, it should be return accordingly.
There was a problem hiding this comment.
Please check my comment here: #34750 (comment)
| Boolean compResult = Comparisons.eq(foldedLeftValue, rightValue.fold()); | ||
| if (compResult != null && compResult) { | ||
| if (compResult == null) { | ||
| // if (value().dataType() == DataType.NULL || rightValue.dataType() == DataType.NULL) { |
There was a problem hiding this comment.
This comments are confusing (not sure if this is the final version or not).
There was a problem hiding this comment.
Sorry about that, some mixup with git stash.
| super(location); | ||
| this.term = term; | ||
| this.values = new LinkedHashSet<>(Foldables.valuesOf(values, values.get(0).dataType())); | ||
| this.values = new LinkedHashSet<>(Foldables.valuesOf(values, values.get(0).dataType(), true)); |
There was a problem hiding this comment.
To avoid having the overloaded method, it might be more convenient to use Collections.removeIf(p -> p == null) since this is something local to the TermsQuery. On the other hand, it might also imply that when the value is missing (is null), there should be a match.
There was a problem hiding this comment.
I did that only to avoid a second iteration on the list (to remove the nulls), but I can change it. what do you think? I guess it can make a difference only if the list is really long.
There was a problem hiding this comment.
Yup. I wouldn't worry about it. I'm with you regarding avoiding the iteration but again, the Foldable method looks obscure with the null exclusion.
In other words, it's not Foldable that should handle null - I'm fain with handling it inside Terms I wonder though if that has any implications for inline IN.
| } | ||
|
|
||
| public void testTranslateInExpression_WhereClauseAndNullHAndling() throws IOException { | ||
| LogicalPlan p = plan("SELECT * FROM test WHERE keyword IN ('foo', null, 'lala', null, 'foo', concat('la', 'la'))"); |
There was a problem hiding this comment.
I think null means the value is missing. That is the IN should become a bool query between missing and terms.
There was a problem hiding this comment.
I tried to copy the behaviour to Postgres:
test=# select * from t1 where a in (null);
a
---
(0 rows)
test=# select * from t1 where a is null;
a
---
(4 rows)
WHERE col IN (null) is different than WHERE col is NULL, the first one evaluates to NULL which in turn becomes false for WHERE and HAVING clauses.
MySQL behaves the same way too.
There was a problem hiding this comment.
Makes sense. IN means = and that fails against null (need to use IS (NOT) NULL).
astefan
left a comment
There was a problem hiding this comment.
Looking good. Left just one comment.
| return true; | ||
| } else if (isString() && other.isString()) { | ||
| return true; | ||
| } else if (isNumeric() && other.isNumeric()) { |
There was a problem hiding this comment.
All these conditions that return true wouldn't look better if there is one if () only?
There was a problem hiding this comment.
I did it only to be more readable, can combine them in one if.
There was a problem hiding this comment.
Maybe use one if with one condition evaluation per line - best of both worlds.
Implemented null handling for both the value tested but also for values inside the list of values tested against. The null handling is implemented for local processors, painless scripts and Lucene Terms queries making it available for `IN` expressions occuring in `SELECT`, `WHERE` and `HAVING` clauses. Closes: #34582
|
Backported to |
Handle the case when `null` is the only value in the list so that it's translated to a `MatchNoDocsQuery`. Followup to: elastic#34750
Handle the case when `null` is the only value in the list so that it's translated to a `MatchNoDocsQuery`. Followup to: #34750
Handle the case when `null` is the only value in the list so that it's translated to a `MatchNoDocsQuery`. Followup to: #34750
Handle the case when `null` is the only value in the list so that it's translated to a `MatchNoDocsQuery`. Followup to: #34750
Implemented null handling for both the value tested but also for values inside the list of values tested against. The null handling is implemented for local processors, painless scripts and Lucene Terms queries making it available for `IN` expressions occuring in `SELECT`, `WHERE` and `HAVING` clauses. Closes: #34582
Handle the case when `null` is the only value in the list so that it's translated to a `MatchNoDocsQuery`. Followup to: #34750
Replace standard `||` and `==` painless operators with `or` and `eq` null safe alternatives from `InternalSqlScriptUtils`. Follow up to elastic#34750
Replace standard `||` and `==` painless operators with new `in` method introduced in `InternalSqlScriptUtils`. This allows the list of values to become a script variable which is replaced each time with the list of values provided by the user. Move In to the same package as InPipe & InProcessor Follow up to #34750 Co-authored-by: Costin Leau <costin.leau@gmail.com>
Replace standard `||` and `==` painless operators with new `in` method introduced in `InternalSqlScriptUtils`. This allows the list of values to become a script variable which is replaced each time with the list of values provided by the user. Move In to the same package as InPipe & InProcessor Follow up to #34750 Co-authored-by: Costin Leau <costin.leau@gmail.com>
Replace standard `||` and `==` painless operators with new `in` method introduced in `InternalSqlScriptUtils`. This allows the list of values to become a script variable which is replaced each time with the list of values provided by the user. Move In to the same package as InPipe & InProcessor Follow up to #34750 Co-authored-by: Costin Leau <costin.leau@gmail.com>
Implemented null handling for both the value tested but also for
values inside the list of values tested against.
The null handling is implemented for local processors, painless scripts
and Lucene Terms queries making it available for
INexpressions occuringin
SELECT,WHEREandHAVINGclauses.Closes: #34582