SQL: Introduce support for NULL values#34573
Conversation
Make SQL aware of missing and/or unmapped fields treating them as NULL Make _all_ functions and operators null-safe aware, including when used in filtering or sorting contexts Add missing and null-safe doc value extractor Modify dataset to have null fields spread around (in groups of 10) Enforce missing last and unmapped_type inside sorting Consolidate Predicate templating and declaration Add support for Like/RLike in scripting Introduce early schema declaration for CSV spec tests: to keep the doc snippets in place, introduce schema:: prefix to declare the CSV schema upfront. Fix elastic#32079
|
Pinging @elastic/es-search-aggs |
| } | ||
|
|
||
| public static Boolean match(Object value, Object pattern) { | ||
| if (value == null && pattern == null) { |
There was a problem hiding this comment.
Shouldn't here be a || instead of &&? If at least one of those values are null one of the next statements will throw a NPE.
|
|
||
| orderBy | ||
| : expression ordering=(ASC | DESC)? | ||
| : expression ordering=(ASC | DESC)? (NULLS nullOrdering=(FIRST | LAST))? |
There was a problem hiding this comment.
If null ordering can only be LAST, FIRST is here present for a future improvement?
There was a problem hiding this comment.
I've generalized this in the latest commit. Regardless the reason for it is to properly support the correct grammar if functionality-wise, this is not implemented.
There was a problem hiding this comment.
I've generalized this in the latest commit. Regardless the reason for it is to properly support the correct grammar if functionality-wise, this is not implemented.
| @@ -8,46 +8,46 @@ birth_date,emp_no,first_name,gender,hire_date,languages,last_name,salary | |||
| 1957-05-23T00:00:00Z,10007,Tzvetan,F,1989-02-10T00:00:00Z,4,Zielinski,74572 | |||
There was a problem hiding this comment.
The test data has no document where more than one field is null and things like CONCAT(first_name,last_name) where both fields are NULL cannot be tested.
There was a problem hiding this comment.
I'd rather test such functions individually simply because the test data is not that large and the nulls already limit its usefulness.
| throw new SqlIllegalArgumentException("Invalid date encountered [{}]", dateTime); | ||
| } | ||
|
|
||
| public static String dayName(Long millis, String tzId) { |
There was a problem hiding this comment.
Testing select day_name(birth_date) from test_emp group by day_name(birth_date); throws a NPE. Using the _translate API the error is summed up as
"reason": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"InternalSqlScriptUtils.dayName(InternalSqlScriptUtils.docValue(doc,params.v0).millis, params.v1)",
" ^---- HERE"
],
"script": "InternalSqlScriptUtils.dayName(InternalSqlScriptUtils.docValue(doc,params.v0).millis, params.v1)",
"lang": "painless",
"caused_by": {
"type": "null_pointer_exception",
"reason": null
}
}
There was a problem hiding this comment.
Looks like the current test suite didn't catch this. I've added some tests for day_name, month_name and quarter as they all suffer from the same problem.
| BinaryPredicate<?, ?, ?, ?> other = (BinaryPredicate<?, ?, ?, ?>) obj; | ||
|
|
||
| return Objects.equals(symbol, other.symbol) | ||
| return Objects.equals(other.symbol(), other.symbol()) |
There was a problem hiding this comment.
Typo? These will always be equal.
| public class RLike extends RegexMatch { | ||
|
|
||
| public RLike(Location location, Expression left, Literal right) { | ||
| super(location, left, right, "RLIKE"); |
There was a problem hiding this comment.
It seems we don't have any RLIKE tests, but can be added in a separate issue.
matriv
left a comment
There was a problem hiding this comment.
Really nice work! Left some minor comments.
Question: why did you choose to remove the table with null values and merge it with the test_emp?
Wouldn't be useful to keep separate and have null-value specific tests?
|
|
||
| StringOperation(StringFunction<Object> apply) { | ||
| this.apply = l -> l == null ? null : apply.apply(l); | ||
| this.apply = l -> l == null ? null : apply.apply((l)); |
There was a problem hiding this comment.
minor: remove extra parentheses
| if (millis == null || tzId == null) { | ||
| return null; | ||
| } | ||
|
|
| if (millis == null || tzId == null) { | ||
| return null; | ||
| } | ||
|
|
| private static final Map<Pattern, String> FORMATTING_PATTERNS; | ||
|
|
||
| static { | ||
| Map<String, String> patterns = new LinkedHashMap<>(); |
There was a problem hiding this comment.
Suggestion: You could do it in one step without the intermediate patterns map.
Maybe you can use the ImmutableMap.of() or
Collections.unmodifiableMap(Stream.of(
new SimpleEntry<>(Pattens.compile("doc[{}].value", Pattern.LITERAL), "{sql}.docValue(doc,{})"),
...
.collect(Collectors.toMap((e) -> e.getKey(), (e) -> e.getValue())));
``` and avoid the `static` block.
There was a problem hiding this comment.
We're trying to stay away from Guava especially for simple things.
The Stream suggestion is good however after adding it I think it decreases readability (there's a lot of Pattern.compile repetition and I find the map collector complicated).
The static block might be a bit more verbose but I find it clearer.
There was a problem hiding this comment.
No worries, was just a suggestion.
There was a problem hiding this comment.
Moved the pattern compile in the collector and got rid of the static block in the latest commit.
I need to merge master after all.
| } | ||
|
|
||
| static class BinaryLogic extends ExpressionTranslator<BinaryPredicate> { | ||
| static class BinaryLogic extends ExpressionTranslator<org.elasticsearch.xpack.sql.expression.predicate.logical.BinaryLogic> { |
There was a problem hiding this comment.
Maybe I'm missing something but why do you need fully qName here?
There was a problem hiding this comment.
Name clash on BinaryLogic.
|
|
||
| @Override | ||
| protected QueryTranslation asQuery(BinaryPredicate e, boolean onAggs) { | ||
| protected QueryTranslation asQuery(org.elasticsearch.xpack.sql.expression.predicate.logical.BinaryLogic e, boolean onAggs) { |
There was a problem hiding this comment.
There's a BinaryLogic class declared inside QueryTranslator (the one above).
| if (expectedObject == null || actualObject == null) { | ||
| assertEquals(msg, expectedObject, actualObject); | ||
| // hack for JDBC CSV nulls | ||
| if ("null".equals(expectedObject)) { |
There was a problem hiding this comment.
maybe check with use of lower or uppercase to allow null and NULL
|
@matriv to add to what @astefan said, if null would be a separate table we'd have to either run the tests across multiple indices and take into account duplicated data or come with a completely different dataset/data. |
|
retest this please |
Push REST/YAML fix Update docs
Make SQL aware of missing and/or unmapped fields treating them as NULL Make _all_ functions and operators null-safe aware, including when used in filtering or sorting contexts Add missing and null-safe doc value extractor Modify dataset to have null fields spread around (in groups of 10) Enforce missing last and unmapped_type inside sorting Consolidate Predicate templating and declaration Add support for Like/RLike in scripting Generalize NULLS LAST/FIRST Introduce early schema declaration for CSV spec tests: to keep the doc snippets in place (introduce schema:: prefix for declaration) upfront. Fix elastic#32079 (cherry picked from commit 52104aa)
Make SQL aware of missing and/or unmapped fields treating them as NULL Make _all_ functions and operators null-safe aware, including when used in filtering or sorting contexts Add missing and null-safe doc value extractor Modify dataset to have null fields spread around (in groups of 10) Enforce missing last and unmapped_type inside sorting Consolidate Predicate templating and declaration Add support for Like/RLike in scripting Generalize NULLS LAST/FIRST Introduce early schema declaration for CSV spec tests: to keep the doc snippets in place (introduce schema:: prefix for declaration) upfront. Fix #32079
Make SQL aware of missing and/or unmapped fields treating them as NULL
Make all functions and operators null-safe aware, including when used
in filtering or sorting contexts
Add missing and null-safe doc value extractor
Modify dataset to have null fields spread around (in groups of 10)
Enforce missing last and unmapped_type inside sorting
Consolidate Predicate templating and declaration
Add support for Like/RLike in scripting
Introduce early schema declaration for CSV spec tests: to keep the doc
snippets in place, introduce schema:: prefix to declare the CSV schema
upfront.
Fix #32079