SQL: Introduce support for NULL values by costin · Pull Request #34573 · elastic/elasticsearch

costin · 2018-10-17T20:57:05Z

Make SQL aware of missing and/or unmapped fields treating them as NULL
Make all functions and operators null-safe aware, including when used
in filtering or sorting contexts
Add missing and null-safe doc value extractor
Modify dataset to have null fields spread around (in groups of 10)
Enforce missing last and unmapped_type inside sorting
Consolidate Predicate templating and declaration
Add support for Like/RLike in scripting
Introduce early schema declaration for CSV spec tests: to keep the doc
snippets in place, introduce schema:: prefix to declare the CSV schema
upfront.

Fix #32079
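The first two bullets describe SQL three-valued logic. In that logic, a missing or unmapped field reads as NULL, an operator with a NULL operand yields NULL, and a filter keeps only rows whose condition is TRUE. Below is a minimal sketch of those semantics. The `Map`-based "document" and the names `docValue`, `gt` and `passesFilter` are hypothetical and only illustrate the behavior. They are not the actual Elasticsearch SQL extractor or processor classes.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of NULL semantics; not the Elasticsearch SQL implementation.
final class NullSemanticsSketch {

    private NullSemanticsSketch() {}

    /** First value of a field, or null when the field is unmapped or has no value. */
    static Object docValue(Map<String, List<Object>> doc, String field) {
        List<Object> values = doc.get(field);      // null: field not in the mapping
        if (values == null || values.isEmpty()) {  // empty: field mapped but missing
            return null;
        }
        return values.get(0);
    }

    /** Null-safe "greater than": any NULL operand makes the result NULL (unknown). */
    static Boolean gt(Object left, Object right) {
        if (left == null || right == null) {
            return null;
        }
        return ((Number) left).doubleValue() > ((Number) right).doubleValue();
    }

    /** In a WHERE clause an unknown (NULL) result drops the row, just like FALSE. */
    static boolean passesFilter(Boolean condition) {
        return Boolean.TRUE.equals(condition);
    }
}
```

With this shape, `WHERE languages > 2` neither throws nor matches on a document without `languages`, because `gt` returns NULL and `passesFilter(null)` is false.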


elasticmachine · 2018-10-17T20:57:09Z

Pinging @elastic/es-search-aggs

astefan

Left a few comments.

astefan · 2018-10-18T11:25:54Z

...sql/src/main/java/org/elasticsearch/xpack/sql/expression/predicate/regex/RegexProcessor.java

+        }
+
+        public static Boolean match(Object value, Object pattern) {
+            if (value == null && pattern == null) {


Shouldn't this be || instead of &&? If either value is null, one of the next statements will throw an NPE.

It should. Fixed.
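Here is a minimal sketch of the corrected guard. The signature `Boolean match(Object value, Object pattern)` comes from the diff. The body after the null check is an assumption (compile the pattern with `java.util.regex` and require a full match), not the actual `RegexProcessor` code.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the null-safe match discussed above.
final class RegexMatchSketch {

    private RegexMatchSketch() {}

    static Boolean match(Object value, Object pattern) {
        // With && only the "both null" case returned early; a single null operand
        // would reach toString() below and throw. || short-circuits either case.
        if (value == null || pattern == null) {
            return null; // NULL input yields NULL (unknown), not false
        }
        Pattern p = pattern instanceof Pattern
                ? (Pattern) pattern
                : Pattern.compile(pattern.toString());
        return p.matcher(value.toString()).matches(); // assumed: full-string match
    }
}
```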

astefan · 2018-10-18T11:27:58Z

x-pack/plugin/sql/src/main/antlr/SqlBase.g4


 orderBy
-    : expression ordering=(ASC | DESC)?
+    : expression ordering=(ASC | DESC)? (NULLS nullOrdering=(FIRST | LAST))?


If null ordering can only be LAST, is FIRST present here for a future improvement?

I've generalized this in the latest commit. Regardless, the reason for it is to support the correct grammar even if, functionality-wise, it is not implemented.
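`NULLS FIRST`/`NULLS LAST` is independent of `ASC`/`DESC`. On the Elasticsearch side this maps to the sort `missing` option (`_first`/`_last`). The description above mentions enforcing missing-last. The in-memory sketch below only illustrates the intended ordering semantics with `Comparator.nullsFirst`/`nullsLast`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of ORDER BY x [ASC|DESC] [NULLS FIRST|LAST].
final class NullOrderingSketch {

    enum NullOrdering { FIRST, LAST }

    private NullOrderingSketch() {}

    static List<Integer> sort(List<Integer> values, boolean ascending, NullOrdering nulls) {
        Comparator<Integer> byValue = Comparator.naturalOrder();
        if (!ascending) {
            byValue = byValue.reversed(); // direction applies to non-null values only
        }
        // Null placement is decided separately from the direction.
        Comparator<Integer> cmp = nulls == NullOrdering.FIRST
                ? Comparator.nullsFirst(byValue)
                : Comparator.nullsLast(byValue);
        List<Integer> copy = new ArrayList<>(values);
        copy.sort(cmp);
        return copy;
    }
}
```

For example, `[3, NULL, 1]` sorted `ASC NULLS LAST` gives `[1, 3, NULL]`, and `DESC NULLS FIRST` gives `[NULL, 3, 1]`.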

astefan · 2018-10-18T11:42:58Z

x-pack/qa/sql/src/main/resources/employees.csv

@@ -8,46 +8,46 @@ birth_date,emp_no,first_name,gender,hire_date,languages,last_name,salary
 1957-05-23T00:00:00Z,10007,Tzvetan,F,1989-02-10T00:00:00Z,4,Zielinski,74572


The test data has no document where more than one field is null, so cases like CONCAT(first_name, last_name) where both fields are NULL cannot be tested.

I'd rather test such functions individually simply because the test data is not that large and the nulls already limit its usefulness.

astefan · 2018-10-18T12:02:11Z

...org/elasticsearch/xpack/sql/expression/function/scalar/whitelist/InternalSqlScriptUtils.java

+        throw new SqlIllegalArgumentException("Invalid date encountered [{}]", dateTime);
+    }
+
+    public static String dayName(Long millis, String tzId) {


Running select day_name(birth_date) from test_emp group by day_name(birth_date); throws an NPE. Using the _translate API, the error is summed up as:

```
"reason": {
  "type": "script_exception",
  "reason": "runtime error",
  "script_stack": [
    "InternalSqlScriptUtils.dayName(InternalSqlScriptUtils.docValue(doc,params.v0).millis, params.v1)",
    " ^---- HERE"
  ],
  "script": "InternalSqlScriptUtils.dayName(InternalSqlScriptUtils.docValue(doc,params.v0).millis, params.v1)",
  "lang": "painless",
  "caused_by": {
    "type": "null_pointer_exception",
    "reason": null
  }
}
```

Looks like the current test suite didn't catch this. I've added some tests for day_name, month_name and quarter as they all suffer from the same problem.
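The fix pattern appears later in the review as the `if (millis == null || tzId == null) return null;` guard. Below is a minimal sketch of a null-safe `dayName(Long millis, String tzId)` built on `java.time`. The formatting (`TextStyle.FULL`, `Locale.ENGLISH`) is an assumption, not necessarily what `InternalSqlScriptUtils` uses. The guard only helps if the caller passes a null through. The script above dereferences `.millis` on the extracted value, so that access also has to stop assuming a non-null value.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.TextStyle;
import java.util.Locale;

// Hypothetical sketch; not the actual InternalSqlScriptUtils implementation.
final class DayNameSketch {

    private DayNameSketch() {}

    static String dayName(Long millis, String tzId) {
        if (millis == null || tzId == null) {
            return null; // missing date or time zone propagates as NULL
        }
        return Instant.ofEpochMilli(millis)
                .atZone(ZoneId.of(tzId))
                .getDayOfWeek()
                .getDisplayName(TextStyle.FULL, Locale.ENGLISH); // assumed locale/style
    }
}
```

The epoch (`0L`) is a Thursday in UTC but still Wednesday evening in `America/New_York`, which is why the time zone parameter matters.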

astefan · 2018-10-18T12:08:22Z

...ugin/sql/src/main/java/org/elasticsearch/xpack/sql/expression/predicate/BinaryPredicate.java

+        BinaryPredicate<?, ?, ?, ?> other = (BinaryPredicate<?, ?, ?, ?>) obj;

-        return Objects.equals(symbol, other.symbol)
+        return Objects.equals(other.symbol(), other.symbol())


Typo? These will always be equal.
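The intended comparison is `this` against `other`. Here is a minimal sketch of the corrected `equals`/`hashCode` pair on a simplified, hypothetical stand-in for `BinaryPredicate`. The real class has more state and type parameters.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for BinaryPredicate.
final class BinaryPredicateSketch {

    private final String symbol;
    private final Object left;
    private final Object right;

    BinaryPredicateSketch(String symbol, Object left, Object right) {
        this.symbol = symbol;
        this.left = left;
        this.right = right;
    }

    String symbol() {
        return symbol;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        BinaryPredicateSketch other = (BinaryPredicateSketch) obj;
        // Compare this instance with the other one; other.symbol() vs other.symbol()
        // is always true and would make "a = b" equal to "a > b".
        return Objects.equals(symbol(), other.symbol())
                && Objects.equals(left, other.left)
                && Objects.equals(right, other.right);
    }

    @Override
    public int hashCode() {
        return Objects.hash(symbol, left, right); // consistent with equals
    }
}
```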

astefan · 2018-10-18T12:31:12Z

...k/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/expression/predicate/regex/RLike.java

+public class RLike extends RegexMatch {

    public RLike(Location location, Expression left, Literal right) {
-        super(location, left, right, "RLIKE");


It seems we don't have any RLIKE tests, but they can be added in a separate issue.
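LIKE and RLIKE differ mainly in pattern syntax. RLIKE takes a regular expression. LIKE uses the SQL wildcards `%` (any sequence, including empty) and `_` (exactly one character). One way to evaluate LIKE in a scripting context is to translate it to a regex first. The sketch below assumes `java.util.regex` with full-string matching. The helper names `likeToRegex` and `like` are hypothetical, and the `ESCAPE` clause is not handled.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: translate a SQL LIKE pattern into a java.util.regex pattern.
final class LikeSketch {

    // Regex metacharacters that must be escaped when they appear literally.
    private static final String REGEX_META = "\\.[]{}()*+-?^$|";

    private LikeSketch() {}

    static String likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                regex.append(".*");      // any sequence of characters
            } else if (c == '_') {
                regex.append('.');       // exactly one character
            } else {
                if (REGEX_META.indexOf(c) >= 0) {
                    regex.append('\\');  // keep literal characters literal
                }
                regex.append(c);
            }
        }
        return regex.toString();
    }

    /** Null-safe LIKE: NULL value or pattern yields NULL. */
    static Boolean like(String value, String pattern) {
        if (value == null || pattern == null) {
            return null;
        }
        return Pattern.compile(likeToRegex(pattern), Pattern.DOTALL).matcher(value).matches();
    }
}
```

For example, `'Geo%'` matches `Georgi` but not `AGeo`. A literal `.` in a LIKE pattern stays literal, which is exactly the kind of case RLIKE tests would treat differently.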

matriv

Really nice work! Left some minor comments.

Question: why did you choose to remove the table with null values and merge it into test_emp?
Wouldn't it be useful to keep it separate and have null-value-specific tests?

matriv · 2018-10-18T13:07:25Z

...main/java/org/elasticsearch/xpack/sql/expression/function/scalar/string/StringProcessor.java


        StringOperation(StringFunction<Object> apply) {
-            this.apply = l -> l == null ? null : apply.apply(l);
+            this.apply = l -> l == null ? null : apply.apply((l));


minor: remove extra parentheses
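The lambda in `StringOperation` is the general null-propagation pattern: if the input is NULL, skip the function and return NULL. The sketch below pulls that pattern out as a reusable wrapper. `nullSafe` is a hypothetical helper name, not an existing Elasticsearch utility.

```java
import java.util.function.Function;

// Hypothetical helper: lift any function so that a null input yields a null output.
final class NullSafeSketch {

    private NullSafeSketch() {}

    static <T, R> Function<T, R> nullSafe(Function<T, R> f) {
        return t -> t == null ? null : f.apply(t); // same shape as the StringOperation lambda
    }
}
```

For example, `nullSafe(String::length)` returns 3 for `"abc"` and null for a NULL input, instead of throwing.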

matriv · 2018-10-18T13:08:29Z

...org/elasticsearch/xpack/sql/expression/function/scalar/whitelist/InternalSqlScriptUtils.java

+        if (millis == null || tzId == null) {
+            return null;
+        }
+


minor: remove new line

matriv · 2018-10-18T13:08:35Z

...org/elasticsearch/xpack/sql/expression/function/scalar/whitelist/InternalSqlScriptUtils.java

+        if (millis == null || tzId == null) {
+            return null;
+        }
+


minor: remove new line

matriv · 2018-10-18T13:16:07Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/expression/gen/script/Scripts.java

+    private static final Map<Pattern, String> FORMATTING_PATTERNS;
+
+    static {
+        Map<String, String> patterns = new LinkedHashMap<>();


Suggestion: You could do it in one step without the intermediate patterns map.

Maybe you could use ImmutableMap.of(), or:

```java
Collections.unmodifiableMap(Stream.of(
        new SimpleEntry<>(Pattern.compile("doc[{}].value", Pattern.LITERAL), "{sql}.docValue(doc,{})"),
        ...)
    .collect(Collectors.toMap((e) -> e.getKey(), (e) -> e.getValue())));
```

and avoid the `static` block.

We're trying to stay away from Guava, especially for simple things.
The Stream suggestion is good; however, after adding it I think it decreases readability (there's a lot of Pattern.compile repetition and I find the map collector complicated).
The static block might be a bit more verbose but I find it clearer.

No worries, was just a suggestion.

Moved the pattern compile in the collector and got rid of the static block in the latest commit.

I need to merge master after all.
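The final version described above compiles the patterns inside the collector instead of a `static` block. Here is a sketch of that shape. Only the `doc[{}].value` entry is taken from this thread, and the remaining entries stay elided. The `format` method is a hypothetical caller added for illustration. The four-argument `Collectors.toMap` with `LinkedHashMap::new` keeps the declaration order that the original `LinkedHashMap` provided. `Pattern` does not override `equals`/`hashCode`, which is harmless here because the map is only iterated.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch only: the entries and the format() helper are illustrative.
final class ScriptsSketch {

    private static final Map<Pattern, String> FORMATTING_PATTERNS = Collections.unmodifiableMap(Stream.of(
            new SimpleEntry<>("doc[{}].value", "{sql}.docValue(doc,{})"))
        .collect(Collectors.toMap(
            e -> Pattern.compile(e.getKey(), Pattern.LITERAL), // compile inside the collector
            Map.Entry::getValue,
            (first, second) -> first,                          // duplicate keys: keep the first
            LinkedHashMap::new)));                             // preserve declaration order

    private ScriptsSketch() {}

    /** Hypothetical caller: rewrite each literal pattern into its replacement. */
    static String format(String script) {
        String result = script;
        for (Map.Entry<Pattern, String> entry : FORMATTING_PATTERNS.entrySet()) {
            result = entry.getKey().matcher(result).replaceAll(Matcher.quoteReplacement(entry.getValue()));
        }
        return result;
    }
}
```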

matriv · 2018-10-18T13:23:29Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/planner/QueryTranslator.java

    }

-    static class BinaryLogic extends ExpressionTranslator<BinaryPredicate> {
+    static class BinaryLogic extends ExpressionTranslator<org.elasticsearch.xpack.sql.expression.predicate.logical.BinaryLogic> {


Maybe I'm missing something, but why do you need the fully qualified name here?

Name clash on BinaryLogic.

matriv · 2018-10-18T13:23:36Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/planner/QueryTranslator.java


        @Override
-        protected QueryTranslation asQuery(BinaryPredicate e, boolean onAggs) {
+        protected QueryTranslation asQuery(org.elasticsearch.xpack.sql.expression.predicate.logical.BinaryLogic e, boolean onAggs) {


There's a BinaryLogic class declared inside QueryTranslator (the one above).
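The clash works like this: inside the enclosing class, a nested type's simple name shadows any other type with the same simple name, so the other type has to be spelled out in full. The self-contained illustration below uses `java.util.List` in place of `...predicate.logical.BinaryLogic` and a nested `List` in place of `QueryTranslator.BinaryLogic`. The class names are hypothetical.

```java
import java.util.ArrayList;

// Illustration of a nested type shadowing another type with the same simple name.
final class ShadowingSketch {

    // Within ShadowingSketch, the simple name "List" now refers to this nested class.
    static final class List {
        // The JDK type must therefore be written with its fully qualified name.
        private final java.util.List<String> items;

        List(java.util.List<String> items) {
            this.items = items;
        }

        int size() {
            return items.size();
        }
    }

    private ShadowingSketch() {}

    static int wrapAndCount(String... names) {
        java.util.List<String> values = new ArrayList<>();
        for (String name : names) {
            values.add(name);
        }
        return new List(values).size(); // "List" resolves to ShadowingSketch.List
    }
}
```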

matriv · 2018-10-18T13:26:11Z

x-pack/qa/sql/src/main/java/org/elasticsearch/xpack/qa/sql/jdbc/JdbcAssert.java

                    if (expectedObject == null || actualObject == null) {
-                        assertEquals(msg, expectedObject, actualObject);
+                        // hack for JDBC CSV nulls
+                        if ("null".equals(expectedObject)) {


Maybe compare case-insensitively, to allow both null and NULL.
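A case-insensitive check has to guard the type first, because `expectedObject` is an `Object` and `String.equalsIgnoreCase` takes a `String`. Here is a minimal sketch. `isCsvNull` and `nullSidesAgree` are hypothetical helper names, not part of `JdbcAssert`.

```java
import java.util.Objects;

// Hypothetical helpers for the "hack for JDBC CSV nulls" branch.
final class CsvNullSketch {

    private CsvNullSketch() {}

    /** True if a CSV spec cell holds the literal text null, in any letter case. */
    static boolean isCsvNull(Object expected) {
        return expected instanceof String && "null".equalsIgnoreCase((String) expected);
    }

    /** Used when either side is null: treat a textual CSV null as a real null. */
    static boolean nullSidesAgree(Object expected, Object actual) {
        return Objects.equals(isCsvNull(expected) ? null : expected, actual);
    }
}
```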

astefan · 2018-10-18T14:32:56Z

@matriv the null values table was added specifically to test the empty bucket created for null values as part of #32831. The intention was to remove it once NULL support was in place and to integrate the documents with NULL values into the main table.

costin · 2018-10-18T15:04:57Z

@matriv to add to what @astefan said, if the null data lived in a separate table, we'd have to either run the tests across multiple indices and account for duplicated data, or come up with a completely different dataset.
Plus, it also complicates debugging: which table am I looking at?
It turns out that merging the two makes things simpler from this perspective and increases the randomness of the data.

matriv

LGTM

astefan

LGTM

costin · 2018-10-18T21:50:01Z

retest this please


Make SQL aware of missing and/or unmapped fields treating them as NULL Make _all_ functions and operators null-safe aware, including when used in filtering or sorting contexts Add missing and null-safe doc value extractor Modify dataset to have null fields spread around (in groups of 10) Enforce missing last and unmapped_type inside sorting Consolidate Predicate templating and declaration Add support for Like/RLike in scripting Generalize NULLS LAST/FIRST Introduce early schema declaration for CSV spec tests: to keep the doc snippets in place (introduce schema:: prefix for declaration) upfront. Fix elastic#32079 (cherry picked from commit 52104aa)


costin added >enhancement v7.0.0 :Analytics/SQL SQL querying v6.5.0 labels Oct 17, 2018

costin requested review from astefan and matriv October 17, 2018 20:57

This was referenced Oct 18, 2018

SQL: Implement IN(value1, value2, ...) expression. #34581

Merged

SQL: Handle nulls for field IN(v1, v2, ..) expressions #34582

Closed

astefan requested changes Oct 18, 2018


matriv reviewed Oct 18, 2018


Generalize NULLS LAST/FIRST

1111217

Address feedback

04bd07a

matriv approved these changes Oct 18, 2018


astefan approved these changes Oct 18, 2018


costin force-pushed the fix-for-32079 branch from 7164c7b to 11783a2 October 19, 2018 07:50

costin added 4 commits October 19, 2018 13:32

Minor polish

b05c145

Push REST/YAML fix Update docs

Merge remote-tracking branch 'remotes/upstream/master' into fix-for-3…

29e92e7

…2079

Merge remote-tracking branch 'remotes/upstream/master' into fix-for-3…

59428ea

…2079

Merge remote-tracking branch 'remotes/upstream/master' into fix-for-3…

83e527a

…2079

costin force-pushed the fix-for-32079 branch from 11783a2 to 83e527a October 19, 2018 10:33

costin removed the v6.5.0 label Oct 19, 2018

costin merged commit 52104aa into elastic:master Oct 19, 2018

costin deleted the fix-for-32079 branch October 19, 2018 14:02

matriv mentioned this pull request Oct 23, 2018

SQL: CASTing from two columns throws error #34542

Closed

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
