SQL: Enhance checks for inexact fields#39427
Conversation
For functions: move checks for `text` fields without underlying `keyword` fields or with many of them (ambiguity) to the type resolution stage. For Order By/Group By: move checks to the `Verifier` to catch early before `QueryTranslator` or execution. Fixes: elastic#38501
|
Pinging @elastic/es-search |
|
Also fixes: #35203 |
costin
left a comment
There was a problem hiding this comment.
Thanks for looking into this. Left a few comments, mainly around naming.
| import static org.elasticsearch.xpack.sql.expression.Expressions.name; | ||
| import static org.elasticsearch.xpack.sql.type.DataType.BOOLEAN; | ||
|
|
||
| public final class TypeResolutionUtils { |
There was a problem hiding this comment.
I would prefer using the plural instead of Utils - TypeResolutions. Utils has been used where the plural already existed in the rest of the codebase to avoid name clashes.
There was a problem hiding this comment.
The plural just sounded to me a bit weird, but I'll change.
|
|
||
| private TypeResolutionUtils() {} | ||
|
|
||
| public static TypeResolution typeMustBeBoolean(Expression e, String operationName, ParamOrdinal paramOrd) { |
There was a problem hiding this comment.
Additionally, how about simplifying the method names from "typeMustBeNumeric" to "isNumeric" similar to the assertion in JUnit? It also add consistency ("expressionMustBe", "typeMustBe...").
| return TypeResolution.TYPE_RESOLVED; | ||
| } | ||
|
|
||
| public static TypeResolution expressionMustBeTableColumn(Expression e, String operationName, ParamOrdinal paramOrd) { |
There was a problem hiding this comment.
constant/table column is a bit misleading - I would just use isFoldable/notFoldable since it's a terminology already used through-out the code.
|
|
||
| @Override | ||
| public Tuple<Boolean, String> hasExact() { | ||
| return PROCESS_EXACT_FIELD.apply(findExact()); |
There was a problem hiding this comment.
What's the reason behind using the closure?
There was a problem hiding this comment.
The findExact() returns also the exact field (used by getExact()) so I wanted to convert the field == null ? false : true and chose a Function to be somehow more elegant.
There was a problem hiding this comment.
I got that but why not use the code directly? The closure is not passed anywhere...
Tuple<EsField, String> tuple = findExact();
return tuple.v1() == null ? new Exact(false, tuple.v2()) : new Exact(true, null)| assertEquals("value", tq.value()); | ||
| } | ||
|
|
||
| public void testTermEqualityAnalyzerAmbiguous() { |
There was a problem hiding this comment.
Yes, here: https://github.com/elastic/elasticsearch/pull/39427/files#diff-334bd2f8c33b60789817181fcef4ca52R331 with 2 methods now.
| * {False, String} if this field doesn't have an underlying exact field, the v2 of the tuple is a string containing | ||
| * an error message explaining why that is. | ||
| */ | ||
| public Tuple<Boolean, String> hasExact() { |
There was a problem hiding this comment.
I think having both isExact and hasExact is confusing - is the former still used anywhere? I opt for refactoring isExact to return the tuple instead (which can be promoted to a dedicated inner class such (Exact) to make its use easier and its intent clearer).
There was a problem hiding this comment.
I agree with the confusing isExact but it's used externally here: https://github.com/elastic/elasticsearch/pull/39427/files#diff-1059b5208ef7d10a0fdc89b5dad914edR90 and has slightly different semantics than hasExact.
hasExact returns <TRUE, null> if it's exact, or has an exact underlying field, so we don't know which is the case.
But if we introduce a class Exact to replace the the Tuple we can include this info there and remove the confusing isExact() method. Thx!
Whatever it's easier - I think the PR itself is not complicated but is made big by the amount of files that it touches. I'm fine with backporting the whole PR. |
|
@costin thanks for the comments! Pushed a commit to address them. |
astefan
left a comment
There was a problem hiding this comment.
LGTM. Nice cleanup. Left few minor comments.
| } | ||
|
|
||
| public void testGroupByOnInexact() { | ||
| assertEquals("1:36: Grouping field of data type [text] for grouping; " + |
There was a problem hiding this comment.
I am not sure I understand the first sentence in this error message, it has no predicate. Can you try a more descriptive message?
There was a problem hiding this comment.
ah, sorry I mixed something up there, will fix.
|
|
||
| @Override | ||
| protected TypeResolution resolveType() { | ||
| return typeMustBeExact(child, "ORDER BY cannot be applied to field of data type[{}]: {}"); |
There was a problem hiding this comment.
White space between type and [{}]: ...to field of data type [{}]: {}.
| } | ||
|
|
||
| public void testInvalidTypeForStringFunction_WithOneArgNumeric() { | ||
| assertEquals("1:8: argument of [CHAR('foo')] must be [integer], found value ['foo'] type [keyword]", |
There was a problem hiding this comment.
How does the error message look like for something like SELECT ASCII(CHAR('foo'))? (nested functions)
There was a problem hiding this comment.
Added a test: cda66ca#diff-334bd2f8c33b60789817181fcef4ca52R457
Good thinking! But seems fine since we use sourceText() on the offending function and it's not showing the whole expression that can be complex and result in confusion for the user.
|
Also, shouldn't this one go to 7.0.x as well? |
|
@astefan yes, it will actually be backported all the way to |
MIN/MAX on strings are supported and are implemented with TopAggs FIRST/LAST respectively, but they cannot operate on `text` fields without underlying `keyword` fields => inexact. Follows: elastic#39427
MIN/MAX on strings are supported and are implemented with TopAggs FIRST/LAST respectively, but they cannot operate on `text` fields without underlying `keyword` fields => inexact. Follows: #39427
MIN/MAX on strings are supported and are implemented with TopAggs FIRST/LAST respectively, but they cannot operate on `text` fields without underlying `keyword` fields => inexact. Follows: #39427
MIN/MAX on strings are supported and are implemented with TopAggs FIRST/LAST respectively, but they cannot operate on `text` fields without underlying `keyword` fields => inexact. Follows: #39427
MIN/MAX on strings are supported and are implemented with TopAggs FIRST/LAST respectively, but they cannot operate on `text` fields without underlying `keyword` fields => inexact. Follows: #39427
For functions: move checks for
textfields without underlyingkeywordfields or with many of them (ambiguity) to the type resolution stage.
For Order By/Group By: move checks to the
Verifierto catch earlybefore
QueryTranslatoror execution.Fixes: #38501