semantic_text as string type in ES|QL - support for functions and operators#115243
semantic_text as string type in ES|QL - support for functions and operators#115243ioanatia merged 14 commits intoelastic:mainfrom
Conversation
| BiFunction<Integer, Stream<BytesRef>, Matcher<Object>> matcher | ||
| ) { | ||
| for (DataType type : new DataType[] { DataType.KEYWORD, DataType.TEXT, DataType.IP, DataType.VERSION }) { | ||
| for (DataType type : new DataType[] { DataType.KEYWORD, DataType.TEXT, DataType.IP, DataType.VERSION, DataType.SEMANTIC_TEXT }) { |
There was a problem hiding this comment.
For the mv functions, I only modified the tests that explicitly tested for keyword/text fields to also add tests for semantic_text. For most of the mv function tests I did not need to make any changes because I modified the bytesRef method in the parent abstract class.
mv_concat and mv_zip do not use bytesRef, but use the DataType.isString method to only test string types, so implicitly they tests semantic_text:
the rest of the mv functions that take string params use bytesRef and did not need modifications:
| } | ||
|
|
||
| private static String typeErrorString = | ||
| "boolean, cartesian_point, cartesian_shape, datetime, date_nanos, double, geo_point, geo_shape, integer, ip, keyword, long, text, " |
There was a problem hiding this comment.
for the comparison operators tests I did not need to make any other changes because they already use TestCase.stringCases:
The same applies for the rest of the comparison predicates:
TestCaseSupplier uses DataType.stringTypes() which contains semantic_text:
| "host3" | be excellent to each other | ||
| ; | ||
|
|
||
| case |
There was a problem hiding this comment.
tests for Conditional functions and expressions
| 3 | null | ||
| ; | ||
|
|
||
| convertToBool |
| 3 | null | ||
| ; | ||
|
|
||
| concat |
| live long and prosper | ||
| ; | ||
|
|
||
| mvAppend |
| 3 | null | ||
| ; | ||
|
|
||
| equalityWithConstant |
There was a problem hiding this comment.
tests for Binary operators - the ones that support string types
| 3 | null | ||
| ; | ||
|
|
||
| isNull |
|
@elasticmachine update branch |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
not-napoleon
left a comment
There was a problem hiding this comment.
Just a few thoughts from someone who's been working on data types for the past couple of months. Hopefully adding the new type hasn't been too difficult. Please feel free to ping me directly if you have questions.
| return TEXT; | ||
| } | ||
| if (left == SEMANTIC_TEXT || right == SEMANTIC_TEXT) { | ||
| return TEXT; |
There was a problem hiding this comment.
Note that if the two types are the same, we always return that type, so this is saying that only in case of mixed type operations, we expect SEMANTIC_TEXT to cast to TEXT. It seems a little odd to me that KEYWORD == SEMANTIC_TEXT would have a common type of TEXT, but maybe that's expected? If so, I think a comment would help here.
There was a problem hiding this comment.
I changed this so that the common type is KEYWORD in the end.
There was a problem hiding this comment.
I'm wondering if we shouldn't do the same for TEXT (ie. return KEYWORD). Not part of this PR, though, but worth considering.
...src/test/java/org/elasticsearch/xpack/esql/expression/function/AbstractFunctionTestCase.java
Outdated
Show resolved
Hide resolved
...test/java/org/elasticsearch/xpack/esql/expression/function/scalar/conditional/CaseTests.java
Outdated
Show resolved
Hide resolved
carlosdelest
left a comment
There was a problem hiding this comment.
This LGTM - great work on supporting this new data type! 💯
Left a question about the data type to use when combining with other types
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java
Show resolved
Hide resolved
| private static void add(List<TestCaseSupplier> suppliers, String name, int length, Supplier<String> valueSupplier) { | ||
| Map<Integer, List<List<DataType>>> permutations = new HashMap<Integer, List<List<DataType>>>(); | ||
| List<DataType> supportedDataTypes = List.of(DataType.KEYWORD, DataType.TEXT); | ||
| List<DataType> supportedDataTypes = List.of(DataType.KEYWORD, DataType.TEXT, DataType.SEMANTIC_TEXT); |
There was a problem hiding this comment.
| List<DataType> supportedDataTypes = List.of(DataType.KEYWORD, DataType.TEXT, DataType.SEMANTIC_TEXT); | |
| List<DataType> supportedDataTypes = List.of(DataType.stringTypes()); |
|
@craigtaverner do you have time to take a look at this one? |
craigtaverner
left a comment
There was a problem hiding this comment.
LGTM. Although I'd prefer new lines containing SEMANTIC_TEXT to be inserted immediately after the similar TEXT lines instead of later in the code. Keeping SEMANTIC_TEXT close to TEXT feels better. This also applies to block of code.
| return TEXT; | ||
| } | ||
| if (left == SEMANTIC_TEXT || right == SEMANTIC_TEXT) { | ||
| return TEXT; |
There was a problem hiding this comment.
I'm wondering if we shouldn't do the same for TEXT (ie. return KEYWORD). Not part of this PR, though, but worth considering.
| Map.entry(DataType.VERSION, NotEqualsKeywordsEvaluator.Factory::new), | ||
| Map.entry(DataType.IP, NotEqualsKeywordsEvaluator.Factory::new) | ||
| Map.entry(DataType.IP, NotEqualsKeywordsEvaluator.Factory::new), | ||
| Map.entry(DataType.SEMANTIC_TEXT, NotEqualsKeywordsEvaluator.Factory::new) |
There was a problem hiding this comment.
I think this line should be after the TEXT line.
| Map.entry(DataType.VERSION, LessThanOrEqualKeywordsEvaluator.Factory::new), | ||
| Map.entry(DataType.IP, LessThanOrEqualKeywordsEvaluator.Factory::new) | ||
| Map.entry(DataType.IP, LessThanOrEqualKeywordsEvaluator.Factory::new), | ||
| Map.entry(DataType.SEMANTIC_TEXT, LessThanOrEqualKeywordsEvaluator.Factory::new) |
There was a problem hiding this comment.
I think this line should be after the TEXT line.
|
@elasticmachine update branch |
💔 Backport failedThe backport operation could not be completed due to the following error: You can use sqren/backport to manually backport by running |
…rators (elastic#115243) * fix tests * Add CSV tests * Add function tests * Refactor tests * spotless * Use DataType.stringTypes() where possible * Add tests for conditional functions and expressions * Fix tests after merge * Reorder semantic_text evaluators and tests * Re-ordered two more places for SEMANTIC_TEXT after TEXT --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: Craig Taverner <craig@amanzi.com>
…rators (elastic#115243) * fix tests * Add CSV tests * Add function tests * Refactor tests * spotless * Use DataType.stringTypes() where possible * Add tests for conditional functions and expressions * Fix tests after merge * Reorder semantic_text evaluators and tests * Re-ordered two more places for SEMANTIC_TEXT after TEXT --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: Craig Taverner <craig@amanzi.com>
* semantic_text as string type in ES|QL - support for functions and operators (#115243) * fix tests * Add CSV tests * Add function tests * Refactor tests * spotless * Use DataType.stringTypes() where possible * Add tests for conditional functions and expressions * Fix tests after merge * Reorder semantic_text evaluators and tests * Re-ordered two more places for SEMANTIC_TEXT after TEXT --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: Craig Taverner <craig@amanzi.com> * Fix release tests for semantic_text (#116202) --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: Craig Taverner <craig@amanzi.com>
tracked in #115103
Related: #114334 that disallows functions to return
TEXTtype values. We need to pay attention to this one, it will likely be merged first.Description
This adds support for
semantic_textas a string type in ES|QL and checks the support for functions and operators (https://www.elastic.co/guide/en/elasticsearch/reference/master/esql-functions-operators.html).The process was:
semantic_texta string type:elasticsearch/x-pack/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/type/DataType.java
Lines 372 to 377 in 167e55b
semantic_text; fix any failuressemantic_textensure we have test coverage when the arguments are of typesemantic_textNext
This PR takes care of only functions/operator support. Things that are not covered by tests here, but will be in a follow up:
BUCKETdocsIf preferred I can make the rest of the changes here to ensure the support for commands too, but I figured it might be easier to review this in 2 parts.