[ES|QL] explicit cast a string literal to date_period and time_duration in arithmetic operations#109193
Conversation
|
Hi @fang-xing-esql, I've created a changelog YAML for you. |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Pinging @elastic/kibana-esql (ES|QL-ui) |
|
Thanx @fang-xing-esql ❤️ |
|
I'll add a few more tests for supporting date_period and time_duration in named parameters. |
Sorry, I didn't mean to suggest that the parser should guess that If that was the case, I'd have advocated for expanding support to non-representable types like But I realized the example from above doesn't work, because we don't allow specifying the type - instead, the type is inferred in Thanks @fang-xing-esql ! |
Yeah, type is not supported in named parameter now... |
docs/reference/esql/functions/kibana/definition/to_dateperiod.json
Outdated
Show resolved
Hide resolved
docs/reference/esql/functions/kibana/definition/to_timeduration.json
Outdated
Show resolved
Hide resolved
.../main/java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToDatePeriod.java
Outdated
Show resolved
Hide resolved
.../main/java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToDatePeriod.java
Outdated
Show resolved
Hide resolved
.../main/java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToDatePeriod.java
Outdated
Show resolved
Hide resolved
alex-spies
left a comment
There was a problem hiding this comment.
Ok, I'm nearly done going through everything. I think we'll need a couple more test cases before we merge this. (But the overall approach is fine modulo some minor points that were already raised.)
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/UnaryScalarFunction.java
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToTimeDuration.java
Outdated
Show resolved
Hide resolved
.../java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToDatePeriodTests.java
Outdated
Show resolved
Hide resolved
.../java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToDatePeriodTests.java
Outdated
Show resolved
Hide resolved
.../java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToDatePeriodTests.java
Outdated
Show resolved
Hide resolved
.../java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToDatePeriodTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Ok, all done now!
To summarize, I think we should
- add some more test cases (both unit and csv tests, as remarked before), and
- make handling of expressions as arguments of
to_dateperiod/to_timedurationconsistent by either allowing arbitrary foldable expressions or literals only.
.../main/java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToDatePeriod.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/convert.csv-spec
Outdated
Show resolved
Hide resolved
...ugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/RestEsqlTestCase.java
Outdated
Show resolved
Hide resolved
...ugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/RestEsqlTestCase.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
Show resolved
Hide resolved
astefan
left a comment
There was a problem hiding this comment.
I will wait for @alex-spies review comments to be addressed for another (final 🤞) review.
I would also like to see in this PR an update to documentation in the sense of explaining in a separate page maybe these phantom "data types". "Data type" refers to what one can find in a mapping in an index, "time duration" and "date period" are a type of data types that we define internally to help us reason about specific Strings in specific places (parameters to some functions). Maybe they shouldn't be called "data types" for this reason, I don't know. It would just confuse the users even more. I, for one, am a bit confused (still) about the broadness of this functionality.
Thank you for reviewing @alex-spies !! All of the comments have been addressed I believe, please remind me if there is anything that I missed. |
alex-spies
left a comment
There was a problem hiding this comment.
Okidokey, this looks good! Thanks for the iterations @fang-xing-esql. We never had conversion functions that only can be folded before, and that shows up in some unexpected complexity here and there.
I left a final set of remarks which I consider mostly minor. What should be addressed before merging is:
- Considering to mark
to_dateperiodandto_timedurationas tech preview in the docs for a little while. - Figure out what to do with the validation in case of union types.
See remarks below :)
docs/reference/esql/functions/type-conversion-functions.asciidoc
Outdated
Show resolved
Hide resolved
| @Override | ||
| protected Map<DataType, BuildFactory> factories() { | ||
| return Map.of(); | ||
| } |
There was a problem hiding this comment.
This is quite subtle, but this implies that supportedTypes returns an empty set.
This has an effect in case of union types, which is currently the only useage of supportedTypes.
Test:
curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/test" -XPUT -d '{ 11:17
"mappings": {
"properties": {
"x": {
"type": "keyword"
}
}
}
}'
curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/test2" -XPUT -d '{ 11:18
"mappings": {
"properties": {
"x": {
"type": "text"
}
}
}
}'
curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/_query?format=txt" -d '{ 11:19
"query": "from test* | eval y = now() + to_dateperiod(x)"}
'
This currently fails for the wrong reason:
"verification_exception","reason":"Found 1 problem\nline 1:45: Cannot use field [x] due to ambiguities being mapped as [2] incompatible types: [keyword] in [test], [text] in [test2]"}]
I think we need to do the following:
- This PR makes
supportedTypeschange meaning! It's not "types I can convert" anymore, but "types I can convert if they come from a field". I think it's important we reflect that by renamingsupportedTypesor at least giving it a better javadoc. Maybe rename tosupportedFieldTypesand add a javadoc that says that we may be able to fold on more types than that in case of unrepresentable types. - See if we can make the error message better in case of union types. The problem is not that
to_dateperioddoesn't accept keyword or text fields, it's that it doesn't apply to index fields at all. This can be in a follow up, or let's create an issue and come back to it.
There was a problem hiding this comment.
Thanks for catching this issue! I agree we should show the correct error messages. I looked into it, and it looks to me that this could be related to the order of validations.
I have an idea to make it show the correct message, but it modifies resolveType() in an unusual way. resolveType() usually checks if its children is resolved at the very beginning, if there is unresolved child, the rest of the validations/checks will be skipped, and it kinds of hide the error messages that we wanted.
For this particular case, these two conversion functions don't support fields, if we add a check in resolveType() - if it sees a field, then return a type resolution error, the expected messages can be seen. However the check to fields need to happen before the check to its children, in order to have the expected messages returned by Analyzer's Verifier.
+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
"query": "from test* | eval y = now() + to_dateperiod(x)"}
'
{"error":{"root_cause":[{"type":"verification_exception","reason":"Found 2 problems\nline 1:45: Cannot use field [x] due to ambiguities being mapped as [2] incompatible types: [keyword] in [test], [text] in [test2]\nline 1:31: argument of [to_dateperiod(x)] must be a constant, received [x]"}],"type":"verification_exception","reason":"Found 2 problems\nline 1:45: Cannot use field [x] due to ambiguities being mapped as [2] incompatible types: [keyword] in [test], [text] in [test2]\nline 1:31: argument of [to_dateperiod(x)] must be a constant, received [x]"},"status":400}%
With this change, the expected message can be seen in the VerificationException. This is what I can think so far to make the correct message show, please let me know if there is a better way, I think we should return the expected error messages.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java
Show resolved
Hide resolved
...sticsearch/xpack/esql/expression/function/scalar/convert/ToDatePeriodSerializationTests.java
Outdated
Show resolved
Hide resolved
.../java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToDatePeriodTests.java
Outdated
Show resolved
Hide resolved
astefan
left a comment
There was a problem hiding this comment.
LGTM
Few very minor remarks and a more important one that aims to better structure and proof the class hierarchy for conversion functions.
docs/reference/esql/functions/type-conversion-functions.asciidoc
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToTimeDuration.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java
Outdated
Show resolved
Hide resolved
...sticsearch/xpack/esql/expression/function/scalar/convert/ToDatePeriodSerializationTests.java
Outdated
Show resolved
Hide resolved
|
Thanks for reviewing @alex-spies and @astefan!! All of your comments should be addressed in the last commit. Alex brought up an issue with union type and I modified If you can take a final look it will be great, otherwise I'm planning to merge it next week. |
alex-spies
left a comment
There was a problem hiding this comment.
Thanks @fang-xing-esql ! The intended addition of this PR is in a very nice state so let's get this over the finish line. We only have to deal with the unintended scope creep due to running into union types, d'oh.
I left a comment below that I think really should be addressed, so we keep the union type resolution nice and clean without adding workarounds. But let's go for it in a follow-up PR - or let's open an issue for this and come back to this later.
...rg/elasticsearch/xpack/esql/expression/function/scalar/convert/FoldablesConvertFunction.java
Show resolved
Hide resolved
There was a problem hiding this comment.
I think this is wrong, because we can end up with 2 validation errors. This happens e.g. when you do from test* | eval y = (now() + to_dateperiod(to_dateperiod(x))) in the union type setup from before, where test* maps x as keyword and text in different indices. Or just from test* | eval x = now() + to_dateperiod(x).
This is only needed for union types, right? Because in case of union types, we'll have a FoldablesConvertFunction wrapping an UnsupportedAttribute. So, we end up with a plan like
[2024-09-09T12:35:40,032][INFO ][o.e.x.e.a.Analyzer ] [runTask-0] Rule analysis.Analyzer$UnionTypesCleanup applied
Limit[1000[INTEGER]] = Limit[1000[INTEGER]]
\_Eval[[NOW() + TODATEPERIOD(x{f}#15) AS x]] ! \_Eval[[NOW() + TODATEPERIOD(!x) AS x]]
\_EsRelation[test*][x{f}#15] ! \_EsRelation[test*][!x]
Where the EVAL has an unsupported attribute !x and TODATEPERIOD(...) has unresolved types because !x is not a constant.
Let's do the following:
- Let's undo the fix for union types, as well as the extra check in
FoldablesConvertFunction.resolveType. Let's tackle this in a follow up. It's annoying, but this is more intricate than we thought because we ended up in union type territory and we need to deal with this :) - Commenting out this line had no tests fail for me when running
./gradlew ":x-pack:plugin:esql:test". The follow up needs tests with union types that fail without a fix. Union types are currently a bit painful to set up, but we have yaml tests for them - or we could improve the verifier tests to enable testing union types. The two queries I wrote above could be a good start to cover these edge cases in yaml tests. - I believe the verifier code here is not the best place to add union type specific workarounds, and ideally we also shouldn't need the extra check for field attributes in
FoldablesConvertFunction.resolveType. I think the best way is to fix this inAnalyzer.ResolveUnionTypes, more specifically inresolveConvertFunction(Analyzer.java, line 1227). We need to generate better error messages here, anyway, in case the query has a convert function applied to an index field, but there's either no evaluator (this is the case here) or the type doesn't make sense, liketo_ipapplied to a multi-typed field where one of the types isdouble. - I really think the follow up PR should also rename
AbstractConvertFunction.supportedTypestoevaluableTypesto avoid footguns in the future.
There was a problem hiding this comment.
#112668 is opened to follow up with this issue.
There was a problem hiding this comment.
I agree that the solution should be in the Analyzer.ResolveUnionTypes, and also think it could be related to another idea we have, which is to support union-types for more functions than just convert functions. Currently we assume that all convert functions support union-types and no other functions do. But this discussion clarifies that there exist convert functions that do not. So what about creating an interface for functions that support union-types, make sure all but these foldable functions use that, and this opens up the option to add union-types to other functions too?
Provides the functionality to cast a string literal to date_period or time_duration in arithmetic operations. It may support #108421, when a date_period or time_duration is provided as a string literal in EsqlQueryRequest params.