Skip to content

[ES|QL] explicit cast a string literal to date_period and time_duration in arithmetic operations#109193

Merged
fang-xing-esql merged 39 commits intoelastic:mainfrom
fang-xing-esql:to-date-time-intervals
Sep 9, 2024
Merged

[ES|QL] explicit cast a string literal to date_period and time_duration in arithmetic operations#109193
fang-xing-esql merged 39 commits intoelastic:mainfrom
fang-xing-esql:to-date-time-intervals

Conversation

@fang-xing-esql
Copy link
Copy Markdown
Member

@fang-xing-esql fang-xing-esql commented May 29, 2024

Provides the functionality to cast a string literal to date_period or time_duration in arithmetic operations. It may support #108421, when a date_period or time_duration is provided as a string literal in EsqlQueryRequest params.

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @fang-xing-esql, I've created a changelog YAML for you.

@fang-xing-esql fang-xing-esql marked this pull request as ready for review June 20, 2024 00:50
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 20, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@fang-xing-esql fang-xing-esql added the ES|QL-ui Impacts ES|QL UI label Aug 2, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

@stratoula
Copy link
Copy Markdown

Thanx @fang-xing-esql ❤️

@fang-xing-esql
Copy link
Copy Markdown
Member Author

I'll add a few more tests for supporting date_period and time_duration in named parameters.

@costin costin requested review from astefan and bpintea August 6, 2024 23:45
@alex-spies
Copy link
Copy Markdown
Contributor

alex-spies commented Aug 26, 2024

Changing parser to recognize both qualifiedIntegerLiteral and a string as an interval is risky in my opinion, what if the users really mean to provide "3 days" as a string literal, but not an interval? There is chance that EsqlParser might guess it wrong, at parser, we don't have information about the field's data type yet.

Sorry, I didn't mean to suggest that the parser should guess that 3 days is not supposed to be a string but a date period. I was looking at the first example in the PR description of #108421 and thought that the params in a request could be declared together with an explicit data type, e.g.

{
  "query": "ROW x = ?",
  "params": [{"value": 1, "type" : "integer"}]
}

If that was the case, I'd have advocated for expanding support to non-representable types like time_duration.

But I realized the example from above doesn't work, because we don't allow specifying the type - instead, the type is inferred in RequestXContent.parseParams. And we probably don't want to add support for explicit type declarations in params just for the sake of time_duration and date_period; I think this PR's approach is already the best approach.

Thanks @fang-xing-esql !

@fang-xing-esql
Copy link
Copy Markdown
Member Author

{
  "query": "ROW x = ?",
  "params": [{"value": 1, "type" : "integer"}]
}

Yeah, type is not supported in named parameter now...

Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm nearly done going through everything. I think we'll need a couple more test cases before we merge this. (But the overall approach is fine modulo some minor points that were already raised.)

Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, all done now!

To summarize, I think we should

  • add some more test cases (both unit and csv tests, as remarked before), and
  • make handling of expressions as arguments of to_dateperiod/to_timeduration consistent by either allowing arbitrary foldable expressions or literals only.

Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will wait for @alex-spies review comments to be addressed for another (final 🤞) review.

I would also like to see in this PR an update to documentation in the sense of explaining in a separate page maybe these phantom "data types". "Data type" refers to what one can find in a mapping in an index, "time duration" and "date period" are a type of data types that we define internally to help us reason about specific Strings in specific places (parameters to some functions). Maybe they shouldn't be called "data types" for this reason, I don't know. It would just confuse the users even more. I, for one, am a bit confused (still) about the broadness of this functionality.

@fang-xing-esql
Copy link
Copy Markdown
Member Author

Ok, I'm nearly done going through everything. I think we'll need a couple more test cases before we merge this. (But the overall approach is fine modulo some minor points that were already raised.)

Thank you for reviewing @alex-spies !! All of the comments have been addressed I believe, please remind me if there is anything that I missed.

@costin costin requested review from alex-spies and astefan September 5, 2024 01:05
Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okidokey, this looks good! Thanks for the iterations @fang-xing-esql. We never had conversion functions that only can be folded before, and that shows up in some unexpected complexity here and there.

I left a final set of remarks which I consider mostly minor. What should be addressed before merging is:

  1. Considering to mark to_dateperiod and to_timeduration as tech preview in the docs for a little while.
  2. Figure out what to do with the validation in case of union types.
    See remarks below :)

Comment on lines +72 to +75
@Override
protected Map<DataType, BuildFactory> factories() {
return Map.of();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite subtle, but this implies that supportedTypes returns an empty set.

This has an effect in case of union types, which is currently the only useage of supportedTypes.

Test:

curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/test" -XPUT -d '{                                                                    11:17
  "mappings": {                                                                                                         
        "properties": {
            "x": {       
                "type": "keyword"
            }
        }
  }
}'

curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/test2" -XPUT -d '{                                                                   11:18
  "mappings": {
        "properties": {
            "x": {
                "type": "text"   
            }
        }
  }
}'

curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/_query?format=txt" -d '{                                                             11:19
    "query": "from test* | eval y = now() + to_dateperiod(x)"}
'

This currently fails for the wrong reason:

"verification_exception","reason":"Found 1 problem\nline 1:45: Cannot use field [x] due to ambiguities being mapped as [2] incompatible types: [keyword] in [test], [text] in [test2]"}]

I think we need to do the following:

  1. This PR makes supportedTypes change meaning! It's not "types I can convert" anymore, but "types I can convert if they come from a field". I think it's important we reflect that by renaming supportedTypes or at least giving it a better javadoc. Maybe rename to supportedFieldTypes and add a javadoc that says that we may be able to fold on more types than that in case of unrepresentable types.
  2. See if we can make the error message better in case of union types. The problem is not that to_dateperiod doesn't accept keyword or text fields, it's that it doesn't apply to index fields at all. This can be in a follow up, or let's create an issue and come back to it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this issue! I agree we should show the correct error messages. I looked into it, and it looks to me that this could be related to the order of validations.

I have an idea to make it show the correct message, but it modifies resolveType() in an unusual way. resolveType() usually checks if its children is resolved at the very beginning, if there is unresolved child, the rest of the validations/checks will be skipped, and it kinds of hide the error messages that we wanted.

For this particular case, these two conversion functions don't support fields, if we add a check in resolveType() - if it sees a field, then return a type resolution error, the expected messages can be seen. However the check to fields need to happen before the check to its children, in order to have the expected messages returned by Analyzer's Verifier.

+ curl -u elastic:password -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
    "query": "from test* | eval y = now() + to_dateperiod(x)"}
'
{"error":{"root_cause":[{"type":"verification_exception","reason":"Found 2 problems\nline 1:45: Cannot use field [x] due to ambiguities being mapped as [2] incompatible types: [keyword] in [test], [text] in [test2]\nline 1:31: argument of [to_dateperiod(x)] must be a constant, received [x]"}],"type":"verification_exception","reason":"Found 2 problems\nline 1:45: Cannot use field [x] due to ambiguities being mapped as [2] incompatible types: [keyword] in [test], [text] in [test2]\nline 1:31: argument of [to_dateperiod(x)] must be a constant, received [x]"},"status":400}%       

With this change, the expected message can be seen in the VerificationException. This is what I can think so far to make the correct message show, please let me know if there is a better way, I think we should return the expected error messages.

Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Few very minor remarks and a more important one that aims to better structure and proof the class hierarchy for conversion functions.

@fang-xing-esql
Copy link
Copy Markdown
Member Author

Thanks for reviewing @alex-spies and @astefan!! All of your comments should be addressed in the last commit.

Alex brought up an issue with union type and I modified FoldablesConvertFunction's resolveType() and Verifier in order to have VerificationException show the correct error messages, I'm not sure if there is a better way though.

If you can take a final look it will be great, otherwise I'm planning to merge it next week.

Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fang-xing-esql ! The intended addition of this PR is in a very nice state so let's get this over the finish line. We only have to deal with the unintended scope creep due to running into union types, d'oh.

I left a comment below that I think really should be addressed, so we keep the union type resolution nice and clean without adding workarounds. But let's go for it in a follow-up PR - or let's open an issue for this and come back to this later.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is wrong, because we can end up with 2 validation errors. This happens e.g. when you do from test* | eval y = (now() + to_dateperiod(to_dateperiod(x))) in the union type setup from before, where test* maps x as keyword and text in different indices. Or just from test* | eval x = now() + to_dateperiod(x).

This is only needed for union types, right? Because in case of union types, we'll have a FoldablesConvertFunction wrapping an UnsupportedAttribute. So, we end up with a plan like

[2024-09-09T12:35:40,032][INFO ][o.e.x.e.a.Analyzer       ] [runTask-0] Rule analysis.Analyzer$UnionTypesCleanup applied
Limit[1000[INTEGER]]                         = Limit[1000[INTEGER]]
\_Eval[[NOW() + TODATEPERIOD(x{f}#15) AS x]] ! \_Eval[[NOW() + TODATEPERIOD(!x) AS x]]
  \_EsRelation[test*][x{f}#15]               !   \_EsRelation[test*][!x]

Where the EVAL has an unsupported attribute !x and TODATEPERIOD(...) has unresolved types because !x is not a constant.

Let's do the following:

  • Let's undo the fix for union types, as well as the extra check in FoldablesConvertFunction.resolveType. Let's tackle this in a follow up. It's annoying, but this is more intricate than we thought because we ended up in union type territory and we need to deal with this :)
  • Commenting out this line had no tests fail for me when running ./gradlew ":x-pack:plugin:esql:test". The follow up needs tests with union types that fail without a fix. Union types are currently a bit painful to set up, but we have yaml tests for them - or we could improve the verifier tests to enable testing union types. The two queries I wrote above could be a good start to cover these edge cases in yaml tests.
  • I believe the verifier code here is not the best place to add union type specific workarounds, and ideally we also shouldn't need the extra check for field attributes in FoldablesConvertFunction.resolveType. I think the best way is to fix this in Analyzer.ResolveUnionTypes, more specifically in resolveConvertFunction (Analyzer.java, line 1227). We need to generate better error messages here, anyway, in case the query has a convert function applied to an index field, but there's either no evaluator (this is the case here) or the type doesn't make sense, like to_ip applied to a multi-typed field where one of the types is double.
  • I really think the follow up PR should also rename AbstractConvertFunction.supportedTypes to evaluableTypes to avoid footguns in the future.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#112668 is opened to follow up with this issue.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the solution should be in the Analyzer.ResolveUnionTypes, and also think it could be related to another idea we have, which is to support union-types for more functions than just convert functions. Currently we assume that all convert functions support union-types and no other functions do. But this discussion clarifies that there exist convert functions that do not. So what about creating an interface for functions that support union-types, make sure all but these foldable functions use that, and this opens up the option to add union-types to other functions too?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >enhancement ES|QL-ui Impacts ES|QL UI Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.16.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants