[ESQL] Add TO_DATE_NANOS conversion function#112150
[ESQL] Add TO_DATE_NANOS conversion function#112150not-napoleon merged 28 commits intoelastic:mainfrom
Conversation
Conflicts: x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
| def(ToCartesianShape.class, ToCartesianShape::new, "to_cartesianshape"), | ||
| def(ToDatePeriod.class, ToDatePeriod::new, "to_dateperiod"), | ||
| def(ToDatetime.class, ToDatetime::new, "to_datetime", "to_dt"), | ||
| def(ToDateNanos.class, ToDateNanos::new, "to_date_nanos", "to_datenanos"), |
There was a problem hiding this comment.
Should we add a to_datetime_nanos? If I were accustomed to to_datetime I'd expect a _nanos equivalent function.
Otherwise, I guess the name of the function maps on the datetype name, but I guess I do find it a tad confusing the difference between name types (some relevant comments found in #99006)
There was a problem hiding this comment.
Personally, I don't think so. If I was going to add another alias, I'd opt for date_nanoseconds, to align with the title of https://www.elastic.co/guide/en/elasticsearch/reference/master/date_nanos.html, but really I think date_nanos is fine.
...c/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToDateNanos.java
Outdated
Show resolved
Hide resolved
|
|
||
| @ConvertEvaluator(extraName = "FromLong", warnExceptions = { IllegalArgumentException.class }) | ||
| static long fromLong(long in) { | ||
| if (in < 0) { |
| } | ||
|
|
||
| @ConvertEvaluator(extraName = "FromString", warnExceptions = { IllegalArgumentException.class }) | ||
| static long fromKeyword(BytesRef in) { |
There was a problem hiding this comment.
Would we want to have this method delegate to something in EsqlDataTypeConverter (similar to nanoTimeToString())? I imagine this conversion might be useful when we add support for parsing nanos or bucketing them.
There was a problem hiding this comment.
So, this conversion actually exists as org.elasticsearch.common.time.DateUtils#toLong, and in retrospect I should probably just delegate to that.
bpintea
left a comment
There was a problem hiding this comment.
We should add the type to the EsqlDataTypeConverter.TYPE_TO_CONVERTER_FUNCTION mapping and add a few tests with the :: conversion operator.
astefan
left a comment
There was a problem hiding this comment.
Added few comments. Nothing major. Also:
- some more tests in csv-spec would be welcomed:
- a foldable function like
concat("2023-01-01","T12:12:12")should be used as part ofevalorrowand its value computed in ato_date_nanosor::date_nanos. - tests with
nullas a constant and as a computed value:to_date_nanos(null + 1::long),eval x = null, y = concat("2023", null) | limit 1 | stats max(123) by to_date_nanos(y) - some constant that fails the range check:
eval g = to_date_nanos(-5::long) - tests with real indices data
- a foldable function like
- some tests with union types (
dateanddate_nanosfor example,longanddate_nanos) date_nanosbeing a date data type, I think it's worth taking our date functions for a spin. I looked only atdate_extract("nano_of_day", to_date_nanos(z))which fails withsecond argument of [date_extract("nano_of_day", to_date_nanos(z))] must be [datetime], found value [to_date_nanos(z)] type [date_nanos]and it shouldn't. Indeed, I didn't want to extend the support you are adding with this PR and have it more contained, I only tried to test the new function in nested functions calls
| string to date nanos, out of range | ||
| required_capability: date_nanos_type | ||
|
|
||
| ROW d = TO_DATE_NANOS("2262-04-12T00:00:00.000") | KEEP d; |
There was a problem hiding this comment.
I see the syntax ::date_nanos is not supported (yet?). Is this intentional?
| try { | ||
| return Math.multiplyExact(in, 1_000_000L); | ||
| } catch (ArithmeticException e) { | ||
| // This seems like a more useful error message than "Long Overflow" |
...c/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToDateNanos.java
Outdated
Show resolved
Hide resolved
|
|
||
| @ConvertEvaluator(extraName = "FromString", warnExceptions = { IllegalArgumentException.class }) | ||
| static long fromKeyword(BytesRef in) { | ||
| Instant parsed = DateFormatters.from(DEFAULT_DATE_NANOS_FORMATTER.parse(in.utf8ToString())).toInstant(); |
There was a problem hiding this comment.
Is something here throwing already on pre-1970 dates?
I don't see a test case for pre-1970 in ToDateNanosTests, would be nice adding it
There was a problem hiding this comment.
I've updated this to use DateUtils.toLong() which does that check.
There was a problem hiding this comment.
Nit: I'm missing checks for pre-1970 dates for all types here. A single test checking all would be enough.
Nitpick because we already have the ToDateNanosTests cases, but I'd add it here too
There was a problem hiding this comment.
I've added row tests for pre-1970 dates.
I have a separate ticket for this, as I expect it will be quite involved. #112885. That said, if you'd rather I did it in this PR, I would be fine with that. I just thought this PR was already getting pretty big.
Indeed, passing date nanos into date functions is not (yet) supported. The roadmap for the feature lists that work further down: #109352. At the moment we have it marked as not required for an MVP (i.e. before taking off the feature flag), but we can move it into the per-release work if we want to. It will definitely be required if we implement the auto casting into date nanos that we talked about recently. |
That's ok 👍 |
|
@elasticmachine update branch |
💔 Backport failed
You can use sqren/backport to manually backport by running |
Resolves elastic#111842 This adds a conversion function that yields DATE_NANOS. Mostly this is straight forward. It is worth noting that when converting a millisecond date into a nanosecond date, the conversion function truncates it to 0 nanoseconds (i.e. first nanosecond of that millisecond). This is, of course, a bit of an assumption, but I don't have a better assumption we can make. I'd thought about adding a second, optional, parameter to control this behavior, but it's important that TO_DATE_NANOS extend AbstractConvertFunction, which itself extends UnaryScalarFunction, so that it will work correctly with union types. Also, it's unlikely the user will have any better guess than we do for filling in the nanoseconds. Making that assumption does, however, create some weirdness. Consider two comparisons: TO_DATETIME("2023-03-23T12:15:03.360103847") == TO_DATETIME("2023-03-23T12:15:03.360") will return true while TO_DATE_NANOS("2023-03-23T12:15:03.360103847") == TO_DATE_NANOS("2023-03-23T12:15:03.360") will return false. This is akin to casting between longs and doubles, where things may compare equal in one type that are not equal in the other. This seems fine, and I can't think of a better way to do it, but it's worth being aware of. --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Resolves #111842 This adds a conversion function that yields DATE_NANOS. Mostly this is straight forward. It is worth noting that when converting a millisecond date into a nanosecond date, the conversion function truncates it to 0 nanoseconds (i.e. first nanosecond of that millisecond). This is, of course, a bit of an assumption, but I don't have a better assumption we can make. I'd thought about adding a second, optional, parameter to control this behavior, but it's important that TO_DATE_NANOS extend AbstractConvertFunction, which itself extends UnaryScalarFunction, so that it will work correctly with union types. Also, it's unlikely the user will have any better guess than we do for filling in the nanoseconds. Making that assumption does, however, create some weirdness. Consider two comparisons: TO_DATETIME("2023-03-23T12:15:03.360103847") == TO_DATETIME("2023-03-23T12:15:03.360") will return true while TO_DATE_NANOS("2023-03-23T12:15:03.360103847") == TO_DATE_NANOS("2023-03-23T12:15:03.360") will return false. This is akin to casting between longs and doubles, where things may compare equal in one type that are not equal in the other. This seems fine, and I can't think of a better way to do it, but it's worth being aware of. --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Resolves #111842
This adds a conversion function that yields
DATE_NANOS. Mostly this is straight forward.It is worth noting that when converting a millisecond date into a nanosecond date, the conversion function truncates it to 0 nanoseconds (i.e. first nanosecond of that millisecond). This is, of course, a bit of an assumption, but I don't have a better assumption we can make. I'd thought about adding a second, optional, parameter to control this behavior, but it's important that
TO_DATE_NANOSextendAbstractConvertFunction, which itself extendsUnaryScalarFunction, so that it will work correctly with union types. Also, it's unlikely the user will have any better guess than we do for filling in the nanoseconds.Making that assumption does, however, create some weirdness. Consider two comparisons:
TO_DATETIME("2023-03-23T12:15:03.360103847") == TO_DATETIME("2023-03-23T12:15:03.360")will return true whileTO_DATE_NANOS("2023-03-23T12:15:03.360103847") == TO_DATE_NANOS("2023-03-23T12:15:03.360")will return false. This is akin to casting between longs and doubles, where things may compare equal in one type that are not equal in the other. This seems fine, and I can't think of a better way to do it, but it's worth being aware of.