Skip to content

Esql/create DATE_DIFF function#103208

Merged
luigidellaquila merged 10 commits intoelastic:mainfrom
nicolasgras:esql/create-date-diff-function
Jan 9, 2024
Merged

Esql/create DATE_DIFF function#103208
luigidellaquila merged 10 commits intoelastic:mainfrom
nicolasgras:esql/create-date-diff-function

Conversation

@nicolasgras
Copy link
Copy Markdown
Contributor

This pull request resolves #101942.

A new functionality DATE_DIFF is added to ESQL. This functionaliy is similar to the ones in SQL plugin.

It calculates the diffenrence between this 2 dates using a unit specified and takes 3 input parameters:

  • unit (text)
  • startTimestamp (dateTime)
  • endTimestamp (dateTime)

Screenshot from 2023-12-08 01-37-41

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 8, 2023

Documentation preview:

@cla-checker-service
Copy link
Copy Markdown

cla-checker-service bot commented Dec 8, 2023

💚 CLA has been signed

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team v8.13.0 labels Dec 8, 2023
@kingherc kingherc added the :Analytics/ES|QL AKA ESQL label Dec 19, 2023
@elasticsearchmachine elasticsearchmachine added Team:QL (Deprecated) Meta label for query languages team and removed needs:triage Requires assignment of a team area label labels Dec 19, 2023
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

Copy link
Copy Markdown
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your interest and contribution, @nicolasgras.
I had a first pass. My main question is about the code duplication. Any reason you didn't extract the common parts (in the ql module) and reused it here?

return aliases;
}

private static int safeInt(long diff) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's DataTypeConverter#safeToInt available. But I'm mostly wondering why not return long.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the code to use DataTypeConverter#safeToInt and removed the one I implemented that was identical.


static {
NAME_TO_PART = DateTimeField.initializeResolutionMap(values());
VALID_VALUES = DateTimeField.initializeValidValues(values());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't used anywhere atm. I guess the initialization of NAME_TO_PART can then be inlined too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, VALID_VALUES List is not used. I should have cleant this part.
The initialization of NAME_TO_PART is inlined now.

Comment on lines +80 to +86
private BiFunction<ZonedDateTime, ZonedDateTime, Integer> diffFunction;
private Set<String> aliases;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"may be final"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, they are final now.

description = "Subtract 2 dates and return their difference in multiples of a unit specified in the 1st argument")
public DateDiff(
Source source,
@Param(name = "unit", type = { "text" }, description = "A valid date unit") Expression unit,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type can also be keyword.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I have updated the code.

this.endTimestamp = endTimestamp;
}

@Evaluator
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want to catch the thrown exceptions in the evaluator implementor so that we can return a null, rather than stop the execution. Have a look at DateParse for an example.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I have updated the code.

ZonedDateTime zdtStart = ZonedDateTime.ofInstant(Instant.ofEpochMilli(startTimestamp), UTC);
ZonedDateTime zdtEnd = ZonedDateTime.ofInstant(Instant.ofEpochMilli(endTimestamp), UTC);

Part datePartField = Part.resolve(unit.utf8ToString());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could maybe have two evaluators here: in case you'd do smth like date_diff(some_string_field, ..., ...), then we'd need to evaluate the first parameter to establish what's the unit we need to apply, as done already. But if we write something like date_diff("hours", ..., ...), we can do this work upfront, not with every evaluation. And it's likely this would be the most common use case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the code by adding a new evaluator (I get something similar to DateParse now).

type = { "date" },
description = "A string representing a start timestamp"
) Expression startTimestamp,
@Param(name = "endTimestamp", type = { "date" }, description = "A string representing a end timestamp") Expression endTimestamp
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aN end timestamp

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This typo has been fixed.

if (datePartField == null) {
List<String> similar = Part.findSimilar(unit.utf8ToString());
if (similar.isEmpty()) {
throw new InvalidArgumentException(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit on style: you could prepare the message in the two branches and throw just once.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I have updated the code.

@nicolasgras
Copy link
Copy Markdown
Contributor Author

Thanks for your interest and contribution, @nicolasgras. I had a first pass. My main question is about the code duplication. Any reason you didn't extract the common parts (in the ql module) and reused it here?

Thank you for you review @bpintea.
As this issue is about esql, I thought I should not update any other plugin. But it is better to use it in a common module.
As you suggest, I have moved DateTimeFiled interface in ql module. Maybe let's keep in mind to also use it in sql module if this PR is accepted and merged.

I have also added unit tests in esql/qa/testFixtures directory to have a better code coverage.

Let me know if it is ok for you.

Copy link
Copy Markdown
Member

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @nicolasgras!
Just two comments from my side:

  • I see the CI still complains about CLA signature; I checked your commits and I see they appear with two different email addresses (one is invalid of course):
Author: Nicolas Gras <nicolasgras01@gmail.com>
Author: Nicolas Gras <nicolas@system76-pc.localdomain>

You have to sign the CLA with both to make the checks green, even though the second one is obviously a local one.

  • I think @bpintea referred to the content of org.elasticsearch.xpack.sql.expression.function.scalar.datetime.DateDiff (the enum Part and some logic) when he asked about common parts that can be extracted to QL.
    IMHO it's not mandatory, we could also keep the two things separated and have this PR self-contained in ESQL module, but in that case I'd avoid to move DateTimeField to QL

@wchaparro wchaparro added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 2, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine elasticsearchmachine removed the Team:QL (Deprecated) Meta label for query languages team label Jan 2, 2024
@nicolasgras nicolasgras force-pushed the esql/create-date-diff-function branch from 0bf177d to bb62dfd Compare January 3, 2024 05:41
@nicolasgras
Copy link
Copy Markdown
Contributor Author

Thank you @luigidellaquila for your review.

You have to sign the CLA with both to make the checks green, even though the second one is obviously a local one.

I have rebased my commits on top of the branch and updated the author of them, then the CI does not complain anymore about CLA signature.

I think @bpintea referred to the content of org.elasticsearch.xpack.sql.expression.function.scalar.datetime.DateDiff (the enum Part and some logic) when he asked about common parts that can be extracted to QL.
IMHO it's not mandatory, we could also keep the two things separated and have this PR self-contained in ESQL module, but in that case I'd avoid to move DateTimeField to QL.

@bpintea , what do you think about it?
After your review I updated and cleant org.elasticsearch.xpack.esql.expression.function.scalar.date.DateDiff.Part, even if it is similar to org.elasticsearch.xpack.sql.expression.function.scalar.datetime.DateDiff.Part, they are a little different now.
And do you think we should move back DateTimeField to ESQL?

@bpintea
Copy link
Copy Markdown
Contributor

bpintea commented Jan 3, 2024

@bpintea , what do you think about it?

In some cases sharing code between other *ql modules within the ql one makes things restrictive/tricky. If the code is very similar, it makes sense, however. But I see you've used some SDK functionality over custom implementation, which is preferred, and if that covers the needs for this PR, I'm OK containing the changes to the esql module, as @luigidellaquila suggested. (We can extract these changes in QL and update SQL later if needed.)

@nicolasgras
Copy link
Copy Markdown
Contributor Author

@bpintea and @luigidellaquila, I have moved back DateTimeField interface into ESQL.
An issue could be created later to extract some code in QL and rework some parts in SQL if needed.
Is it ok for you?

Copy link
Copy Markdown
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left two notes, otherwise it LGTM.

I guess with SQL's DateDiff we've follow other implementations returning int and we follow suite also here. This is OK, though we might want to update it, if we'll add support for nano timestamps (which we don't have yet), in which case an int will not cover even 3s.

@nicolasgras nicolasgras requested a review from bpintea January 9, 2024 05:01
Copy link
Copy Markdown
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM.

Copy link
Copy Markdown
Member

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @nicolasgras, LGTM

@luigidellaquila luigidellaquila merged commit ec2e185 into elastic:main Jan 9, 2024
DaveCTurner added a commit that referenced this pull request Jan 9, 2024
@DaveCTurner
Copy link
Copy Markdown
Member

Unfortunately this commit broke the build (checkstyle violations) so I reverted it in 943b2ea. Luigi will open a PR to reinstate it.

@luigidellaquila
Copy link
Copy Markdown
Member

#104118

elasticsearchmachine pushed a commit that referenced this pull request Jan 9, 2024
Same as #103208

Fixes #101942

We had to revert it after a Checkstyle failure (strange it didn't pop up
in the CI before merging)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL external-contributor Pull request authored by a developer outside the Elasticsearch team >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.13.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ESQL: add DATE_DIFF functionality

7 participants