Skip to content

feat: Add Spark date_trunc function#11340

Closed
zml1206 wants to merge 8 commits intofacebookincubator:mainfrom
zml1206:add_date_trunc
Closed

feat: Add Spark date_trunc function#11340
zml1206 wants to merge 8 commits intofacebookincubator:mainfrom
zml1206:add_date_trunc

Conversation

@zml1206
Copy link
Copy Markdown
Contributor

@zml1206 zml1206 commented Oct 24, 2024

Adds Spark date_trunc function. Presto date_trunc function was refactored for reuse:

  1. Moves class DateTimeUnit to functions/lib/DateTimeFormatter.h.
  2. Extends function fromDateTimeUnitString by adding params allowMicro and
    allowAbbreviated for Spark, and moves to functions/lib/TimeUtils.h.
  3. Moves adjustDateTime and adjustEpoch functions to functions/lib/TimeUtils.h.
  4. Extracts new function truncateTimestamp to functions/lib/TimeUtils.h .

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 24, 2024
@netlify
Copy link
Copy Markdown

netlify Bot commented Oct 24, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit ca45de4
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/67ed4a46fe927a0008684fa5

@rui-mo rui-mo changed the title add date_trunc spark function Add Spark date_trunc function Oct 28, 2024
Copy link
Copy Markdown
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. The key idea is that by moving some of the methods to the lib for reuse, we can reduce the code duplication.

Comment thread velox/docs/functions/spark/datetime.rst Outdated
Comment thread velox/functions/sparksql/DateTimeFunctions.h Outdated
Comment thread velox/functions/sparksql/DateTimeFunctions.h Outdated
Comment thread velox/functions/sparksql/DateTimeFunctions.h Outdated
Comment thread velox/functions/sparksql/DateTimeFunctions.h Outdated
@zml1206 zml1206 requested a review from rui-mo October 29, 2024 06:17
Copy link
Copy Markdown
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for refactoring.

Comment thread velox/functions/sparksql/DateTimeFunctions.h Outdated
Comment thread velox/functions/lib/TimeUtils.h Outdated
}
};

FOLLY_ALWAYS_INLINE std::optional<DateTimeUnit> fromDateTimeUnitString(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please add some comments for the new API? If there are unit tests for them, please move them into the lib; if not, we will need to add them.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe mention it in the currently empty PR description/commit message.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Copy link
Copy Markdown
Contributor

@jinchengchenghh jinchengchenghh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Some minors. Could you also clarify what you have changed in the PR description?

Comment thread velox/docs/functions/spark/datetime.rst Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returns timestamp ts -> Returns timestamp that ts, by the format model fmt

Comment thread velox/docs/functions/spark/datetime.rst Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that the ts falls, please quote all the ts, thanks!

Comment thread velox/functions/lib/TimeUtils.h Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this fields are used only once, do we need to set them? Or we could use the string directly in if condition? CC @rui-mo
For example, https://github.com/facebookincubator/velox/blob/main/velox/common/base/StatsReporter.h#L70

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the original presto function code. Now it is just extracted to lib for reuse. Do we need to change it? @jinchengchenghh @rui-mo

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove those defined in DateTimeFormatter then? And +1 for using string directly because these variables are used only once.

Comment thread velox/functions/lib/TimeUtils.h Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add . in the end.

Comment thread velox/functions/lib/TimeUtils.h Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment thread velox/functions/lib/TimeUtils.h Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment thread velox/functions/lib/TimeUtils.h Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add the const to enum class, you could consider it as integer like.
https://stackoverflow.com/questions/44749658/c-is-it-better-to-pass-an-enum-class-as-value-or-const-reference

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jinchengchenghh Thanks for review. It seems to be used this way in velox,

inline bool isTimeUnit(const DateTimeUnit unit) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to add const when passing by value.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the class member to the last of the struct as FromUnixtimeFunction.
And move it to private scope

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const DateTimeUnit unit. I would prefer to drop this initialization since it is used only once.

Copy link
Copy Markdown
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Comment thread velox/docs/functions/spark/datetime.rst Outdated
Comment thread velox/docs/functions/spark/datetime.rst Outdated
Comment thread velox/docs/functions/spark/datetime.rst Outdated
Comment thread velox/functions/lib/TimeUtils.h Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove those defined in DateTimeFormatter then? And +1 for using string directly because these variables are used only once.

Comment thread velox/functions/lib/TimeUtils.h Outdated
Comment thread velox/functions/lib/TimeUtils.h Outdated
Comment thread velox/functions/lib/TimeUtils.h Outdated
Comment thread velox/functions/lib/TimeUtils.h Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to add const when passing by value.

Comment thread velox/functions/lib/TimeUtils.h Outdated
Comment thread velox/functions/lib/TimeUtils.h Outdated
Comment thread velox/functions/sparksql/Register.cpp Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved, thanks.

Comment thread velox/docs/functions/spark/datetime.rst Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if ts is null?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return null.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread velox/docs/functions/spark/datetime.rst Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the function description before argument description L56.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is valid too.

 ASSERT_EQ(
      DateTimeUnit::kMillisecond, fromDateTimeUnitString("millisecond", false));

And other public functions also need tests. I'm open for this point since the function test should have cover the function you moved, but it is better to add it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test invalid fmt by VELOX_ASSERT_THROW

Copy link
Copy Markdown
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Added some comments.

Comment thread velox/docs/functions/spark/datetime.rst Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example section does not render correctly. You could add the '::' as below to fix it.

       * "MICROSECOND" - everything remains.
        
    ::

        SELECT date_trunc('YEAR', '2015-03-05T09:32:05.359'); -- 2015-01-01 00:00:00

Comment thread velox/docs/functions/spark/datetime.rst Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The microseconds are not zeroed out. Is this example correct?

Comment thread velox/docs/functions/spark/datetime.rst Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate for millisecond.

Comment thread velox/functions/lib/DateTimeFormatter.h Outdated
Comment thread velox/functions/lib/TimeUtils.h Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: Mirco -> Micro

Comment thread velox/functions/lib/TimeUtils.h Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: trunced -> truncated

Comment thread velox/functions/lib/TimeUtils.h Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid repeating the code in comments. Comment suggestion:

/// Returns timestamp with seconds adjusted to the nearest lower multiple of the
/// specified interval. If the given seconds is negative and not an exact
/// multiple of the interval, it adjusts further down.

Comment thread velox/functions/lib/TimeUtils.h Outdated
Comment thread velox/functions/lib/TimeUtils.h Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: format is not used either.

@zml1206 zml1206 requested a review from rui-mo November 15, 2024 01:40
@stale
Copy link
Copy Markdown

stale Bot commented Feb 14, 2025

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

@stale stale Bot added the stale label Feb 14, 2025
@rui-mo
Copy link
Copy Markdown
Contributor

rui-mo commented Feb 14, 2025

@zml1206 Would you like to rebase this PR? Thanks.

@stale stale Bot removed the stale label Feb 14, 2025
@zml1206 zml1206 changed the title Add Spark date_trunc function feat: Add Spark date_trunc function Feb 15, 2025
@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Feb 17, 2025

@zml1206 Would you like to rebase this PR? Thanks.

Rebased, CI failures seems unrelated. @rui-mo

Copy link
Copy Markdown
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good overall.

Extend funciton fromDateTimeUnitString

Would be elaborate how the function is extended in the PR description? Thanks.

Comment thread velox/functions/lib/TimeUtils.h Outdated
Comment thread velox/functions/lib/TimeUtils.h Outdated
Comment thread velox/functions/sparksql/DateTimeFunctions.h Outdated
Comment thread velox/functions/sparksql/DateTimeFunctions.h Outdated
Comment thread velox/functions/lib/TimeUtils.h Outdated
@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Feb 27, 2025

@rui-mo all comments resolved, can you help take a look again? thanks.

Copy link
Copy Markdown
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks nice! Just added some nits.

Comment thread velox/functions/lib/TimeUtils.h Outdated
Comment thread velox/functions/sparksql/DateTimeFunctions.h Outdated
Comment thread velox/functions/sparksql/DateTimeFunctions.h Outdated
Comment thread velox/docs/functions/spark/datetime.rst Outdated
Copy link
Copy Markdown
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Just several nits.

Comment thread velox/docs/functions/spark/datetime.rst Outdated
Comment thread velox/docs/functions/spark/datetime.rst Outdated
Comment thread velox/docs/functions/spark/datetime.rst Outdated
Comment thread velox/functions/lib/TimeUtils.cpp Outdated
Comment thread velox/functions/lib/TimeUtils.cpp Outdated
Comment thread velox/functions/sparksql/DateTimeFunctions.h Outdated
Comment thread velox/docs/functions/spark/datetime.rst Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps add this case in the test.

@rui-mo
Copy link
Copy Markdown
Contributor

rui-mo commented Mar 13, 2025

@Yuhta @pedroerp This PR introduces some refactor in the 'functions/lib' and 'functions/presto', would you like to take a look? If no further comment, we can mark it as ready-to-merge. Thanks!

Copy link
Copy Markdown
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rebase this PR? Thanks.

Copy link
Copy Markdown
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run the fuzzer test for 2 minutes to make sure it works? You could use --only to specify this function.

Command example: https://github.com/facebookincubator/velox/blob/main/.github/workflows/scheduled.yml#L587-L596

zml1206 added 7 commits March 31, 2025 16:25
fix style

fix

fix

fix

fix

fix

fix

update for reuse

fix style

fix

fix

fix

fix

fix style

update

update

fix

fix

update

update

fix

fix

update

update

update

update

add ut

add ut
@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Mar 31, 2025

Can you run the fuzzer test for 2 minutes to make sure it works? You could use --only to specify this function.

Command example: https://github.com/facebookincubator/velox/blob/main/.github/workflows/scheduled.yml#L587-L596

image

@rui-mo rui-mo added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Mar 31, 2025
@facebook-github-bot
Copy link
Copy Markdown
Contributor

@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@bikramSingh91 merged this pull request in f14efd0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants