Skip to content

feat: Add Spark make_dt_interval function#12763

Closed
ArnavBalyan wants to merge 3 commits intofacebookincubator:mainfrom
ArnavBalyan:arnavb/v-mk-dt-support
Closed

feat: Add Spark make_dt_interval function#12763
ArnavBalyan wants to merge 3 commits intofacebookincubator:mainfrom
ArnavBalyan:arnavb/v-mk-dt-support

Conversation

@ArnavBalyan
Copy link
Copy Markdown
Contributor

@ArnavBalyan ArnavBalyan commented Mar 23, 2025

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 23, 2025
@netlify
Copy link
Copy Markdown

netlify Bot commented Mar 23, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit ee426dc
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6808c1894b521b0008ce4dfd

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add the link of corresponding Spark's implementation to the PR description? And also remember to update the doc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rui-mo rui-mo changed the title feat: Add support for day time interval for Spark feat: Add Spark make_dt_interval function Mar 25, 2025
@ArnavBalyan
Copy link
Copy Markdown
Contributor Author

Addressed the comments

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the fourth parameter is of short decimal type, so the input type should be int64_t, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with int64 we lose the decimal information, spark expects double so need to keep the registration as shortdecimal and double here to preserve the timestamp values

Comment thread velox/functions/sparksql/DateTimeFunctions.h Outdated
Comment thread velox/functions/sparksql/DateTimeFunctions.h Outdated
Comment thread velox/functions/sparksql/DateTimeFunctions.h Outdated
Copy link
Copy Markdown
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also remember to update the document. Thanks.

@ArnavBalyan ArnavBalyan force-pushed the arnavb/v-mk-dt-support branch from ef89d54 to 072db35 Compare April 13, 2025 14:43
@ArnavBalyan
Copy link
Copy Markdown
Contributor Author

Please also remember to update the document. Thanks.

updated

Comment thread velox/vector/arrow/Bridge.cpp Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you add a test for the bridge change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overflow can already happen during one of these operations before the check. Perhaps use safe arithmetic functions including __builtin_mul_overflow and __builtin_add_overflow to detect overflow during calculations.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a separate file for this implementation as the DateTimeFunctions.h is large.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ArnavBalyan ArnavBalyan force-pushed the arnavb/v-mk-dt-support branch from 072db35 to e3b4d55 Compare April 17, 2025 11:07
@ArnavBalyan ArnavBalyan force-pushed the arnavb/v-mk-dt-support branch from cd2fe35 to da3f65c Compare April 17, 2025 11:30
@ArnavBalyan
Copy link
Copy Markdown
Contributor Author

cc @rui-mo could you please run this test also thanks

Copy link
Copy Markdown
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments. Thanks.

SELECT make_ym_interval(2); -- 2-0
SELECT make_ym_interval(); -- 0-0

.. spark:function:: make_dt_interval([days[, hours[, minutes[, seconds]]]]) -> interval day to second
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the alphabetical order to add documents, thanks.

IntervalDayTime,
int32_t,
int32_t,
int32_t>({prefix + "make_dt_interval"});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark would provide default values for the optional variables, so we might only need to register the four-arg one.

SELECT make_ym_interval(2); -- 2-0
SELECT make_ym_interval(); -- 0-0

.. spark:function:: make_dt_interval([days[, hours[, minutes[, seconds]]]]) -> interval day to second
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto for the document of optional variables.


Make day-time interval from ``days``, ``hours``, ``minutes`` and ``seconds`` fields.
All parameters can be zero, positive or negative.
Throws an error when inputs lead to int overflow.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume not all parameters are integer, perhaps clarify the parameter types and the overflow behavior.


namespace facebook::velox::functions::sparksql {

inline int64_t safeMul(int64_t a, int64_t b, const char* context) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is the checkedMultiply function, can we reuse it?

return result;
}

inline int64_t safeAdd(int64_t a, int64_t b, const char* context) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse checkedPlus?

@@ -0,0 +1,83 @@
/*
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps rename this header as MakeDTInterval. It is recommended to add a separate file for each function.

}

template <typename T>
struct MakeDTIntervalFunction {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps add some comments for this struct.

int32_t,
int32_t,
int32_t,
ShortDecimal<P1, S1>>({prefix + "make_dt_interval"});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decimal type seems to be fixed in Spark and is decimal(18, 6).

int32_t days = 0,
int32_t hours = 0,
int32_t minutes = 0,
double secondsMicros = 0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the fourth argument is the unscaled value of short decimal type with scale 6, and its C++ type is int64_t.

@stale
Copy link
Copy Markdown

stale Bot commented Jul 28, 2025

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants