feat: Add Spark make_dt_interval function#12763
ArnavBalyan wants to merge 3 commits into facebookincubator:main
Conversation
Could you please add a link to the corresponding Spark implementation in the PR description? And please also remember to update the doc.
I think https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala#L391 is Spark's implementation of this function.
Addressed the comments.
It looks like the fourth parameter is of short decimal type, so the input type should be int64_t, right?
I think with int64 we lose the decimal information. Spark expects a double, so we need to keep both the short-decimal and double registrations here to preserve the fractional-second values.
rui-mo left a comment:
Please also remember to update the document. Thanks.
Force-pushed ef89d54 to 072db35
Updated.
Would you add a test for the bridge change?
Overflow can already happen during one of these operations before the check. Perhaps use safe arithmetic functions such as __builtin_mul_overflow and __builtin_add_overflow to detect overflow during the calculations.
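The reviewer's suggestion can be sketched as follows. This is a minimal, hypothetical example (not the PR's code) of wrapping the GCC/Clang overflow builtins so that overflow is detected during each operation rather than after the fact; the helper names `checkedMul`/`checkedAdd` are illustrative:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Checked multiply using compiler builtins (GCC/Clang).
// Returns a * b, or throws if the product overflows int64_t.
int64_t checkedMul(int64_t a, int64_t b, const std::string& context) {
  int64_t result;
  if (__builtin_mul_overflow(a, b, &result)) {
    throw std::overflow_error("integer overflow in " + context);
  }
  return result;
}

// Checked add: returns a + b, or throws on int64_t overflow.
int64_t checkedAdd(int64_t a, int64_t b, const std::string& context) {
  int64_t result;
  if (__builtin_add_overflow(a, b, &result)) {
    throw std::overflow_error("integer overflow in " + context);
  }
  return result;
}
```

The builtins report overflow via their boolean return value and still store the wrapped result, so the check costs a single branch per operation.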
Consider adding a separate file for this implementation as the DateTimeFunctions.h is large.
Force-pushed 072db35 to e3b4d55
Force-pushed cd2fe35 to da3f65c
cc @rui-mo could you please also run this test? Thanks.
rui-mo left a comment:
Added some comments. Thanks.
SELECT make_ym_interval(2); -- 2-0
SELECT make_ym_interval(); -- 0-0

.. spark:function:: make_dt_interval([days[, hours[, minutes[, seconds]]]]) -> interval day to second
Please follow the alphabetical order to add documents, thanks.
    IntervalDayTime,
    int32_t,
    int32_t,
    int32_t>({prefix + "make_dt_interval"});
Spark would provide default values for the optional variables, so we might only need to register the four-arg one.
SELECT make_ym_interval(2); -- 2-0
SELECT make_ym_interval(); -- 0-0

.. spark:function:: make_dt_interval([days[, hours[, minutes[, seconds]]]]) -> interval day to second
Ditto for documenting the optional parameters.
Make day-time interval from ``days``, ``hours``, ``minutes`` and ``seconds`` fields.
All parameters can be zero, positive or negative.
Throws an error when inputs lead to int overflow.
I assume not all parameters are integer, perhaps clarify the parameter types and the overflow behavior.
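One way to pin down both points is to express the interval as a single microsecond count built from the four fields. The sketch below is illustrative only, assuming Spark's semantics where `days`/`hours`/`minutes` are 32-bit integers and the seconds argument arrives as microseconds (the unscaled decimal(18, 6) value); overflow of the int64 result raises an error:

```cpp
#include <cstdint>
#include <stdexcept>

constexpr int64_t kMicrosPerSecond = 1'000'000;
constexpr int64_t kMicrosPerMinute = 60 * kMicrosPerSecond;
constexpr int64_t kMicrosPerHour = 60 * kMicrosPerMinute;
constexpr int64_t kMicrosPerDay = 24 * kMicrosPerHour;

// Hypothetical sketch: combine the fields into a day-time interval expressed
// as microseconds, throwing if any intermediate step overflows int64_t.
// `secondsMicros` is assumed to be the unscaled decimal(18, 6) value,
// i.e. the seconds argument already scaled to microseconds.
int64_t makeDtIntervalMicros(
    int32_t days,
    int32_t hours,
    int32_t minutes,
    int64_t secondsMicros) {
  int64_t total = 0;
  int64_t term = 0;
  if (__builtin_mul_overflow((int64_t)days, kMicrosPerDay, &term) ||
      __builtin_add_overflow(total, term, &total) ||
      __builtin_mul_overflow((int64_t)hours, kMicrosPerHour, &term) ||
      __builtin_add_overflow(total, term, &total) ||
      __builtin_mul_overflow((int64_t)minutes, kMicrosPerMinute, &term) ||
      __builtin_add_overflow(total, term, &total) ||
      __builtin_add_overflow(total, secondsMicros, &total)) {
    throw std::overflow_error(
        "make_dt_interval result is out of the INTERVAL DAY TO SECOND range");
  }
  return total;
}
```

Each field may be negative, so the checks must catch overflow in both directions; the short-circuiting `||` chain stops at the first operation that overflows.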
namespace facebook::velox::functions::sparksql {

inline int64_t safeMul(int64_t a, int64_t b, const char* context) {
There is the checkedMultiply function, can we reuse it?
  return result;
}

inline int64_t safeAdd(int64_t a, int64_t b, const char* context) {

@@ -0,0 +1,83 @@
/*
Perhaps rename this header as MakeDTInterval. It is recommended to add a separate file for each function.
}

template <typename T>
struct MakeDTIntervalFunction {
nit: perhaps add some comments for this struct.
    int32_t,
    int32_t,
    int32_t,
    ShortDecimal<P1, S1>>({prefix + "make_dt_interval"});
The decimal type seems to be fixed in Spark and is decimal(18, 6).
    int32_t days = 0,
    int32_t hours = 0,
    int32_t minutes = 0,
    double secondsMicros = 0) {
I assume the fourth argument is the unscaled value of short decimal type with scale 6, and its C++ type is int64_t.
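The reviewer's point can be illustrated with a tiny sketch (not the PR's code): with scale 6, the unscaled int64 value of the decimal is the seconds count multiplied by 10^6, so it is already a microsecond count and no floating-point conversion is needed. The helper name `splitSeconds` is hypothetical:

```cpp
#include <cstdint>
#include <utility>

// With Spark's decimal(18, 6) seconds argument, the unscaled int64 value
// is seconds * 10^6, i.e. already a number of microseconds.
constexpr int64_t kScale = 1'000'000; // 10^6, matching scale = 6

// Split the unscaled value into whole seconds and the fractional
// remainder in microseconds (both truncate toward zero in C++).
std::pair<int64_t, int64_t> splitSeconds(int64_t unscaledSeconds) {
  return {unscaledSeconds / kScale, unscaledSeconds % kScale};
}
```

For example, an input of 1.5 seconds arrives as the unscaled value 1'500'000, which splits into 1 whole second and 500'000 microseconds, with no precision loss along the way.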
This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!
https://github.com/apache/spark/blob/branch-3.5/sql/api/src/main/scala/org/apache/spark/sql/types/DayTimeIntervalType.scala#L70