Skip to content

Support overflow in Timestamp::toTimeZone method#10641

Closed
JkSelf wants to merge 1 commit intofacebookincubator:mainfrom
JkSelf:overflow
Closed

Support overflow in Timestamp::toTimeZone method#10641
JkSelf wants to merge 1 commit intofacebookincubator:mainfrom
JkSelf:overflow

Conversation

@JkSelf
Copy link
Copy Markdown
Contributor

@JkSelf JkSelf commented Aug 1, 2024

After merging the PR#10563, we encountered an error during the unit tests in Gluten.

Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Timepoint is outside of supported year range: [-32767, 32767], got 1587287
Retriable: False
Context: Top-level Expression: from_unixtime(9223372036854775807:BIGINT, yyyy-MM-dd HH:mm:ss.SSS:VARCHAR)
Function: validateRangeImpl
File: /mnt/DP_disk3/jk/projects/gluten/ep/build-velox/build/velox_ep/velox/type/tz/TimeZoneMap.cpp
Line: 187
Stack trace:
# 0  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)
# 1  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxUserError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 2  facebook::velox::tz::validateRange(std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1l> > >)
# 3  facebook::velox::tz::TimeZone::to_local(std::chrono::duration<long, std::ratio<1l, 1l> >) const
# 4  facebook::velox::Timestamp::toTimezone(facebook::velox::tz::TimeZone const&)
# 5  facebook::velox::functions::DateTimeFormatter::format(facebook::velox::Timestamp const&, facebook::velox::tz::TimeZone const*, unsigned int, char*, bool) const
# 6  _ZNK8facebook5velox17SelectivityVector15applyToSelectedIZNS0_4exec7EvalCtx22applyToSelectedNoThrowIZNKS3_21SimpleFunctionAdapterINS0_4core9UDFHolderINS0_9functions8sparksql20FromUnixtimeFunctionINS3_10VectorExecEEESC_NS0_7VarcharENS0_15ConstantCheckerIJlSE_EEEJlSE_EEEE8applyUdfIZNKSI_7iterateIJNS3_20ConstantVectorReaderIlEENSL_ISE_EEEEEvRNSI_12ApplyContextEDpRT_EUlRT_RT0_T1_E1_EEvSP_ST_EUlST_E_ZNKSJ_ISY_EEvSP_ST_EUlST_E0_EEvRKS1_ST_SV_EUlST_E_EEvST_
# 7  facebook::velox::exec::SimpleFunctionAdapter<facebook::velox::core::UDFHolder<facebook::velox::functions::sparksql::FromUnixtimeFunction<facebook::velox::exec::VectorExec>, facebook::velox::exec::VectorExec, facebook::velox::Varchar, facebook::velox::ConstantChecker<long, facebook::velox::Varchar>, long, facebook::velox::Varchar> >::apply(facebook::velox::SelectivityVector const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&, std::shared_ptr<facebook::velox::Type const> const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&) const
# 8  facebook::velox::exec::Expr::applyFunction(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
# 9  facebook::velox::exec::Expr::applyFunctionWithPeeling(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
# 10 facebook::velox::exec::Expr::evalAllImpl(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
# 11 facebook::velox::exec::Expr::evalAll(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
# 12 facebook::velox::exec::Expr::evalWithNulls(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
# 13 facebook::velox::exec::Expr::evalEncodings(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
# 14 facebook::velox::exec::Expr::eval(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&, facebook::velox::exec::ExprSet const*)
# 15 facebook::velox::exec::ExprSet::eval(int, int, bool, facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&)
# 16 facebook::velox::exec::FilterProject::project(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&)
# 17 facebook::velox::exec::FilterProject::getOutput()
# 18 facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>&, std::shared_ptr<facebook::velox::RowVector>&)
# 19 facebook::velox::exec::Driver::next(std::shared_ptr<facebook::velox::exec::BlockingState>&)
# 20 facebook::velox::exec::Task::next(folly::SemiFuture<folly::Unit>*)

The reason is that this PR#10563 removed the handling for overflow. We have added handling for overflow in the toTimeZone method in this PR.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 1, 2024
@netlify
Copy link
Copy Markdown

netlify Bot commented Aug 1, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 3f7608e
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66ab3e73c110f8000832d2cb

@JkSelf
Copy link
Copy Markdown
Contributor Author

JkSelf commented Aug 1, 2024

@pedroerp @rui-mo @jinchengchenghh Can you help to review? Thanks.

@mbasmanova mbasmanova requested a review from pedroerp August 1, 2024 08:35
// 8 hours ahead UTC.
setQueryTimeZone("Asia/Shanghai");

#ifdef NDEBUG
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The max value of seconds allowed in Spark is INT64_MAX/1'000'000, so this limitation could be removed if we avoid creating invalid timestamp from the unix_timestamp function, seeing #9904. To use INT64_MAX/1'000'000 as seconds, below overflow should also be reproducible.

Timepoint is outside of supported year range: [-32767, 32767]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. It would be great if there was a way to ensure this from the Spark functions themselves, since the Timestamp class is shared with Presto and others.

Copy link
Copy Markdown
Contributor

@pedroerp pedroerp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about changing the behavior here. Asked a few questions in the PR

Comment thread velox/type/Timestamp.cpp
void Timestamp::toTimezone(const tz::TimeZone& zone, bool allowOverflow) {
try {
seconds_ = zone.to_local(std::chrono::seconds(seconds_)).count();
auto tp = toTimePointMs(allowOverflow);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to create a time point here just to discard it later? The same validateRange() method will be called inside to_local()

// 8 hours ahead UTC.
setQueryTimeZone("Asia/Shanghai");

#ifdef NDEBUG
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. It would be great if there was a way to ensure this from the Spark functions themselves, since the Timestamp class is shared with Presto and others.

// 8 hours ahead UTC.
setQueryTimeZone("Asia/Shanghai");

#ifdef NDEBUG
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's sort of weird that we only let it overflow in release code and add a test to ensure it. The lack of test in release wasn't intentional, but due to performance to prevent the checking overhead. How could we make this more general?

@stale
Copy link
Copy Markdown

stale Bot commented Nov 4, 2024

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants