-
Notifications
You must be signed in to change notification settings - Fork 270
feat: Implement Spark-compatible CAST from non-integral numeric types to integral types #399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you @rohitrastogi this looks really great! Could you also run |
|
I ran |
andygrove
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @rohitrastogi for this contribution! LGTM.
| .iter() | ||
| .map(|value| match value { | ||
| Some(value) => { | ||
| let is_overflow = value.is_nan() || value.abs() as i32 == std::i32::MAX; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andygrove This condition is actually incorrect due to how it it handles INT MIN.
Should be something like:
let is_overflow = value.is_nan() || (value as f64).floor() > (std::i32::MAX as f64) || (value as f64).ceil() < (std::i32::MIN as f64);
This is what Scala does in FloatExactNumeric.
Working on a fix with some improved tests. It looks like there are some tedious edge cases on how Java/Scala format the error strings depending on how large the float is.
Rust and Scala format the same float as decimals with different precisions when printing, which makes it challenging to get the same error output as Spark in ANSI mode. Not sure how to address that - we may need to relax the exact string match criteria for float -> int conversions and warn users that though the error checking logic from vanilla Spark/Comet are the same, the error messages are different.
For example:
For INT_MAX, Rust prints 2.1474836E9, whereas Java prints 2.14748365E9. Both printouts correspond to the same float 2147483648.
| .map(|value| match value { | ||
| Some(value) => { | ||
| let is_overflow = | ||
| value.is_nan() || value.abs() as $rust_dest_type == $max_dest_val; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also wrong.
Should be something like:
let is_overflow = value.is_nan() || (value as f64).floor() > ($max_dest_val as f64) || (value as f64).ceil() < ($min_dest_val as f64);
| Some(value) => { | ||
| let divisor = 10_i128.pow($scale as u32); | ||
| let (truncated, decimal) = (value / divisor, (value % divisor).abs()); | ||
| let is_overflow = truncated.abs() > $max_dest_val.into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also wrong.
Should be:
let is_overflow = truncated > $max_dest_val.into() || truncated < $min_dest_val.into();
| Some(value) => { | ||
| let divisor = 10_i128.pow($scale as u32); | ||
| let (truncated, decimal) = (value / divisor, (value % divisor).abs()); | ||
| let is_overflow = truncated.abs() > std::i32::MAX.into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also wrong.
Should be:
let is_overflow = truncated > std::i32::MAX.into() || truncated < std::i32::MIN.into();
… to integral types (apache#399) * WIP - float to int, sketchy * WIP - extremely ugly but functional * WIP - use macro * simply further * delete dead code * make format * progress on decimals * refactor * format decimal value in overflow exception * wip - have to use 4 macros, need more decimal tests * ready for review * forgot to commit whoops * bad merge * address pr comments * commit missed compatibility * improve error message * improve error message again * revert perf reression in cast_int_to_int_macro * remove branching in loop for legacy case --------- Co-authored-by: Rohit Rastogi <rohitrastogi@Rohits-MacBook-Pro.local>
Which issue does this PR close?
Closes #350
Rationale for this change
Improve compatibility with Spark
What changes are included in this PR?
How are these changes tested?
CometCastSuite tests pass.