
Conversation

@rohitrastogi (Contributor) commented May 7, 2024

Which issue does this PR close?

Closes #350

Rationale for this change

Improve compatibility with Spark

What changes are included in this PR?

  1. Functionality to cast floats, doubles, and decimals to bytes, shorts, ints, and longs in a Spark-compatible way (a sketch of the intended semantics follows this list)
  2. Enhanced decimal tests to check for cast edge cases
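
As a rough illustration of the semantics involved (an editor's sketch, not the Comet implementation; the function name and error text are made up): in Spark's legacy mode a float-to-int cast follows JVM f2i conversion, so NaN becomes 0 and out-of-range values saturate, while ANSI mode raises an overflow error instead.

fn cast_f32_to_i32(value: f32, ansi_mode: bool) -> Result<i32, String> {
    // Two-sided range check performed in f64, where the i32 bounds are exact.
    let overflows = value.is_nan()
        || (value as f64).floor() > (i32::MAX as f64)
        || (value as f64).ceil() < (i32::MIN as f64);
    if ansi_mode && overflows {
        // Placeholder message; Spark's actual ANSI error text differs.
        return Err(format!("CAST_OVERFLOW: {value} cannot be cast to INT"));
    }
    // Rust's `as` cast matches JVM f2i semantics: NaN -> 0, saturating at the bounds.
    Ok(value as i32)
}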

How are these changes tested?

CometCastSuite tests pass.

@rohitrastogi marked this pull request as draft May 7, 2024 20:44
@rohitrastogi marked this pull request as ready for review May 7, 2024 21:18
@andygrove (Member) commented:
Thank you @rohitrastogi, this looks really great! Could you also run mvn package -DskipTests to regenerate the compatibility.md file so it includes the newly supported cast expressions?

@rohitrastogi (Contributor, Author) commented:

I ran mvn package -DskipTests to regenerate compatibility.md.

@rohitrastogi requested a review from andygrove May 8, 2024 05:12
@andygrove (Member) left a comment:

Thank you @rohitrastogi for this contribution! LGTM.

@andygrove merged commit c261af3 into apache:main May 8, 2024
.iter()
.map(|value| match value {
    Some(value) => {
        let is_overflow = value.is_nan() || value.abs() as i32 == std::i32::MAX;
@rohitrastogi (Contributor, Author) commented on May 10, 2024:

@andygrove This condition is actually incorrect due to how it handles INT_MIN.
It should be something like:

let is_overflow = value.is_nan() || (value as f64).floor() > (std::i32::MAX as f64) || (value as f64).ceil() < (std::i32::MIN as f64);

This is what Scala does in FloatExactNumeric.

I'm working on a fix with some improved tests. It looks like there are some tedious edge cases in how Java/Scala format the error strings, depending on how large the float is.

Rust and Scala print the same float with different decimal precision, which makes it challenging to produce the same error output as Spark in ANSI mode. I'm not sure how to address that - we may need to relax the exact string-match criterion for float -> int conversions and warn users that, although the error-checking logic in vanilla Spark and Comet is the same, the error messages differ.

For example, for INT_MAX cast to a float (which rounds to 2147483648.0), Rust prints 2.1474836E9 whereas Java prints 2.14748365E9. Both printouts correspond to the same float, 2147483648.
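
A minimal runnable sketch (an editor's illustration, not code from this PR) that demonstrates both points, the INT_MIN edge case and the formatting difference:

fn overflows_i32(value: f32) -> bool {
    value.is_nan()
        || (value as f64).floor() > (i32::MAX as f64)
        || (value as f64).ceil() < (i32::MIN as f64)
}

fn main() {
    // i32::MIN (-2^31) is exactly representable as an f32, so the corrected
    // check accepts it; the old abs()-based check rejected it because
    // `value.abs() as i32` saturates to i32::MAX.
    assert!(!overflows_i32(i32::MIN as f32));
    assert!(overflows_i32(f32::NAN));
    assert!(overflows_i32(2_147_483_648.0)); // 2^31 does not fit in an i32

    // Rust prints the shortest decimal string that round-trips, so the same
    // float comes out with fewer digits than Java's Float.toString produces.
    println!("{:E}", i32::MAX as f32); // prints 2.1474836E9 (Java: 2.14748365E9)
}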

.map(|value| match value {
    Some(value) => {
        let is_overflow =
            value.is_nan() || value.abs() as $rust_dest_type == $max_dest_val;
@rohitrastogi (Contributor, Author) commented on May 10, 2024:

This is also wrong.

Should be something like:

let is_overflow = value.is_nan() || (value as f64).floor() > ($max_dest_val as f64) || (value as f64).ceil() < ($min_dest_val as f64);
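
For reference, a hedged sketch of how the corrected two-sided check generalizes over the destination type; the macro name is made up, but the $min_dest_val/$max_dest_val parameters mirror the pattern above:

macro_rules! float_overflows {
    ($value:expr, $min_dest_val:expr, $max_dest_val:expr) => {
        $value.is_nan()
            || ($value as f64).floor() > ($max_dest_val as f64)
            || ($value as f64).ceil() < ($min_dest_val as f64)
    };
}

fn main() {
    let v: f32 = -129.0;
    assert!(float_overflows!(v, i8::MIN, i8::MAX)); // just below the i8 range
    assert!(!float_overflows!(v, i16::MIN, i16::MAX)); // fits in an i16
}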

Some(value) => {
    let divisor = 10_i128.pow($scale as u32);
    let (truncated, decimal) = (value / divisor, (value % divisor).abs());
    let is_overflow = truncated.abs() > $max_dest_val.into();
@rohitrastogi (Contributor, Author) commented:

Also wrong.

Should be:

let is_overflow = truncated > $max_dest_val.into() || truncated < $min_dest_val.into();
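
A small runnable sketch of the corrected decimal check (the function and its parameters are illustrative): truncate the unscaled value by the scale's divisor, then compare against both bounds:

fn decimal_overflows(unscaled: i128, scale: u32, min: i128, max: i128) -> bool {
    // i128 division truncates toward zero, dropping the fractional digits.
    let divisor = 10_i128.pow(scale);
    let truncated = unscaled / divisor;
    truncated > max || truncated < min
}

fn main() {
    // -128.45 stored as unscaled -12845 with scale 2 truncates to -128,
    // which fits exactly in an i8, so it must not be flagged.
    assert!(!decimal_overflows(-12_845, 2, i8::MIN as i128, i8::MAX as i128));
    // 128.45 truncates to 128, which is out of range for an i8.
    assert!(decimal_overflows(12_845, 2, i8::MIN as i128, i8::MAX as i128));
}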

Some(value) => {
    let divisor = 10_i128.pow($scale as u32);
    let (truncated, decimal) = (value / divisor, (value % divisor).abs());
    let is_overflow = truncated.abs() > std::i32::MAX.into();
@rohitrastogi (Contributor, Author) commented:

Also wrong.

Should be:

let is_overflow = truncated > std::i32::MAX.into() || truncated < std::i32::MIN.into();
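
The INT_MIN false positive can be shown directly (illustration only): the abs()-based check rejects a truncated value of exactly i32::MIN even though it fits, while the two-sided check accepts it:

fn main() {
    let truncated: i128 = i32::MIN as i128; // -2147483648 fits in an i32
    // Old check: |i32::MIN| = 2147483648 exceeds i32::MAX, a false positive.
    let old_check = truncated.abs() > i32::MAX as i128;
    // Corrected check: compare against both bounds without abs().
    let new_check = truncated > i32::MAX as i128 || truncated < i32::MIN as i128;
    assert!(old_check);
    assert!(!new_check);
}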

himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
… to integral types (apache#399)

* WIP - float to int, sketchy

* WIP - extremely ugly but functional

* WIP - use macro

* simplify further

* delete dead code

* make format

* progress on decimals

* refactor

* format decimal value in overflow exception

* wip - have to use 4 macros, need more decimal tests

* ready for review

* forgot to commit whoops

* bad merge

* address pr comments

* commit missed compatibility

* improve error message

* improve error message again

* revert perf regression in cast_int_to_int_macro

* remove branching in loop for legacy case

---------

Co-authored-by: Rohit Rastogi <rohitrastogi@Rohits-MacBook-Pro.local>


Development

Successfully merging this pull request may close these issues.

Implement Spark-compatible CAST from float/double to integer types
