Decimal multiply kernel should not cause precision loss #5675

viirya · 2023-03-21T21:03:59Z

Which issue does this PR close?

Closes #5674.

Rationale for this change

Currently, decimal multiplication in DataFusion silently truncates the precision of the result. This happens even for ordinary decimal multiplications that do not overflow. It looks like DataFusion uses an incomplete version of Spark's decimal precision coercion rule to coerce the two sides of a decimal multiplication (and of other arithmetic operators). The coerced type of the two sides is not the result type of the multiplication. This, together with how we compute decimal multiplication in the kernels, leads to truncated precision in the result decimal type.
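
To make the gap between the coerced type and the result type concrete, here is a minimal sketch. It assumes the Spark-style rule for multiplication (precision = p1 + p2 + 1, scale = s1 + s2) and a simple cap at precision 38. The function names are hypothetical and this is not DataFusion's actual coercion code.

```rust
/// Maximum precision of a 128-bit decimal.
const MAX_PRECISION: u8 = 38;

/// Common ("coerced") type that can hold both operands without losing digits:
/// the widest integer part plus the widest fractional part.
/// Hypothetical helper, for illustration only.
fn coerced_type(p1: u8, s1: i8, p2: u8, s2: i8) -> (u8, i8) {
    let scale = s1.max(s2);
    let int_digits = (p1 as i16 - s1 as i16).max(p2 as i16 - s2 as i16);
    let precision = (int_digits + scale as i16).min(MAX_PRECISION as i16);
    (precision as u8, scale)
}

/// Result type of a multiplication under the assumed Spark-style rule:
/// precision = p1 + p2 + 1, scale = s1 + s2, with a plain cap at 38
/// (a simplification of how real engines adjust oversized results).
fn multiply_result_type(p1: u8, s1: i8, p2: u8, s2: i8) -> (u8, i8) {
    let precision = (p1 as u16 + p2 as u16 + 1).min(MAX_PRECISION as u16) as u8;
    let scale = (s1 as i16 + s2 as i16).min(MAX_PRECISION as i16) as i8;
    (precision, scale)
}
```

For decimal(10, 2) * decimal(10, 3), `coerced_type` gives decimal(11, 3) while `multiply_result_type` gives decimal(21, 5). Treating the coerced type as the output type would therefore drop digits.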

What changes are included in this PR?

This patch updates the type coercion rule for decimal arithmetic operators. To prevent the coercion rule from being applied again to the already-coerced sides of a decimal operator, we need an expression node that wraps the coerced sides.

Are these changes tested?

Are there any user-facing changes?

viirya · 2023-03-23T16:11:09Z

There is a broader issue with type coercion. I'm working on it and will update.

viirya · 2023-03-27T01:48:57Z

rust-decimal only supports precision <= 28...

viirya · 2023-03-27T05:25:15Z

datafusion/core/tests/sqllogictests/test_files/tpch.slt

    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
-    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
+    sum(cast(l_extendedprice as decimal(12,2)) * (1 - l_discount) * (1 + l_tax)) as sum_charge,


This decimal calculation actually overflows in the decimal multiplication because the kernel DataFusion uses doesn't allow precision loss. Previously it worked by accident because we implicitly truncated all input decimal values before multiplying...

So I added a cast to keep the test result unchanged. To get rid of this cast later, we can use the new multiply kernel in arrow-rs, which allows precision loss.

viirya · 2023-03-27T05:26:39Z

datafusion/core/tests/sqllogictests/src/engines/conversion.rs

-pub fn i128_to_str(value: i128, scale: u32) -> String {
+pub fn i128_to_str(value: i128, precision: &u8, scale: &i8) -> String {
    big_decimal_to_str(
-        BigDecimal::from_str(&Decimal::from_i128_with_scale(value, scale).to_string())


rust-decimal doesn't allow precision > 28. We can simply use Decimal128Type from arrow-rs to get the string form of a decimal.
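
For illustration, formatting an unscaled i128 with a given scale needs nothing beyond the standard library. The sketch below is a hypothetical helper (`format_decimal_i128` is not the arrow-rs API and not the code in this PR); it only shows that the conversion has no 28-digit limit.

```rust
/// Hypothetical helper: render an unscaled i128 as a decimal string.
/// A positive `scale` is the number of fractional digits; a non-positive
/// `scale` appends zeros instead.
fn format_decimal_i128(value: i128, scale: i8) -> String {
    let sign = if value < 0 { "-" } else { "" };
    // unsigned_abs avoids overflow on i128::MIN.
    let digits = value.unsigned_abs().to_string();
    if scale <= 0 {
        if value == 0 {
            return "0".to_string();
        }
        // No fractional part: pad with |scale| trailing zeros.
        let zeros = "0".repeat(scale.unsigned_abs() as usize);
        return format!("{sign}{digits}{zeros}");
    }
    let scale = scale as usize;
    // Left-pad with zeros so at least one digit precedes the decimal point.
    let padded = format!("{digits:0>width$}", width = scale + 1);
    let (int_part, frac_part) = padded.split_at(padded.len() - scale);
    format!("{sign}{int_part}.{frac_part}")
}
```

For example, `format_decimal_i128(12345, 2)` yields "123.45" and `format_decimal_i128(-5, 3)` yields "-0.005".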

viirya · 2023-03-30T08:13:27Z

benchmarks/queries/q8.sql

+    cast(cast(sum(case
            when nation = 'BRAZIL' then volume
            else 0
-        end) / sum(volume) as mkt_share
+        end) as decimal(12,2)) / cast(sum(volume) as decimal(12,2)) as decimal(15,2)) as mkt_share


The current decimal divide kernel fails to compute this. The result decimal type has scale 38. Since divide_dyn_opt_decimal multiplies the left array by 10.pow(scale), it overflows, because the multiply kernel doesn't allow precision-loss multiplication.

With inner casts wrapping the two sums, we can use a result scale that won't cause overflow. The result decimal is correct.

But the answer file is fixed as decimal(15, 2), so the result still fails the check. All I can do is add another cast to make it the same decimal type and pass the result check.
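
The overflow described above can be reproduced with plain i128 arithmetic. This is a minimal sketch, assuming the divide path pre-scales the dividend by 10^scale with a checked multiplication; `scale_dividend` is a hypothetical stand-in, not the real divide_dyn_opt_decimal.

```rust
/// Hypothetical stand-in for the pre-scaling step of a decimal divide:
/// multiply the unscaled dividend by 10^scale, returning None on overflow.
fn scale_dividend(unscaled: i128, scale: u32) -> Option<i128> {
    // 10^38 still fits in an i128, 10^39 does not.
    let factor = 10i128.checked_pow(scale)?;
    // Any dividend with magnitude >= 2 overflows once the factor is 10^38.
    unscaled.checked_mul(factor)
}
```

With scale 38, `scale_dividend(12345, 38)` returns None, while a smaller scale such as 6 returns Some(12_345_000_000). That is why casting the inputs to a narrower decimal type, which lowers the result scale, avoids the failure.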

viirya · 2023-03-30T18:36:37Z

More changes than I expected, but finally all DataFusion tests are happy...

alamb · 2023-03-30T18:40:22Z

I'll try and find time to review this PR tomorrow

alamb · 2023-03-30T18:40:29Z

cc @liukun4515

alamb

Thanks @viirya

I took a look through this code -- the coercion rule changes make sense to me. However, I don't understand the need for Expr::PromotePrecision or the new data_type field on Expr. They don't seem to be related to improving the coercion for decimals

alamb · 2023-04-03T13:23:01Z

datafusion/expr/src/expr.rs

    }
 }

+/// Cast expression


Can we please add documentation about what PromotePrecision does and in what circumstances it is needed?

alamb · 2023-04-03T13:24:38Z

datafusion/core/tests/sqllogictests/test_files/tpch.slt

    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
-    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
+    sum(cast(l_extendedprice as decimal(12,2)) * (1 - l_discount) * (1 + l_tax)) as sum_charge,


perhaps we can add a comment here referencing the new arrow-rs kernel / work. Otherwise this context will likely get forgotten

alamb · 2023-04-03T13:26:33Z

datafusion/expr/src/expr_schema.rs

    /// cast to a type with respect to a schema
    fn cast_to<S: ExprSchema>(self, cast_to_type: &DataType, schema: &S) -> Result<Expr>;
+
+    /// promote to a type with respect to a schema


Can you please add some comments to clarify what the difference between "promote" and "cast" is?

alamb · 2023-04-03T13:28:41Z

datafusion/expr/src/expr.rs

    pub op: Operator,
    /// Right-hand side of the expression
    pub right: Box<Expr>,
+    /// The data type of the expression, if known


I don't understand the reason for this change -- is it no longer possible to calculate the output type of a BinaryExpr from its inputs?

If the desired output type is different then couldn't we cast the expr to the type?

viirya · 2023-04-03T18:04:33Z

the coercion rule changes make sense to me. However, I don't understand the need for Expr::PromotePrecision or the new data_type field on Expr. They don't seem to be related to improving the coercion for decimals

The coercion rule that modifies the sides of an arithmetic op is not idempotent. Running the rule multiple times produces an incorrect result. So we need something to prevent the rule from being applied to already-coerced sides. PromotePrecision is such a thing; it's just a wrapper for that purpose.

As for the new data_type on BinaryExpr: the coerced type of a decimal arithmetic op is not the same as its result type, as you can see. So we cannot simply take the coerced type of the left/right sides and use it as the result type. We cannot compute the result type on the fly in the physical BinaryExec because it depends on the original data types of the sides of the op, but at that point we only have the coerced ones. So we need to record the result type so that we can get it when computing the decimal arithmetic result.
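
A toy model may help show both points. The sketch below uses a made-up `Expr` enum (not DataFusion's real `Expr`), assumes the Spark-style multiply rule, and ignores nested expressions. The wrapper makes a second run of the rule a no-op, and the recorded result type preserves what the coerced sides alone can no longer tell us.

```rust
// Toy expression tree; all names are illustrative, not DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Decimal column of type decimal(precision, scale).
    Column { precision: u8, scale: i8 },
    /// Cast of `expr` to decimal(precision, scale).
    Cast { expr: Box<Expr>, precision: u8, scale: i8 },
    /// Marker: the wrapped side has already been coerced.
    PromotePrecision(Box<Expr>),
    /// Multiplication plus the result type recorded at coercion time.
    Multiply { left: Box<Expr>, right: Box<Expr>, result_type: Option<(u8, i8)> },
}

/// Assumed Spark-style multiplication result type, capped at precision 38.
fn multiply_type((p1, s1): (u8, i8), (p2, s2): (u8, i8)) -> (u8, i8) {
    ((p1 + p2 + 1).min(38), s1 + s2)
}

/// Decimal type currently visible for an expression.
fn side_type(expr: &Expr) -> (u8, i8) {
    match expr {
        Expr::Column { precision, scale } | Expr::Cast { precision, scale, .. } => {
            (*precision, *scale)
        }
        Expr::PromotePrecision(inner) => side_type(inner),
        Expr::Multiply { left, right, result_type } => (*result_type)
            .unwrap_or_else(|| multiply_type(side_type(left), side_type(right))),
    }
}

/// Coercion rule: cast both sides to a common type, wrap them in the marker,
/// and record the result type derived from the original operand types.
fn coerce(expr: Expr) -> Expr {
    match expr {
        Expr::Multiply { left, right, result_type } => {
            if matches!(&*left, Expr::PromotePrecision(_)) {
                // Sides were already coerced; re-running must change nothing.
                return Expr::Multiply { left, right, result_type };
            }
            let (l, r) = (side_type(&left), side_type(&right));
            // Computed BEFORE the sides are replaced by casts.
            let result_type = Some(multiply_type(l, r));
            let scale = l.1.max(r.1);
            let int_digits = (l.0 as i8 - l.1).max(r.0 as i8 - r.1);
            let precision = (int_digits + scale) as u8;
            let wrap = |side: Box<Expr>| {
                let cast = Expr::Cast { expr: side, precision, scale };
                Box::new(Expr::PromotePrecision(Box::new(cast)))
            };
            Expr::Multiply { left: wrap(left), right: wrap(right), result_type }
        }
        other => other,
    }
}
```

For decimal(10, 2) * decimal(10, 3), the recorded result type is decimal(21, 5). Recomputing it from the coerced sides, which are both decimal(11, 3), would give decimal(23, 6) instead, which is why the original result type has to be stored somewhere.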

alamb · 2023-04-04T16:59:01Z

The coercion rule that modifies the sides of an arithmetic op is not idempotent. Running the rule multiple times produces an incorrect result. So we need something to prevent the rule from being applied to already-coerced sides. PromotePrecision is such a thing; it's just a wrapper for that purpose.

As for the new data_type on BinaryExpr: the coerced type of a decimal arithmetic op is not the same as its result type, as you can see. So we cannot simply take the coerced type of the left/right sides and use it as the result type. We cannot compute the result type on the fly in the physical BinaryExec because it depends on the original data types of the sides of the op, but at that point we only have the coerced ones. So we need to record the result type so that we can get it when computing the decimal arithmetic result.

This makes sense -- thank you

My biggest concern with the approach in this PR is that it adds some special case for decimal (which seems reasonable in itself) but that special case bleeds out over the rest of the code / plans, which is probably why it got so big.

I wonder whether it would be possible to use only PromotePrecision and avoid modifying Expr::Binary, to keep the behavior more localized to coercion.

Perhaps something like the following.

/// The target decimal type the expression should be output to
struct PromotePrecision {
  expr: Box<Expr>,
  precision: u8,
  scale: u8,
}

One of the downsides of this approach is that PromotePrecision is now clearly special purpose for Decimal. I am not sure if it would have other uses in the future 🤔

viirya · 2023-04-04T20:03:22Z

Keeping precision/scale in PromotePrecision seems okay. I don't think PromotePrecision will be used for anything else. It is a special-purpose wrapper (as its name suggests) for decimal arithmetic type coercion only.

alamb · 2023-04-04T20:27:10Z

Keeping precision/scale in PromotePrecision seems okay. I don't think PromotePrecision will be used for anything else. It is a special-purpose wrapper (as its name suggests) for decimal arithmetic type coercion only.

I think then it would be best to keep the data type in Expr::PromotePrecision rather than adding it to BinaryExpr

liukun4515 · 2023-04-05T13:45:22Z

cc @liukun4515

Thanks for the mention.

I will take a careful look at it tomorrow.

@jackwener has moved the type coercion to the analysis phase.

Maybe we will have some of the same comments about the type coercion rule.

viirya · 2023-04-05T18:37:13Z

Yea, seems I need to rebase this.

mingmwang · 2023-04-07T11:42:57Z

Does this new PromotePrecision expr apply only to the Decimal type?
I remember there was a similar PromotePrecision in Spark SQL, but I'm not sure whether it served the same purpose.
And it looks like it has been removed or renamed in the latest Spark.

viirya · 2023-04-07T18:08:16Z

Does this new PromotePrecision expr apply only to the Decimal type?
I remember there was a similar PromotePrecision in Spark SQL, but I'm not sure whether it served the same purpose.
And it looks like it has been removed or renamed in the latest Spark.

It is for Decimal type only. Yea, the idea is from Spark.

It was removed in the latest Spark because precision promotion was moved to where the computation happens. I was not sure whether moving it into the decimal kernels is a good idea for DataFusion. Let me rethink it and maybe refactor this.

alamb · 2023-04-11T18:01:53Z

marking as draft while it gets worked on

viirya · 2023-04-11T18:18:43Z

Thanks @alamb. I'll find some time for this. :)

viirya · 2023-04-12T20:02:02Z

The new approach is proposed in #5980.

github-actions bot added the physical-expr Changes to the physical-expr crates label Mar 21, 2023

viirya marked this pull request as draft March 21, 2023 21:42

github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sql SQL Planner sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate labels Mar 24, 2023

viirya force-pushed the fix_decimal_multiply_precision_loss branch 2 times, most recently from e5e3bcb to b108159 Compare March 24, 2023 06:25

github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Mar 24, 2023

viirya force-pushed the fix_decimal_multiply_precision_loss branch from b108159 to 9ba3285 Compare March 24, 2023 07:19

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 27, 2023

viirya force-pushed the fix_decimal_multiply_precision_loss branch from 32e5dfc to e727f5a Compare March 29, 2023 05:34

viirya added 11 commits March 30, 2023 08:38

Decimal multiply kernel should not cause precision loss

9dcb56c

Fix

d0f4924

Fix

3769b51

Fix

97dd4ea

Fix

03f6c72

Fix

333d987

Fix

373c28e

For new tree node api

276e8b7

Fix expected plans

15d4242

fix

c6fbea1

Fix q8_expected_plan

8012386

viirya added 6 commits March 30, 2023 08:39

More

bc2e7cf

Fix test

81cc07b

Trigger Build

036cfda

Update proto

9db0f93

fix

eda37ed

fix

fb9a27a

viirya force-pushed the fix_decimal_multiply_precision_loss branch from b1292ba to fb9a27a Compare March 30, 2023 16:04

viirya marked this pull request as ready for review March 30, 2023 17:25

alamb reviewed Apr 3, 2023

alamb mentioned this pull request Apr 6, 2023

Improve avg/sum Aggregator performance for Decimal #5866

Merged

alamb marked this pull request as draft April 11, 2023 18:01

viirya mentioned this pull request Apr 12, 2023

Decimal multiply kernel should not cause precision loss #5980

Merged

viirya closed this Apr 20, 2023
