Skip to content

Conversation

@cyb70289
Copy link
Contributor

@cyb70289 cyb70289 commented May 20, 2021

Add basic binary arithmetic (+,-,*,/) kernels for decimal types.

@github-actions
Copy link

@cyb70289 cyb70289 marked this pull request as draft May 20, 2021 09:31
@cyb70289 cyb70289 marked this pull request as ready for review May 27, 2021 10:29
@bkietz bkietz self-requested a review May 27, 2021 10:51
Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

Please extract the decimal upscaling from the addition kernel into an implicit cast. This will simplify the addition kernel to stateless addition (IIUC) and give callers control over how to handle upscaling. See CommonNumeric() for existing promotion behavior; this should give you an idea of how to specify the upscaling as an implicit cast.

@cyb70289
Copy link
Contributor Author

cyb70289 commented Jun 1, 2021

Please extract the decimal upscaling from the addition kernel into an implicit cast. This will simplify the addition kernel to stateless addition (IIUC) and give callers control over how to handle upscaling.

Thanks @bkietz , casting args (rescaling decimal inputs) implicitly before exec looks much better than my current implementation. Will try the approach.

@cyb70289
Copy link
Contributor Author

cyb70289 commented Jun 2, 2021

@bkietz , met with one problem, would like to hear your comments. Thanks.

Decimal upscaling is operation dependent. E.g., +,- will upscale arg with smaller scale to align digit, * needn't scaling, / is more complicated.

Implicit args casting happens before kernel is created. DispatchBest only knows arg types, no operation type (kernel dependent) is available. So we cannot figure out the "to be casted" arg type (new precision, scale).
https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.cc#L175

Maybe add another callback kernel->explicit_cast() and call it after or inside DispatchBest? Or create different ScalarFunction struct (and DispatchBest) for each decimal operation?

@bkietz
Copy link
Member

bkietz commented Jun 2, 2021

DispatchBest is aware of the operation; it has access to the function's name. You could write a CommonDecimal() function which returns differing scales/precisions for "add"/"subtract", "divide", and "multiply":

if (auto type = CommonNumeric(*values)) {
  ReplaceTypes(type, values);
} else if (auto type = CommonDecimal(name_, *values)) {
  ReplaceTypes(type, values);
}

@cyb70289
Copy link
Contributor Author

cyb70289 commented Jun 4, 2021

Thanks @bkietz , it's almost done except one last catch.

As the output type (precision, scale) is dependent on the inputs, I have a resolver object to calculate output type. The resolver is called with the casted input type, not the original type. It causes problem to division, as the output precision and scale should be calculated from original inputs. No trouble for add/subtract as the output precision/scale is the same for original and casted inputs (digit aligned). Multiply doesn't need cast.

Does it make sense to pass both original input types and the casted types to the resolver [1][2]? We will have to update all existing custom resolver codes.
Maybe add a kernel flag to select passing original or casted input types to the resolver?
Or there are better ways to handle this?

[1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.cc#L196
[2] https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/exec.cc#L495

EDIT: Pushed latest code to ease reviewing. Unit test fails due to this issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please open a follow up JIRA to make this public so it can be reused (for example in the comparison functions)

Comment on lines 546 to 549
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be the default case; it's what would be used for any comparison kernel, for example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, I think this could be defined for the varargs case (so it could be used in elementwise_min/elementwise_max/...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not done yet. Will follow up.

@cyb70289 cyb70289 requested a review from bkietz June 11, 2021 01:46
@cyb70289
Copy link
Contributor Author

Rebased

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great, thanks again! Just a few more items

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure we're not adding decimal kernels for functions which don't support decimals yet:

Suggested change
AddDecimalBinaryKernels<Op>(name, &func);
// TODO($FOLLOW_UP_JIRA) remove when decimal is supported for all arithmetic kernels
if (name == "add" || name == "add_checked" || ...) {
AddDecimalBinaryKernels<Op>(name, &func);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please open a follow up JIRA for supporting decimals in the other arithmetic functions as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved AddDecimalBinaryKernels calls out of MakeArithmeticFunction and MakeArithmeticFunctionNotNull, and put it under each kernel supporting decimal operations.
https://github.com/apache/arrow/pull/10364/files#diff-3eafd7246f6a8c699f10d46e3276852fe44b6853b5517ef10396e561730c09f4R840

I'm afraid some arithmetic kernels (e.g., power) may not support decimals?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that will work

I'm afraid some arithmetic kernels (e.g., power) may not support decimals?

This is one topic which the follow up JIRA would give us a better place to discuss

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will merge on green CI.

Thanks for doing this!

@bkietz bkietz closed this in 4743e18 Jun 18, 2021
@cyb70289 cyb70289 deleted the decimal-arith branch June 19, 2021 00:51
sjperkins pushed a commit to sjperkins/arrow that referenced this pull request Jun 23, 2021
Add basic binary arithmetic (+,-,*,/) kernels for decimal types.

Closes apache#10364 from cyb70289/decimal-arith

Authored-by: Yibo Cai <yibo.cai@arm.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants