-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat: support decimal for math functions: power #18032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
supporting decimal32 and 64 would also be nice |
|
@AdamGS yes, it's a good idea, I saw an implementation in |
|
Now it supports all four decimal types. The shared implementation in Unfortunately, it was not possible to make a generic implementation for Also, I discovered that a default-constructed decimal type always has scale 10, which is not what we want for operations on Finally, |
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a few more test cases (preferably in SLT) like so:
- Test some of the overflow/compute errors
- Decimal bases that aren't integer numbers like
2.2
| Decimal256(DECIMAL256_MAX_PRECISION, 0), | ||
| Float64, | ||
| ]), | ||
| Numeric(2), // Catch-all for all decimals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this signature is a bit hard with decimals; it might be better off doing this as user_defined type and use coerce_types until we can get better support for arbitrary decimals in the type signature 🤔
- I recall another PR running into a similar problem: https://github.com/apache/datafusion/pull/17113/files#r2282718716
Especially this last Numeric(2) which I think allows having a decimal type as the exponent which I don't think the implementation supports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this link, it's helpful. Let me experiment with per-udf coercion rules and how to make them work in a generic way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jefffrey thank you for telling me about that coercion feature, it is very helpful. I implemented an even more compact version using helpers. Also, handling of null-to-int coercion became much more simple.
| DataType::Float64 => rescale_decimal( | ||
| calculate_binary_math::< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern of rescaling decimals I assume is because the output type of calculate_binary_math can lose the origin precision & scale? We should encompass this in another utility function since it's repeated multiple times here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. calculate_binary_math don't lose precision and scale, it just doesn't know about it. I tried to integrate rescaling to calculate_binary_math itself, but it became difficult to have rescaling enabled for decimal type and disabled for non-decimal types with the lack of specialisation in Rust.
Probably, it's worth too introduce a thin wrapper around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introduced a shared wrapper to utils, so it can be reused when moving other functions. Code looks nicer now
| &self.aliases | ||
| } | ||
|
|
||
| fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to be concerned about here is does it allow having a base of int64 but exponent of float64 now? I think this was prevented before as it would either be both int64s or both float64s, but now it seems this coercion is done independently. I wonder if we have any test cases to catch this however 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good question. In the original code it was prevented at a signature level, so the evaluation operated on allowed types only. However, when I tried to ban this int-to-float with new coercion rules, the scalar.slt test failed, assuming we can do it (second expression):
SELECT power(i32, exp_i) as power_i32,
power(i64, exp_f) as power_i64,
pow(f32, exp_i) as power_f32,
power(f64, exp_f) as power_f64,
pow(2, 3) as power_int_scalar,
power(2.5, 3.0) as power_float_scalar
FROM (select test.*, 3 as exp_i, 3.0 as exp_f from test) aI am unsure if it worked before then.
With decimals, we allow float power only if it is convertible to int in runtime (a focus of all these pow_decimal_float checks). Maybe we can apply the same logic to the int-to-float.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mystery solved, it was always executing as float-to-float operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully #18519 can provide a nicer API for specifying the allowed arguments; we can look into refactoring in future 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the improvement! I'll try to reformulate possible combinations in a follow-up PR.
Also, I've added a null handling after merging your changes from #18525 , could you please recheck?
| /// ```text | ||
| /// (b*s) ^ e | ||
| /// ``` | ||
| /// However, the result should be scaled back from scale 0 to scale `s`, | ||
| /// which is done by multiplying by `10^s`. | ||
| /// At the end, the formula is: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm finding this explanation a little confusing, specifically around (b*s) ^ e which implies base is multiplied by scale directly instead of via a power of 10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a mistake, updated formula. Please tell me if it can be improved
|
Looks like some CI failures to address |
This reverts commit b1e839b.
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do refactors in followups
When calling pow(10, -2.0), the power function would fail with: "Arrow error: Arithmetic overflow: Exponent -2 in integer computation is out of bounds." This was a regression introduced in apache#18032 which added decimal support to power(). The issue was that the custom type coercion kept integer bases as Int64 while float exponents became Float64, but the invoke_with_args() assumed Int64 exponent for Int64 base, which doesn't support negative exponents. The fix modifies coerce_types() to coerce integer bases to Float64 when the exponent is Float64, since floating-point arithmetic is required for negative/fractional exponents. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When calling pow(10, -2.0), the power function would fail with: "Arrow error: Arithmetic overflow: Exponent -2 in integer computation is out of bounds." This was a regression introduced in apache#18032 which added decimal support to power(). The issue was that the custom type coercion kept integer bases as Int64 while float exponents became Float64, but the invoke_with_args() assumed Int64 exponent for Int64 base, which doesn't support negative exponents. The fix modifies coerce_types() to coerce integer bases to Float64 when the exponent is Float64, since floating-point arithmetic is required for negative/fractional exponents. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
#19303) ## Which issue does this PR close? Closes #19314 Fixes a regression introduced in #18032 where `pow(10, -2.0)` fails with: ``` Arrow error: Arithmetic overflow: Exponent -2 in integer computation is out of bounds. ``` ## Rationale for this change After #18032 added decimal support to the `power()` function, the custom type coercion kept integer bases as `Int64` while float exponents became `Float64`. However, `invoke_with_args()` assumed `Int64` exponent for `Int64` base and used integer power computation, which doesn't support negative exponents. ## What changes are included in this PR? Modified `coerce_types()` in `datafusion/functions/src/math/power.rs` to coerce integer bases to `Float64` when the exponent is `Float64`, since floating-point arithmetic is required for negative/fractional exponents. This matches the original behavior before #18032 where mixed types would both be coerced to `Float64`. ## Are these changes tested? Yes, added comprehensive tests: - `test_power_coerce_types` - verifies type coercion behavior - `test_power_int_base_negative_float_exp` - tests the specific regression case - `test_power_edge_cases_f64` - tests edge cases for float power (zero exponent, one base, negative base, NaN) - `test_power_edge_cases_i64` - tests edge cases for integer power - `test_power_i64_negative_exp_fails` - verifies integer power with negative exponent fails - `test_power_zero_base_negative_exp` - tests infinity result ## Are there any user-facing changes? No breaking changes. This restores the expected behavior of `pow()` with integer base and float exponent. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
No