Add factorial function #1510

renato2099 · 2021-12-30T20:40:38Z

Which issue does this PR close?

Adds the SQL factorial function as defined in https://www.postgresql.org/docs/14/functions-math.html

Rationale for this change

To add more math functions and move toward feature parity with PostgreSQL.

What changes are included in this PR?

Added a new SQL function, factorial, which computes the factorial of a given number, along with tests that verify its behavior.

Are there any user-facing changes?

There are no API changes

jimexist · 2021-12-31T00:35:32Z

Thanks @renato2099. Do you mind adding negative test cases such as -1, 10000000000, 1.5, etc.?

renato2099 · 2021-12-31T04:59:40Z

Hey @jimexist, I added more tests; let me know if you think I should add others.
By the way, one thing I wasn't very happy about is that the current implementation takes only f64, and I am not sure whether we also want to allow integers as input. Anyway, let me know what you think, @jimexist! Thanks!

alamb · 2021-12-31T13:36:04Z

> By the way, one thing I wasn't very happy about is that the current implementation takes only f64, and I am not sure whether we also want to allow integers as input. Anyway, let me know what you think, @jimexist! Thanks!

It might be good to check out sqrt, which follows a similar pattern (the implementation takes f64, but the coercion logic converts integer arguments to floats).
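The pattern described above can be sketched in plain Rust, without any DataFusion types. Everything here is illustrative: `Arg`, `coerce_to_f64`, `sqrt_kernel`, and `call_sqrt` are hypothetical names, not DataFusion APIs. The point is only that the kernel sees f64 while a separate coercion step accepts integers.

```rust
/// Hypothetical stand-in for a SQL argument value (not a DataFusion type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Arg {
    Int64(i64),
    Float64(f64),
}

/// Coercion step: every supported input type is converted to f64
/// before the math kernel runs.
fn coerce_to_f64(arg: Arg) -> f64 {
    match arg {
        Arg::Int64(v) => v as f64,
        Arg::Float64(v) => v,
    }
}

/// The kernel itself only ever handles f64.
fn sqrt_kernel(x: f64) -> f64 {
    x.sqrt()
}

/// Entry point: coerce first, then call the f64-only kernel.
fn call_sqrt(arg: Arg) -> f64 {
    sqrt_kernel(coerce_to_f64(arg))
}
```

With this split, `call_sqrt(Arg::Int64(9))` and `call_sqrt(Arg::Float64(9.0))` reach the same kernel, so the kernel needs no integer branch.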

alamb · 2021-12-31T13:40:52Z

datafusion/Cargo.toml

 avro-rs = { version = "0.13", features = ["snappy"], optional = true }
 num-traits = { version = "0.2", optional = true }
 pyo3 = { version = "0.14", optional = true }
+statrs = "0.15"


I would prefer to avoid more dependencies for datafusion if possible (e.g. so we don't have to deal with #1498)

It seems to me that a factorial implementation will not change much, and the code from statrs is fairly simple:

https://docs.rs/statrs/0.15.0/src/statrs/function/factorial.rs.html#92-108

Perhaps we can implement this code directly in DataFusion.

Agree with this.

Thanks for the suggestion, I removed the dependency.
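For context, a dependency-free factorial over f64 can be quite small. The following is a minimal sketch, not the statrs code and not the code from this PR: the names `MAX_FACTORIAL` and `factorial` are chosen for illustration, and the choice to return infinity past the f64 range is an assumption.

```rust
/// Largest n for which n! is still finite as an f64 (171! overflows).
const MAX_FACTORIAL: u64 = 170;

/// Compute n! as an f64 with a plain loop.
/// Returns positive infinity once the result no longer fits in an f64.
fn factorial(n: u64) -> f64 {
    if n > MAX_FACTORIAL {
        return f64::INFINITY;
    }
    // An empty range (n == 0) leaves the accumulator at 1.0, so 0! == 1.
    (1..=n).fold(1.0_f64, |acc, k| acc * k as f64)
}
```

A precomputed lookup table would avoid repeating the multiplication on every call; the loop is used here only to keep the sketch short.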

alamb

Thanks @renato2099

alamb · 2021-12-31T13:43:48Z

datafusion/src/physical_plan/math_expressions.rs

+pub fn factorial(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    match &args[0] {
+        ColumnarValue::Array(array) => {
+            let x1 = array.as_any().downcast_ref::<Float64Array>();


I think by this point the DataFusion coercion logic should have kicked in, so you shouldn't have to handle the case where the array is not a Float64 array. Again, I think following sqrt or another similar function might be helpful.

alamb · 2021-12-31T13:44:12Z

datafusion/src/physical_plan/math_expressions.rs

+                )),
+            }
+        }
+        _ => Err(DataFusionError::Internal(


ColumnarValue::Scalar is legit too, I think (i.e., when the function is called with a constant).
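To illustrate the point about handling both variants, here is a self-contained sketch. The `ColumnarValue` enum below is a simplified stand-in defined locally for this example; it is not DataFusion's real type. Mapping negative or fractional input to NULL (`None`) is also an assumption made for illustration, since the thread does not settle what those cases should return.

```rust
/// Simplified stand-in for DataFusion's ColumnarValue (assumption: the
/// real type is not a plain Vec / Option like this one).
#[derive(Debug, PartialEq)]
enum ColumnarValue {
    Array(Vec<Option<f64>>),
    Scalar(Option<f64>),
}

/// Factorial of a single f64 value. Negative, fractional, and non-finite
/// inputs yield None (an illustrative choice, not confirmed by the PR).
fn factorial_f64(x: f64) -> Option<f64> {
    if !x.is_finite() || x < 0.0 || x.fract() != 0.0 {
        return None;
    }
    if x > 170.0 {
        // 171! and above do not fit in an f64.
        return Some(f64::INFINITY);
    }
    Some((1..=x as u64).fold(1.0_f64, |acc, k| acc * k as f64))
}

/// Handle both a column of values and a single constant.
fn factorial(arg: &ColumnarValue) -> ColumnarValue {
    match arg {
        ColumnarValue::Array(values) => ColumnarValue::Array(
            values
                .iter()
                .copied()
                .map(|v| v.and_then(factorial_f64))
                .collect(),
        ),
        ColumnarValue::Scalar(v) => ColumnarValue::Scalar((*v).and_then(factorial_f64)),
    }
}
```

The scalar arm is what a call such as `factorial(5)` would exercise, while `factorial(c1)` would go through the array arm.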

alamb · 2021-12-31T13:44:44Z

datafusion/tests/sql/functions.rs

+
+    let mut ctx = ExecutionContext::new();
+    ctx.register_table("test", Arc::new(table))?;
+    let sql = "SELECT factorial(c1) FROM test";


I also recommend adding a constant here

Suggested change

-    let sql = "SELECT factorial(c1) FROM test";
+    let sql = "SELECT factorial(5), factorial(c1) FROM test";

Also, please add tests for what happens when you pass in an integer column.

liukun4515 · 2021-12-31T13:58:59Z

datafusion/src/physical_plan/functions.rs

        BuiltinScalarFunction::Lpad => utf8_to_str_type(&input_expr_types[0], "lpad"),
        BuiltinScalarFunction::Ltrim => utf8_to_str_type(&input_expr_types[0], "ltrim"),
        BuiltinScalarFunction::MD5 => utf8_to_str_type(&input_expr_types[0], "md5"),
+        BuiltinScalarFunction::Factorial => Ok(DataType::Float64),


If we use floating point as the result type, we may lose precision.
In PostgreSQL, the result type is numeric: https://www.postgresql.org/docs/14/functions-math.html

Right, but that is at the logical level, i.e., the numeric data type is not necessarily a physical data type. That said, I am not at all familiar with DataFusion 😅 so if you could point me to some code, I'd appreciate it.
Also, I took a quick look at PostgreSQL, and I think it converts an int64 to the actual data type (numeric), but I may have missed something in its code base:
https://github.com/postgres/postgres/blob/46ab07ffda9d6c8e63360ded2d4568aa160a7700/src/backend/utils/adt/numeric.c#L3566
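The precision concern can be made concrete with a small experiment that compares an exact integer factorial with the f64 result. This is an illustrative sketch only: the function names are hypothetical, and u128 stands in for an arbitrary-precision numeric type, so the exact version is only valid up to 34!.

```rust
/// Exact factorial in u128. Only valid for n <= 34; 35! overflows u128.
fn factorial_exact(n: u32) -> u128 {
    (1..=n as u128).product()
}

/// Factorial computed in f64, as a Float64 result type would store it.
fn factorial_float(n: u32) -> f64 {
    (1..=n).fold(1.0_f64, |acc, k| acc * f64::from(k))
}

/// True when the f64 result equals the exact integer value.
fn is_exact_in_f64(n: u32) -> bool {
    factorial_float(n) as u128 == factorial_exact(n)
}
```

Under these definitions, `is_exact_in_f64(22)` holds but `is_exact_in_f64(23)` does not, because 23! needs more than the 53 significant bits an f64 provides. For comparison, a 64-bit signed integer result would already overflow at 21!.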

liukun4515 · 2022-01-21T07:47:49Z

@renato2099 any update?

alamb · 2022-01-31T20:44:41Z

Marking PRs without activity in the last month as stale. I plan to close this one after another month or so without activity, though feel free to reopen it when you have time to work on it.

alamb · 2022-02-15T19:09:05Z

Closing stale PRs. Please reopen (or open a new one) if you plan to keep working on this feature.

renato2099 · 2022-04-03T21:46:28Z

Hey guys, sorry for the long radio silence, but yeah ... life (and work) got in the way :)
Anyway, I have some questions before I can complete this feature.

> It might be good to check out sqrt, which follows a similar pattern (the implementation takes f64, but the coercion logic converts integer arguments to floats).

@alamb, would you mind sharing a pointer to where in the code I could look this up? Or do you mean using the unary_primitive_array_op and downcast_compute_op macros, as other functions do?

alamb · 2022-04-05T10:48:28Z

> Hey guys, sorry for the long radio silence, but yeah ... life (and work) got in the way :)

No worries! I totally understand.

Here is where the arguments are coerced, I think: https://github.com/apache/arrow-datafusion/blob/41b4e491663029f653e491b110d0b5e74d08a0b6/datafusion/core/src/physical_plan/functions.rs#L250

And here is where the return type is defined: https://github.com/apache/arrow-datafusion/blob/41b4e491663029f653e491b110d0b5e74d08a0b6/datafusion/core/src/physical_plan/functions.rs#L233

Add factorial function

4374110

github-actions bot added the datafusion label Dec 30, 2021

Add factorial function

0d82eda

renato2099 force-pushed the renato2099/factorial_fn branch from 9aa2abf to 0d82eda Compare December 30, 2021 23:18

Add factorial function

3223d43

alamb reviewed Dec 31, 2021

liukun4515 reviewed Dec 31, 2021

alamb added the stale-pr label Jan 31, 2022

alamb closed this Feb 15, 2022

alamb mentioned this pull request Mar 11, 2025

Add test coverage for wasm32 + parquet build #15158

Closed


4 participants
