Skip to content

Conversation

@renato2099
Copy link
Contributor

Which issue does this PR close?

Adds the SQL factorial function as defined in https://www.postgresql.org/docs/14/functions-math.html

Rationale for this change

To have additional math functions and have feature parity with postgres

What changes are included in this PR?

Added a new sql function called factorial which computes the factorial of a given number and also added tests to verify how it works.

Are there any user-facing changes?

There are no API changes

@renato2099 renato2099 force-pushed the renato2099/factorial_fn branch from 9aa2abf to 0d82eda Compare December 30, 2021 23:18
@jimexist
Copy link
Member

thanks @renato2099 do you mind adding negative cases like -1, 10000000000, 1.5, etc.

@renato2099
Copy link
Contributor Author

hey @jimexist , I added additional tests, let me know if you think I should add some more.
Btw there is one thing I wasn't very happy about and it is the fact that the current implementation takes only f64 but I am not sure if we want to also allow integers as possible input. Anyway, let me know what you think @jimexist ! Thanks!

@alamb
Copy link
Contributor

alamb commented Dec 31, 2021

Btw there is one thing I wasn't very happy about and it is the fact that the current implementation takes only f64 but I am not sure if we want to also allow integers as possible input. Anyway, let me know what you think @jimexist ! Thanks!

It might be good to check out sqrt which has a similar pattern (the implementation takes f64 but the coercion logic will convert integer arguments to floats )

avro-rs = { version = "0.13", features = ["snappy"], optional = true }
num-traits = { version = "0.2", optional = true }
pyo3 = { version = "0.14", optional = true }
statrs = "0.15"
Copy link
Contributor

@alamb alamb Dec 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to avoid more dependencies for datafusion if possible (e.g. so we don't have to deal with #1498)

It seems to me like a factorial implementation will not change much and the code from statrs is fairly simple;

https://docs.rs/statrs/0.15.0/src/statrs/function/factorial.rs.html#92-108

Perhaps we can implement this code directly in datafusion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the suggestion, I removed the dependency

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @renato2099

pub fn factorial(args: &[ColumnarValue]) -> Result<ColumnarValue> {
match &args[0] {
ColumnarValue::Array(array) => {
let x1 = array.as_any().downcast_ref::<Float64Array>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think by this point, the DataFusion coercion logic should have kicked in and you shouldn't have to handle the cases where array is not a Float array -- again I think following sqrt or other similar function might be helpful

)),
}
}
_ => Err(DataFusionError::Internal(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ColumnValue::Scalar is legit too I think (aka called with a constant)


let mut ctx = ExecutionContext::new();
ctx.register_table("test", Arc::new(table))?;
let sql = "SELECT factorial(c1) FROM test";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also recommend adding a constant here

Suggested change
let sql = "SELECT factorial(c1) FROM test";
let sql = "SELECT factorial(5), factorial(c1) FROM test";

As well as tests for what happens if you pass in a integer column

BuiltinScalarFunction::Lpad => utf8_to_str_type(&input_expr_types[0], "lpad"),
BuiltinScalarFunction::Ltrim => utf8_to_str_type(&input_expr_types[0], "ltrim"),
BuiltinScalarFunction::MD5 => utf8_to_str_type(&input_expr_types[0], "md5"),
BuiltinScalarFunction::Factorial => Ok(DataType::Float64),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the floating-point as the result type, we may lose precision.
In the PG, the result type is numeric, https://www.postgresql.org/docs/14/functions-math.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right but that is at a logical level, i.e., the numeric data type is not necessarily a physical data type. That said I am extremely not familiar with datafusion 😅 so if you could point me to some code, I'd appreciate it
Also I took a quick look at PG and I think it converts int64 to the actual data type (numeric) but maybe I also missed something in PG's code base
https://github.com/postgres/postgres/blob/46ab07ffda9d6c8e63360ded2d4568aa160a7700/src/backend/utils/adt/numeric.c#L3566

@liukun4515
Copy link
Contributor

@renato2099 any update?

@alamb
Copy link
Contributor

alamb commented Jan 31, 2022

Marking PRs without activity in the last month as stale. I'll plan to close it in another month or so without activity, though feel free to reopen it when you have time to work on it)

@alamb alamb added the stale-pr label Jan 31, 2022
@alamb
Copy link
Contributor

alamb commented Feb 15, 2022

Closing stale PRs. Please reopen (or open a new one) if you plan to keep working on this feature.

@alamb alamb closed this Feb 15, 2022
@renato2099
Copy link
Contributor Author

hey guys sorry for the long radio of silence but yeah ... life (and work) got in the middle :)
Anyway, I have some questions before I can complete this feature

It might be good to check out sqrt which has a similar pattern (the implementation takes f64 but the coercion logic will convert integer arguments to floats )

@alamb would you mind sharing a pointer in the code where I could look this up? or do you mean using the unary_primitive_array_op and downcast_compute_op macros as other functions do?

@alamb
Copy link
Contributor

alamb commented Apr 5, 2022

hey guys sorry for the long radio of silence but yeah ... life (and work) got in the middle :)

No worries! I totally understand

Here is where the arguments are coerced I think: https://github.com/apache/arrow-datafusion/blob/41b4e491663029f653e491b110d0b5e74d08a0b6/datafusion/core/src/physical_plan/functions.rs#L250

And here is where the return type is defined: https://github.com/apache/arrow-datafusion/blob/41b4e491663029f653e491b110d0b5e74d08a0b6/datafusion/core/src/physical_plan/functions.rs#L233

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants