-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add factorial function #1510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add factorial function #1510
Conversation
9aa2abf to
0d82eda
Compare
|
thanks @renato2099 do you mind adding negative cases like -1, 10000000000, 1.5, etc. |
|
hey @jimexist , I added additional tests, let me know if you think I should add some more. |
It might be good to check out |
| avro-rs = { version = "0.13", features = ["snappy"], optional = true } | ||
| num-traits = { version = "0.2", optional = true } | ||
| pyo3 = { version = "0.14", optional = true } | ||
| statrs = "0.15" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to avoid more dependencies for datafusion if possible (e.g. so we don't have to deal with #1498)
It seems to me like a factorial implementation will not change much and the code from statrs is fairly simple;
https://docs.rs/statrs/0.15.0/src/statrs/function/factorial.rs.html#92-108
Perhaps we can implement this code directly in datafusion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the suggestion, I removed the dependency
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @renato2099
| pub fn factorial(args: &[ColumnarValue]) -> Result<ColumnarValue> { | ||
| match &args[0] { | ||
| ColumnarValue::Array(array) => { | ||
| let x1 = array.as_any().downcast_ref::<Float64Array>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think by this point, the DataFusion coercion logic should have kicked in and you shouldn't have to handle the cases where array is not a Float array -- again I think following sqrt or other similar function might be helpful
| )), | ||
| } | ||
| } | ||
| _ => Err(DataFusionError::Internal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ColumnValue::Scalar is legit too I think (aka called with a constant)
|
|
||
| let mut ctx = ExecutionContext::new(); | ||
| ctx.register_table("test", Arc::new(table))?; | ||
| let sql = "SELECT factorial(c1) FROM test"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also recommend adding a constant here
| let sql = "SELECT factorial(c1) FROM test"; | |
| let sql = "SELECT factorial(5), factorial(c1) FROM test"; |
As well as tests for what happens if you pass in a integer column
| BuiltinScalarFunction::Lpad => utf8_to_str_type(&input_expr_types[0], "lpad"), | ||
| BuiltinScalarFunction::Ltrim => utf8_to_str_type(&input_expr_types[0], "ltrim"), | ||
| BuiltinScalarFunction::MD5 => utf8_to_str_type(&input_expr_types[0], "md5"), | ||
| BuiltinScalarFunction::Factorial => Ok(DataType::Float64), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the floating-point as the result type, we may lose precision.
In the PG, the result type is numeric, https://www.postgresql.org/docs/14/functions-math.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right but that is at a logical level, i.e., the numeric data type is not necessarily a physical data type. That said I am extremely not familiar with datafusion 😅 so if you could point me to some code, I'd appreciate it
Also I took a quick look at PG and I think it converts int64 to the actual data type (numeric) but maybe I also missed something in PG's code base
https://github.com/postgres/postgres/blob/46ab07ffda9d6c8e63360ded2d4568aa160a7700/src/backend/utils/adt/numeric.c#L3566
|
@renato2099 any update? |
|
Marking PRs without activity in the last month as stale. I'll plan to close it in another month or so without activity, though feel free to reopen it when you have time to work on it) |
|
Closing stale PRs. Please reopen (or open a new one) if you plan to keep working on this feature. |
|
hey guys sorry for the long radio of silence but yeah ... life (and work) got in the middle :)
@alamb would you mind sharing a pointer in the code where I could look this up? or do you mean using the |
No worries! I totally understand Here is where the arguments are coerced I think: https://github.com/apache/arrow-datafusion/blob/41b4e491663029f653e491b110d0b5e74d08a0b6/datafusion/core/src/physical_plan/functions.rs#L250 And here is where the return type is defined: https://github.com/apache/arrow-datafusion/blob/41b4e491663029f653e491b110d0b5e74d08a0b6/datafusion/core/src/physical_plan/functions.rs#L233 |
Which issue does this PR close?
Adds the SQL factorial function as defined in https://www.postgresql.org/docs/14/functions-math.html
Rationale for this change
To have additional math functions and have feature parity with postgres
What changes are included in this PR?
Added a new sql function called
factorialwhich computes the factorial of a given number and also added tests to verify how it works.Are there any user-facing changes?
There are no API changes