Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Feb 13, 2024

Draft while I figure out:

  • Get tests passing
  • Review string rewrite / concat in the rewriter and figure out how/why it is necessary (doesn't simplify already do it??)

Which issue does this PR close?

Part of #8045
re #9100

Rationale for this change

Set the pattern for how more array functions can be moved from datafusion to datafusion-functions-array crates

What changes are included in this PR?

  1. Move make_array, array_intersect, array_union and array_distinct function to datafusion-functions-array
  2. Remove vestigal functions around

Are these changes tested?

Yes, by existing tests

Are there any user-facing changes?

No

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules labels Feb 13, 2024
Copy link
Contributor Author

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a start on moving array functions -- I still need to work on this a bit more. FYI @jayzhan211

Ok(())
}

/// Convert one or more [`ArrayRef`] of the same type into a
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the code from here to the datafusion-function-array crate

Sqrt = 17;
Tan = 18;
Trunc = 19;
Array = 20;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array is an alias for make_array it seems

);
}

fn align_array_dimensions<O: OffsetSizeTrait>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this over from array_expressions where it is used in one other place. I'll try and figure out how to remove it shortly

@@ -1,254 +0,0 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this code to to_string.rs to align with keeping the ScalarFunction definitions inline with their implementations

use std::any::Any;

// Create static instances of ScalarUDFs for each function
make_udf_function!(ArrayToString,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to to_string.rs

}

// TODO figure out how to generalize this so it isn't hard coded by name
match (fn_name(left), fn_name(right)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why we have specific rewrites here for functions (that can't be done by constant folding or some other method). I don't like special casing the names

I need to look into it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Weijun-H do you happen to remember the rationale for this code?

BuiltinScalarFunction::MakeArray,
values,
)))
let make_array = self
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting -- now the SQL planner needs to replace the array literals with a scalar function, but it does the lookup by name. Rather than having this string hard coded maybe there should be a list of such functions somewhere 🤔

make_array,
array,
"returns an Arrow array using the specified input expressions.",
udf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be make_array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I struggled to come up with a good name. The macro is basically creating two functions

/// fluent expr style
fn make_array(array: Expr) -> Expr {
}

/// return a ScalarUDF
fn udf() -> ScalarUdf { 
...
}

The udf one is used to register the function:

/// Registers all enabled packages with a [`FunctionRegistry`]
pub fn register_all(registry: &mut dyn FunctionRegistry) -> Result<()> {
    let functions: Vec<Arc<ScalarUDF>> = vec![
        to_string::udf(),
        make_array::udf(),
        set_ops::union_udf(),
        set_ops::intersect_udf(),
        set_ops::distinct_udf(),
    ];

So it can't be the expr_fn name , but udf is also pretty generic 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as_udf() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe make_udf 🤔

This comment was marked as outdated.

Copy link
Contributor

@jayzhan211 jayzhan211 Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should name it make_array_udf or something else including a function name since we might not be able to reuse udf or make_udf for other array functions. Prefixing it with array_function seems like a better choice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants