Closed
Labels: enhancement (Any new improvement worthy of an entry in the changelog)
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Using DataFusion to run a window function partitioned by a column with a nested data type (here a struct) results in a nested comparison error during execution:
InvalidArgumentError("Nested comparison: Struct([Field { name: \"f1\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]) IS DISTINCT FROM Struct([Field { name: \"f1\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]) (hint: use make_comparator instead)")
This is a feature request to add support for nested types to the partition kernel:
arrow-rs/arrow-ord/src/partition.rs, line 126 in d4b9482:
    pub fn partition(columns: &[ArrayRef]) -> Result<Partitions, ArrowError> {
Describe the solution you'd like
partition delegates to distinct, which does not support comparing nested types:
Lines 179 to 181 in d4b9482:
    /// Nested types, such as lists, are not supported as the null semantics are not well-defined.
    /// For comparisons involving nested types see [`crate::ord::make_comparator`]
    pub fn distinct(lhs: &dyn Datum, rhs: &dyn Datum) -> Result<BooleanArray, ArrowError> {
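As a rough illustration of why this limitation in distinct surfaces in partition, the std-only Rust model below sketches one way to derive partition boundaries from a distinct-style check: compare each column with itself shifted by one row, OR the per-column results together, and split wherever a flag is set. This is not arrow-rs code; it assumes flat nullable Int64 columns of equal length modeled as Vec<Option<i64>>, and is_distinct, boundaries and ranges are hypothetical helpers. The real kernel may differ in detail.

```rust
use std::ops::Range;

/// Hypothetical stand-in for `distinct` on a flat nullable column, using
/// SQL `IS DISTINCT FROM` semantics: two nulls are *not* distinct.
fn is_distinct(a: Option<i64>, b: Option<i64>) -> bool {
    // Option's PartialEq already gives None == None and None != Some(_).
    a != b
}

/// Hypothetical boundary mask: entry i is true when rows i and i + 1
/// differ in at least one column. Assumes all columns have the same length.
fn boundaries(columns: &[Vec<Option<i64>>]) -> Vec<bool> {
    let len = columns.first().map_or(0, |c| c.len());
    let mut mask = vec![false; len.saturating_sub(1)];
    for col in columns {
        // Pairs col[0..len-1] with col[1..len], i.e. the column vs. itself shifted by one.
        for (i, pair) in col.windows(2).enumerate() {
            mask[i] |= is_distinct(pair[0], pair[1]);
        }
    }
    mask
}

/// Hypothetical helper turning a boundary mask into ranges of equal rows.
fn ranges(len: usize, mask: &[bool]) -> Vec<Range<usize>> {
    let mut out = Vec::new();
    if len == 0 {
        return out;
    }
    let mut start = 0;
    for (i, &is_boundary) in mask.iter().enumerate() {
        if is_boundary {
            // Rows start..=i form one partition; row i + 1 starts the next.
            out.push(start..i + 1);
            start = i + 1;
        }
    }
    out.push(start..len);
    out
}
```

For a single column [1, 1, null, null, 2] this yields the ranges 0..2, 2..4 and 4..5: the two nulls land in the same partition because IS DISTINCT FROM treats them as equal. A nested column has no such flat equality check available in distinct, which is where the error above comes from.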
My proposal is to check whether each column has a nested type and, if so, use make_comparator instead to test adjacent values for distinctness.
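A minimal sketch of the proposal, again as a std-only Rust model rather than arrow-rs code: each column contributes a comparator with the same shape as arrow-rs's DynComparator (a boxed Fn(usize, usize) -> Ordering), and a new partition starts wherever any comparator reports that adjacent rows are not Equal. In arrow-rs the comparator for a nested column would presumably come from make_comparator(&col, &col, SortOptions::default()); here the struct column is assumed to be modeled as Vec<Option<(Option<i64>,)>>, mirroring the single nullable Int64 field f1 in the reported error, and partition_with_comparators and struct_comparator are hypothetical names.

```rust
use std::cmp::Ordering;
use std::ops::Range;

/// Same shape as arrow-rs's `DynComparator`: compares row `i` with row `j`.
type Comparator = Box<dyn Fn(usize, usize) -> Ordering>;

/// Hypothetical partition helper: a new partition starts at row i when any
/// column's comparator says rows i - 1 and i are not Equal.
fn partition_with_comparators(len: usize, comparators: &[Comparator]) -> Vec<Range<usize>> {
    let mut out = Vec::new();
    if len == 0 {
        return out;
    }
    let mut start = 0;
    for i in 1..len {
        // Only Equal vs. not-Equal matters, so sort direction and null order are irrelevant.
        let distinct = comparators.iter().any(|cmp| cmp(i - 1, i) != Ordering::Equal);
        if distinct {
            out.push(start..i);
            start = i;
        }
    }
    out.push(start..len);
    out
}

/// Hypothetical model of a nullable struct column with one nullable Int64 field.
fn struct_comparator(col: Vec<Option<(Option<i64>,)>>) -> Comparator {
    // Option and tuples order lexicographically with None before Some,
    // so two null structs compare Equal (i.e. they are not distinct).
    Box::new(move |i, j| col[i].cmp(&col[j]))
}
```

Because only Ordering::Equal is checked, choosing different sort options (descending, nulls first or last) does not move any boundary, which is why null ordering semantics should not matter for partitioning. For the column [{f1: 1}, {f1: 1}, null, null, {f1: null}, {f1: 2}] the sketch produces the ranges 0..2, 2..4, 4..5 and 5..6.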
Describe alternatives you've considered
- Expanding nested array fields into primitive arrays. This seems costly.
- Allowing nested comparisons in compare_op for certain op types where null ordering semantics don't matter (which I think is the case here). This is another option, but the proposed approach seems more general, and it can be swapped out if performance becomes an issue.