-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Is your feature request related to a problem or challenge?
We found that computing with Eq Kernel is much faster than RowConverter for array_has
There are other functions that have potential to further speedup with Eq Kernel.
- array_has_all
- array_has_any
- array_intersect
- array_distinct (Maybe π€ ?)
- array_except (Maybe π€ ?)
Describe the solution you'd like
The overall idea is that we flatten the left hand side of the list and iterate right hand side elements, apply Eq kernel for each element. #12062 We could know whether the element is in left hand side of the list by checking the true_count (and null_count for null handling). The eq kernel is vectorized, that is the key of speedup.
Example
array_has_all
array_has_all([1,2,3], [1,2]) -> true
Iterate [1,2], compare [1,2,3] with 1 and [1,2,3] with 2. We will get [true,false,false] and [false,true,false]. Both boolean array contains true, therefore return true
array_has_any
array_has_any([1,2,3], [1,4]) -> true
Iterate [1,2], compare [1,2,3] with 1 and [1,2,3] with 4. We will get [true,false,false] and [false,false,false]. Since first boolean array contains true, therefore return true
array_has_intersect
array_has_intersect([1,2,3], [1,2]) -> [1,2]
The same idea like above, we know that both element is contained in the list. The difference is that we expect to return Array. We could get the expected array with MutableArrayData
I'm not pretty sure about distinct and except, but worth to figure out as well
For more detail implementation could take array_has as reference
Describe alternatives you've considered
No response
Additional context
No response