-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[Variant]: Implement DataType::List/LargeList support for cast_to_variant kernel #8201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…arge_list_to_variant
…arge_list_to_variant
…arge_list_to_variant
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me -- thank you @sdf-jkl
I have a comment about a future optimization, but I suspect that applies to all nested types
| } | ||
| DataType::List(_) => { | ||
| let list_array = input.as_list::<i32>(); | ||
| let values_variant_array = cast_to_variant(list_array.values().as_ref())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a future optimization, we could avoid casting the entire values to variant if the list only refers to a subset
For example, if the list array is sliced, then start will not be zero, so we could avoid casting the all the elements of the list
The same comment applies below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do it in a separate issue or I can try to implement it here?
I implemented it here. See below.
…kl/arrow-rs into list/large_list_to_variant
| let values = list_array.values(); | ||
| let offsets = list_array.offsets(); | ||
|
|
||
| let first_offset = offsets.first().expect("There should be an offset"); | ||
| let length = offsets.last().expect("There should be an offset") - first_offset; | ||
| let sliced_values = values.slice(*first_offset as usize, length as usize); | ||
|
|
||
| let values_variant_array = cast_to_variant(sliced_values.as_ref())?; | ||
| let new_offsets = OffsetBuffer::new(ScalarBuffer::from_iter( | ||
| offsets.iter().map(|o| o - first_offset), | ||
| )); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented the optimization you mentioned above like this.
I also added a test for sliced list arrays.
|
I think it also makes sense to move the logic out of list match arms into a help function that takes the offset type as a parameter. |
|
Thank you @sdf-jkl 🙏 |
Which issue does this PR close?
DataType::List/LargeListsupport forcast_to_variantkernel #8060.Rationale for this change
Need to implement
List,LargeListtypes support forcast_to_variantkernelWhat changes are included in this PR?
Added support for
List,LargeListincast_to_variantkernelAre these changes tested?
Yes, added unit tests
Are there any user-facing changes?
Yes, added changes to the
cast_to_variantkernel