Skip to content

Conversation

@sdf-jkl
Copy link
Contributor

@sdf-jkl sdf-jkl commented Aug 21, 2025

Which issue does this PR close?

Rationale for this change

Need to implement List, LargeList types support for cast_to_variant kernel

What changes are included in this PR?

Added support for List, LargeList in cast_to_variant kernel

Are these changes tested?

Yes, added unit tests

Are there any user-facing changes?

Yes, added changes to the cast_to_variant kernel

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Aug 21, 2025
@sdf-jkl sdf-jkl changed the title List/large list to variant [Variant]: Implement DataType::List/LargeList support for cast_to_variant kernel Aug 21, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me -- thank you @sdf-jkl

I have a comment about a future optimization, but I suspect that applies to all nested types

}
DataType::List(_) => {
let list_array = input.as_list::<i32>();
let values_variant_array = cast_to_variant(list_array.values().as_ref())?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a future optimization, we could avoid casting the entire values to variant if the list only refers to a subset

For example, if the list array is sliced, then start will not be zero, so we could avoid casting the all the elements of the list

The same comment applies below

Copy link
Contributor Author

@sdf-jkl sdf-jkl Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do it in a separate issue or I can try to implement it here?
I implemented it here. See below.

Comment on lines +540 to +550
let values = list_array.values();
let offsets = list_array.offsets();

let first_offset = offsets.first().expect("There should be an offset");
let length = offsets.last().expect("There should be an offset") - first_offset;
let sliced_values = values.slice(*first_offset as usize, length as usize);

let values_variant_array = cast_to_variant(sliced_values.as_ref())?;
let new_offsets = OffsetBuffer::new(ScalarBuffer::from_iter(
offsets.iter().map(|o| o - first_offset),
));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented the optimization you mentioned above like this.
I also added a test for sliced list arrays.

@sdf-jkl
Copy link
Contributor Author

sdf-jkl commented Aug 22, 2025

I think it also makes sense to move the logic out of list match arms into a help function that takes the offset type as a parameter.
Something like convert_list_array(i32/i64) would help to reduce code duplication.

@alamb
Copy link
Contributor

alamb commented Aug 22, 2025

Thank you @sdf-jkl 🙏

@alamb alamb merged commit 549709f into apache:main Aug 22, 2025
11 checks passed
@sdf-jkl sdf-jkl deleted the list/large_list_to_variant branch August 22, 2025 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Variant]: Implement DataType::List/LargeList support for cast_to_variant kernel

2 participants