Skip to content

Conversation

@scovich
Copy link
Contributor

@scovich scovich commented Aug 14, 2025

Which issue does this PR close?

Rationale for this change

When working with shredded variants, we need the ability to copy nested object fields and array elements of one variant to a destination. This is a cheap byte-wise copy that relies on the fact that the new variant being built uses the same metadata dictionary as the source variant it is derived from.

What changes are included in this PR?

Define a helper macro that encapsulates the logic for variant appends, now that we have three very similar methods (differing only in their handling of list/object values and their return type).

Add new methods: ValueBuilder::append_variant_bytes, which is called by new methods VariantBuilder::append_value_bytes, ListBuilder::append_value_bytes, and ObjectBuilder::[try_]insert_bytes.

Are these changes tested?

New unit tests

Are there any user-facing changes?

The new methods are public.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in this PR looks great. I think it just needs some tests

Very nice

}
/// Macro to generate the match statement for each append_variant, try_append_variant, and
/// append_variant_bytes -- they each have slightly different handling for object and list handling.
macro_rules! variant_append_value {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@scovich
Copy link
Contributor Author

scovich commented Aug 14, 2025

While adding the tests, it turns out that I need the read-only metadata builder. Otherwise, there's no way to add (or copy) individual object fields across -- even if the set of strings is the same, they would get newly allocated field ids in the output variant, potentially in a different order.

@alamb alamb changed the title [Variant] Allow appending raw object/list bytesto variant builders [Variant] Allow appending raw object/list bytes to variant builders Aug 14, 2025
@scovich
Copy link
Contributor Author

scovich commented Aug 25, 2025

@alamb -- merged up and using read-only metadata builders now

cc @codephage2020 and @klion26 in case you want to make a review pass.

@scovich scovich requested a review from alamb August 25, 2025 17:27
Copy link
Contributor

@codephage2020 codephage2020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@klion26 klion26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, LGTM

/// Macro to generate the match statement for each append_variant, try_append_variant, and
/// append_variant_bytes -- they each have slightly different handling for object and list handling.
macro_rules! variant_append_value {
($builder:expr, $value:expr, $object_pat:pat => $object_arm:expr, $list_pat:pat => $list_arm:expr) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

}

#[test]
fn test_append_variant_bytes_round_trip() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add a test case for the Variant::List match arm in append_variant_bytes?

}

#[test]
fn test_complex_nested_filtering_injection() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me -- thank you @scovich @klion26 and @codephage2020

/// The caller must ensure that the metadata dictionary is already built and correct for
/// any objects or lists being appended, but the value's new field name is handled normally.
///
/// # Note
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this note still valid? I think there is no inserting of keys that happens via this API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method creates a new variant object field.
The field's value is raw bytes but the field's key is still inserted normally.
So yes, it can still fail if the key is invalid or duplicate.

@alamb
Copy link
Contributor

alamb commented Aug 26, 2025

🚀

@alamb alamb merged commit 7360b3b into apache:main Aug 26, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support appending raw bytes to variant objects and lists

4 participants