Skip to content

Conversation

@friendlymatthew
Copy link
Contributor

@friendlymatthew friendlymatthew commented Nov 20, 2025

Rationale for this change

This PR introduces UnionFields::try_new and UnionFields::from_fields as replacements for the "unsafe" UnionFields::new constructor, which is now deprecated

Previously, UnionFields could be constructed with invalid invariants including negative type ids, duplicate type ids, or duplicate fields. Since UnionArrays index by type id, negative values cause panics at runtimes and duplicates break comparisons.

UnionFields::try_new validates that type ids are non-negative, unique, under 128, and that fields are unique, returning an error if any constraint is violated.

UnionFields::from_fields is a convenience constructor that auto-assigns sequential type ids starting from 0. (Will panic if it exceeds 128 elements)

Note about reviewing

I broke it up into smaller commits. The last commit updates all call sites across the different kernels

@friendlymatthew
Copy link
Contributor Author

cc @alamb

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @friendlymatthew and I am sorry for the delay in review

My only concern about this PR is the potentially new restriction to disallow repeated fields (I worry this could cause regressions for people downstream).

Otherwise this one is 👨‍🍳 👌

let fields = match union.typeIds() {
None => UnionFields::new(0_i8..fields.len() as i8, fields),
Some(ids) => UnionFields::new(ids.iter().map(|i| i as i8), fields),
None => UnionFields::from_fields(fields),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a nice API 👍

///
/// This function returns an error if:
/// - Any type_id appears more than once (duplicate type ids)
/// - Any field appears more than once (duplicate fields)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the arrow spec requires unique field names for union. I couldn't find any reference to such a restriction in https://arrow.apache.org/docs/format/Columnar.html#union-layout

Since I think previous versions of arrow-rs allow duplicate fields I don't think it is ok to start erroring on them in this release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that makes sense to me. I suspect this will cause some subtleties in the way we compare Union types, but I'll make sure to document that in a follow up PR

);
}

#[test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice tests

@kylebarron
Copy link
Member

UnionFields::from_fields is a convenience constructor that auto-assigns sequential type ids starting from 0. (Will panic if it exceeds 128 elements)

Perhaps we should error instead of panicking for >128 elements?

@friendlymatthew
Copy link
Contributor Author

UnionFields::from_fields is a convenience constructor that auto-assigns sequential type ids starting from 0. (Will panic if it exceeds 128 elements)

Perhaps we should error instead of panicking for >128 elements?

Hi, that makes sense to me. I do really like from_fields being infallible for the cases when I hard code my UnionFields.

What if we include a fallible version, something like UnionFields::try_from_fields and inform users on which method to use

@kylebarron
Copy link
Member

I think having from_fields and try_from_fields makes sense 👍

@alamb
Copy link
Contributor

alamb commented Dec 5, 2025

So I think this PR is waiting on:

  1. Don't error with duplicated field names
  2. from_fields / try_from_fields

@friendlymatthew
Copy link
Contributor Author

So I think this PR is waiting on:

  1. Don't error with duplicated field names
  2. from_fields / try_from_fields

Hi, I am back from vacation and will resume work on (union data type support, variant work, blog post)

@friendlymatthew friendlymatthew force-pushed the friendlymatthew/unionfields_try_new branch 2 times, most recently from 3499adc to d678756 Compare December 5, 2025 18:02
Comment on lines +501 to +521
pub fn try_from_fields<F>(fields: F) -> Result<Self, ArrowError>
where
F: IntoIterator,
F::Item: Into<FieldRef>,
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +431 to +452
/// # Panics
///
/// Panics if the number of fields exceeds 127 (the maximum value for i8 type IDs).
///
/// If you want to avoid panics, use [`UnionFields::try_from_fields`] instead, which
/// returns a `Result`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's a note about the fallible/infallible ways to construct UnionFields

alamb pushed a commit that referenced this pull request Dec 11, 2025
# Which issue does this PR close?

- Closes #8958

# Rationale for this change

This PR introduces 2 ways to access elements of `UnionFields` by index.
You can now use direct indexing (`fields[idx]`) or the safe accessor
`UnionFields::get(index: usize)` to avoid potential panics

Note, since we validate `UnionFields` contains only 128 elements upon
construction, we can forgo this check upon read

- #8891
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @friendlymatthew -- I went through this PR again and it looks good to me

I left a few small comment suggestions, but I think we could do that as a follow on too

@alamb
Copy link
Contributor

alamb commented Dec 12, 2025

looks like there is a clippy error

@friendlymatthew
Copy link
Contributor Author

looks like there is a clippy error

That is what I get for relying on Github

friendlymatthew and others added 4 commits December 12, 2025 14:06
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@friendlymatthew friendlymatthew force-pushed the friendlymatthew/unionfields_try_new branch from 4b5108a to 245e00a Compare December 12, 2025 19:06
@alamb alamb merged commit 63212d2 into apache:main Dec 15, 2025
29 checks passed
@alamb
Copy link
Contributor

alamb commented Dec 15, 2025

Thanks again @friendlymatthew

lidavidm pushed a commit to apache/arrow-adbc that referenced this pull request Jan 15, 2026
Supersede #3884

Although some warnings are issued when upgrading arrow
(apache/arrow-rs#8891), if we avoid deprecated functions and use newly
introduced functions, we will lose compatibility with previous arrow
versions.
Therefore, I ignored the warnings here.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate arrow-avro arrow-avro crate arrow-flight Changes to the arrow-flight crate parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants