Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Jun 24, 2025

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

Rationale for this change

I noticed this while reviewing @Weijun-H's PR

I didn't know what part of the crate were using the fields directly and thus what part could potentially be creating invalid variants. By making the fields non pub(crate) I think it makes it clear these can not be constructed invalidly

This also gave me a good excuse to write some more examples

What changes are included in this PR?

  1. make fields in VariantDecimal* private
  2. Add doc examples examples

Are these changes tested?

By CI

Are there any user-facing changes?

API change in a non-published crate, so no end user impact yet

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 24, 2025
@alamb alamb marked this pull request as ready for review June 24, 2025 18:31
@friendlymatthew
Copy link
Contributor

Hi, I wonder if we should do the same for ShortString:

/// A Variant [`ShortString`]
///
/// This implementation is a zero cost wrapper over `&str` that ensures
/// the length of the underlying string is a valid Variant short string (63 bytes or less)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ShortString<'a>(pub(crate) &'a str);

@alamb
Copy link
Contributor Author

alamb commented Jun 24, 2025

Hi, I wonder if we should do the same for ShortString:

Good idea, i will do so

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @alamb

@alamb
Copy link
Contributor Author

alamb commented Jun 24, 2025

Thanks @Weijun-H and @friendlymatthew -- I think this PR will cause some non trivial logical conflicts so I will postpone merging it until we get the initial JSON conversion in

VariantPrimitiveType::Decimal4 => {
let (integer, scale) = decoder::decode_decimal4(value_data)?;
Variant::Decimal4(VariantDecimal4 { integer, scale })
Variant::Decimal4(VariantDecimal4::try_new(integer, scale)?)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think technically constructing variants from binary values did did not verify the integer/scale were within range, but now they do after this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. Good catch!

@alamb
Copy link
Contributor Author

alamb commented Jun 27, 2025

I think this one is now ready to go -- let me know if you would like time to review @scovich or @friendlymatthew or @superserious-dev

I am done with variant for this morning, and will hopefully return this evening

Copy link
Contributor

@friendlymatthew friendlymatthew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change! There's a question about ergonomics and correctness gaps this change uncovers, but maybe they should be addressed as a follow-up PR?

VariantPrimitiveType::Decimal4 => {
let (integer, scale) = decoder::decode_decimal4(value_data)?;
Variant::Decimal4(VariantDecimal4 { integer, scale })
Variant::Decimal4(VariantDecimal4::try_new(integer, scale)?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. Good catch!

Comment on lines +668 to 672
if let Ok(converted_integer) = decimal8.integer().try_into() {
Some((converted_integer, decimal8.scale()))
} else {
None
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this just

let converted_integer = decimal8.integer().try_into().ok()?;
Some((converted_integer, decimal8.scale()))

or even

decimal8.integer().try_into().ok().map(|i| (i, decimal8.scale()))

(more below)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(but see other comment -- scale also needs to be checked)

if let Ok(converted_integer) = decimal16.integer.try_into() {
Some((converted_integer, decimal16.scale))
if let Ok(converted_integer) = decimal16.integer().try_into() {
Some((converted_integer, decimal16.scale()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to validate scale as well, because scale cannot be larger than precision:
decimal4 max precision is 9, while decimal16 max precision is 38.

Comment on lines 712 to 721
pub fn as_decimal_int64(&self) -> Option<(i64, u8)> {
match *self {
Variant::Decimal4(decimal) => Some((decimal.integer.into(), decimal.scale)),
Variant::Decimal8(decimal) => Some((decimal.integer, decimal.scale)),
Variant::Decimal4(decimal) => Some((decimal.integer().into(), decimal.scale())),
Variant::Decimal8(decimal) => Some((decimal.integer(), decimal.scale())),
Variant::Decimal16(decimal) => {
if let Ok(converted_integer) = decimal.integer.try_into() {
Some((converted_integer, decimal.scale))
if let Ok(converted_integer) = decimal.integer().try_into() {
Some((converted_integer, decimal.scale()))
} else {
None
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that VariantDecimalXX types are a thing, shouldn't as_decimalXX methods just return them? Combine that with impl From and TryFrom for the various decimal types and this code may become cleaner/safer?

pub fn as_decimal4(&self) -> Option<VariantDecimal4> {
    match *self {
        Variant::Int8(integer) => i32::from(integer).try_into().ok(),
        Variant::Int16(integer) => i32::from(integer).try_into().ok(),
        Variant::Int32(integer) => integer.try_into().ok(),
        Variant::Int64(integer) => i32::try_from(integer)?.try_into().ok(),
        Variant::Decimal4(decimal) => Some(decimal),
        Variant::Decimal8(decimal) => decimal.try_into().ok(),
        Variant::Decimal16(decimal) => decimal.try_into().ok(),
        _ => None,
    }
}

pub fn as_decimal8(&self) -> Option<VariantDecimal8> {
    match *self {
        Variant::Int8(integer) => i64::from(integer).try_into().ok(),
        Variant::Int16(integer) => i64::from(integer).try_into().ok(),
        Variant::Int32(integer) => i64::from(integer).try_into().ok(),
        Variant::Int64(integer) => integer.try_into().ok(),
        Variant::Decimal4(decimal) => Some(decimal.into()),
        Variant::Decimal8(decimal) => Some(decimal),
        Variant::Decimal16(decimal) => decimal.try_into().ok(),
        _ => None,
    }
}

pub fn as_decimal16(&self) -> Option<VariantDecimal16> {
    match *self {
        Variant::Int8(integer) => i128::from(integer).try_into().ok(),
        Variant::Int16(integer) => i128::from(integer).try_into().ok(),
        Variant::Int32(integer) => i128::from(integer).try_into().ok(),
        Variant::Int64(integer) => i128::from(integer).try_into().ok(),
        Variant::Decimal4(decimal) => Some(decimal.into()),
        Variant::Decimal8(decimal) => Some(decimal.into()),
        Variant::Decimal16(decimal) => Some(decimal),
        _ => None,
    }
}
Details
// Infallible conversion from a narrower decimal type to a wider one
macro_rules! impl_from_decimal_for_decimal {
    ($from_ty:ty, $for_ty:ty) => {
        impl From<$from_ty> for $for_ty {
            fn from(decimal: $from_ty) -> Self {
                Self { 
                    integer: decimal.integer().into(),
                    scale: decimal.scale(),
                }
            }
        }
    };
} 

// Fallible conversion from a wider decimal type to a narrower one
macro_rules! impl_try_from_decimal_for_decimal {
    ($from_ty:ty, $for_ty:ty) => {
        impl TryFrom<$from_ty> for $for_ty {
            type Error = ArrowError;

            fn try_from(decimal: $from_ty) -> Result<Self, ArrowError> {
                if decimal.scale() > Self::MAX_PRECISION {
                    return Err(...);
                }
                Ok(Self {
                    integer: decimal.integer().try_into()?,
                    scale: decimal.scale(),
                })
            }
        }
    };
}

// Fallible conversion from a decimal's underlying integer type
macro_rules! impl_try_from_int_for_decimal {
    ($from_ty:ty, $for_ty:ty) => {
        impl TryFrom<$from_ty> for $for_ty {
            type Error = ArrowError;

            fn try_from(integer: $from_ty) -> Result<Self, ArrowError> {
                Self::try_new(integer, 0)
            }
        }
    };
}
impl_from_decimal_for_decimal!(Decimal4, Decimal8);
impl_from_decimal_for_decimal!(Decimal4, Decimal16);
impl_from_decimal_for_decimal!(Decimal8, Decimal16);

impl_try_from_decimal_for_decimal!(Decimal8, Decimal4);
impl_try_from_decimal_for_decimal!(Decimal16, Decimal4);
impl_try_from_decimal_for_decimal!(Decimal16, Decimal8);

impl_try_from_int_for_decimal!(i32, Decimal4);
impl_try_from_int_for_decimal!(i64, Decimal8);
impl_try_from_int_for_decimal!(i128, Decimal16);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also #7809

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- let's do apache/datafusion#5504 as a follow on PR

@alamb
Copy link
Contributor Author

alamb commented Jun 27, 2025

Thanks @scovich and @superserious-dev and @Weijun-H

@alamb alamb merged commit 8d6cada into apache:main Jun 27, 2025
12 checks passed
@alamb alamb deleted the alamb/encapsulated_decimal branch June 27, 2025 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants