Decimal Precision Validation #2387

@tustvold

Which part is this question about

Generally, the approach taken by this crate is that a given ArrayData, and by extension Array, only contains valid data. For example, a StringArray is valid UTF-8 with each offset at a codepoint boundary, a dictionary array only has in-bounds indexes, and so on. This allows eliding bounds checks on access within kernels.

However, in order for this to be sound, it must be impossible to create invalid ArrayData using safe APIs. This means that safe APIs must either:

  • Generate valid data by construction - e.g. the builder APIs
  • Validate data - e.g. ArrayData::try_new
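
To make the two options concrete, here is a minimal std-only sketch of the pattern. `CheckedDict` and its methods are hypothetical names invented for illustration and are not part of this crate's API; the point is only that a validating safe constructor is what makes the unchecked lookup sound.

```rust
/// Hypothetical dictionary-encoded column: each key indexes into `values`.
/// Fields are private, so the invariant can only be established by a constructor.
pub struct CheckedDict {
    keys: Vec<u32>,
    values: Vec<String>,
}

impl CheckedDict {
    /// Safe constructor: rejects any key that would index out of bounds.
    pub fn try_new(keys: Vec<u32>, values: Vec<String>) -> Result<Self, String> {
        for (i, &k) in keys.iter().enumerate() {
            if k as usize >= values.len() {
                return Err(format!(
                    "key {} at position {} is out of bounds for {} values",
                    k,
                    i,
                    values.len()
                ));
            }
        }
        Ok(Self { keys, values })
    }

    /// Unsafe escape hatch that skips validation.
    ///
    /// # Safety
    /// The caller must guarantee that every key is less than `values.len()`.
    pub unsafe fn new_unchecked(keys: Vec<u32>, values: Vec<String>) -> Self {
        Self { keys, values }
    }

    /// Number of logical elements.
    pub fn len(&self) -> usize {
        self.keys.len()
    }

    /// Panics if `i >= len()`; the dictionary lookup itself skips its bounds check.
    pub fn value(&self, i: usize) -> &str {
        let k = self.keys[i] as usize;
        // SAFETY: every constructor guarantees (or requires) k < values.len().
        unsafe { self.values.get_unchecked(k).as_str() }
    }
}
```

If `try_new` missed a case, `value` would read out of bounds from safe code, which is exactly the UB described above.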

For the examples above, incorrect validation can very clearly lead to UB. The situation for decimal values is less clear; in particular, I'm not sure what the implications of a value that exceeds the precision actually are. However, some notes:

  • As far as I can tell we don't protect against overflow of normal integer types
  • We don't have any decimal arithmetic kernels (yet)
  • The decimal types have a fixed bit width, so the precision does not affect their representation
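
As a sketch of the last point, assuming a 128-bit decimal stored as a plain `i128` of unscaled digits: a precision check is just a range check against 10^precision - 1, and a value that fails it is still a perfectly well-formed `i128`. The function names below are hypothetical and std-only, not this crate's API.

```rust
/// Largest magnitude with `precision` decimal digits, i.e. 10^precision - 1.
/// Returns `None` when 10^precision does not fit in an i128 (precision > 38).
pub fn max_for_precision(precision: u32) -> Option<i128> {
    10_i128.checked_pow(precision).map(|p| p - 1)
}

/// Whether the unscaled `value` has at most `precision` decimal digits.
/// The scale plays no part here: it only says where the decimal point goes.
pub fn fits_precision(value: i128, precision: u32) -> bool {
    match max_for_precision(precision) {
        Some(max) => value >= -max && value <= max,
        // 10^precision exceeds the i128 range, so every i128 fits.
        None => true,
    }
}
```

For example, `100_000` fails `fits_precision(_, 5)` but occupies the same 16 bytes as `99_999`, so reading it back cannot by itself touch invalid memory.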

Describe your question

My question boils down to:

  • What is the purpose of the precision argument? Is it just for interoperability with other non-arrow representations?
  • Is there a requirement to saturate/error at the bounds of the precision, or can we simply overflow/saturate at the bounds of the underlying representation?
  • Does validating the precision on ingest to ArrayData actually elide any validation when performing computation?
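
The two policies in the second question can be sketched as follows. This is an illustration under the assumption of an `i128` unscaled representation with both operands at the same scale; `add_strict`, `add_relaxed` and `AddError` are hypothetical names, since the crate has no decimal arithmetic kernels yet.

```rust
#[derive(Debug, PartialEq)]
pub enum AddError {
    /// The sum has more decimal digits than the declared precision allows.
    ExceedsPrecision,
    /// The sum does not fit in the underlying i128 at all.
    OverflowsRepr,
}

/// Strict policy: error at the bounds of the precision.
pub fn add_strict(a: i128, b: i128, precision: u32) -> Result<i128, AddError> {
    let sum = a.checked_add(b).ok_or(AddError::OverflowsRepr)?;
    // `None` means 10^precision exceeds i128, so precision adds no tighter bound.
    match 10_i128.checked_pow(precision) {
        Some(limit) if sum >= limit || sum <= -limit => Err(AddError::ExceedsPrecision),
        _ => Ok(sum),
    }
}

/// Relaxed policy: only the underlying representation matters.
pub fn add_relaxed(a: i128, b: i128) -> Option<i128> {
    a.checked_add(b)
}
```

With precision 5, `99_999 + 1` is an error under the strict policy but yields `100_000` under the relaxed one. Note that the strict version re-checks the result regardless of whether the inputs were validated on ingest, which is the crux of the third question.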

The answers to these questions will dictate whether we can take a relaxed attitude to precision, letting users opt into validation if they care and otherwise simply ignoring it.

I tried to understand what the C++ implementation is doing, but I honestly got lost. It almost looks like it is performing floating point operations and then rounding them back, which seems surprising...

Additional context
