Skip to content

Conversation

@tobixdev
Copy link
Contributor

Which issue does this PR close?

We should open a follow-up issue that tracks re-use FixedSizeBinaryArray::new_null after upgrading arrow-rs.

With this PR, we could backport it to #18843 if we wish.

Alternatively, we could also wait for a fix by arrow-rs.

Rationale for this change

As we will have to wait for another arrow-rs update to get the fix for the issue apache/arrow-rs#8901 , we have a temporaray fix that basically calls the same operations from DataFusion's code. Once the fix is in the arrow-rs codebase, we can re-use FixedSizeBinaryArray::new_null.

What changes are included in this PR?

  • A test. (assert_eq!(result.as_fixed_size_binary().values().len(), 10); returned 0 before)
  • Directly allocate the values buffer in DataFusion.

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, the buffer will now be allocated, zeroed, and set to the expected length. Same as in the arrow-rs fix.

@tobixdev
Copy link
Contributor Author

Hmm seems this change causes problems with DataType::FixedSizeBinary(0) as it divides by zero in FixedSizeBinaryArray::try_new.

    /// Create a new [`FixedSizeBinaryArray`] from the provided parts, returning an error on failure
    ///
    /// # Errors
    ///
    /// * `size < 0`
    /// * `values.len() / size != nulls.len()`
    pub fn try_new(
        size: i32,
        values: Buffer,
        nulls: Option<NullBuffer>,
    ) -> Result<Self, ArrowError> {
        let data_type = DataType::FixedSizeBinary(size);
        let s = size.to_usize().ok_or_else(|| {
            ArrowError::InvalidArgumentError(format!("Size cannot be negative, got {size}"))
        })?;

        let len = values.len() / s; <--- crash

I don't see any way to create a new FixedSizeBinaryArray with zero length using try_new. Is this on-purpose? The comments says 0 should be fine. Is this a bug in arrow-rs?

We maybe could create a FixedSizeBinaryArray using the From<ArrayData> implementation.

Any preferences on how we should proceed? As I am unsure whether we even want to fix this issue on the DataFusion side (or wait for an arrow-rs update), I'll wait another opinion.

@Jefffrey
Copy link
Contributor

I don't see any way to create a new FixedSizeBinaryArray with zero length using try_new. Is this on-purpose? The comments says 0 should be fine. Is this a bug in arrow-rs?

This does seem like a bug; for reference, pyarrow allows it:

>>> import pyarrow as pa
>>> pa.binary(0)
FixedSizeBinaryType(fixed_size_binary[0])
>>> pa.array([], type=pa.binary(0))
<pyarrow.lib.FixedSizeBinaryArray object at 0x102030b80>
[]
>>> pa.array([], type=pa.binary(0)).type
FixedSizeBinaryType(fixed_size_binary[0])

It would be nice to upstream a proper fix to arrow-rs and then can wait for it here; otherwise if we need a fix for DataFusion now we can do a workaround as you describe:

We maybe could create a FixedSizeBinaryArray using the From<ArrayData> implementation.

@tobixdev
Copy link
Contributor Author

@Jefffrey Thanks for checking against pyarrow!

Here is a PR that fixes the issue in arrow-rs.

apache/arrow-rs#8927

I also found out that we can create zero-sized item arrays using the builder API. I'll give that a try.

…SizeBinaryArray in `ScalarValue::to_array_of_size`
@Jefffrey
Copy link
Contributor

Should be good to merge once the conflict is fixed fyi @tobixdev

@tobixdev
Copy link
Contributor Author

Thanks for letting me know, I didn't see it. I'll fix it today or tomorrow. 👍

@tobixdev
Copy link
Contributor Author

@Jefffrey Should be good to go. 👍

@Jefffrey Jefffrey added this pull request to the merge queue Nov 30, 2025
Merged via the queue into apache:main with commit 4cfe20f Nov 30, 2025
27 checks passed
@Jefffrey
Copy link
Contributor

Thanks @tobixdev

shashidhar-bm pushed a commit to shashidhar-bm/datafusion that referenced this pull request Dec 1, 2025
…ary in ScalarValue::to_array_of_size (apache#18903)

## Which issue does this PR close?

- Closes apache#18870.

We should open a follow-up issue that tracks re-use
`FixedSizeBinaryArray::new_null` after upgrading arrow-rs.

With this PR, we could backport it to apache#18843 if we wish.

Alternatively, we could also wait for a fix by arrow-rs.

## Rationale for this change

As we will have to wait for another arrow-rs update to get the fix for
the issue apache/arrow-rs#8901 , we have a
temporaray fix that basically calls the same operations from
DataFusion's code. Once the fix is in the arrow-rs codebase, we can
re-use `FixedSizeBinaryArray::new_null`.

## What changes are included in this PR?

- A test. (`assert_eq!(result.as_fixed_size_binary().values().len(),
10);` returned 0 before)
- Directly allocate the values buffer in DataFusion.

## Are these changes tested?

Yes

## Are there any user-facing changes?

Yes, the buffer will now be allocated, zeroed, and set to the expected
length. Same as in the arrow-rs fix.
alamb pushed a commit that referenced this pull request Dec 2, 2025
…DF 51Branch (#19017)

## Which issue does this PR close?

Backports #18903 to DF 51
branch

## Rationale for this change

Fix the bug in a future 51.1.0 release
(#18843)

## What changes are included in this PR?

Same as in #18903

## Are these changes tested?

Yes.

## Are there any user-facing changes?

Yes, fix the original bug and create a non-zero sized values buffer in
the case that exhibits the bug.
@tobixdev
Copy link
Contributor Author

tobixdev commented Dec 3, 2025

@Jefffrey Thanks for the review! Should we track an issue for re-using FixedSizeBinaryArray::new_null after an arrow-rs upgrade? Maybe creating an "update arrow-rs" issue with a task to replace the TODO with the old code?

@Jefffrey
Copy link
Contributor

Jefffrey commented Dec 3, 2025

@Jefffrey Thanks for the review! Should we track an issue for re-using FixedSizeBinaryArray::new_null after an arrow-rs upgrade? Maybe creating an "update arrow-rs" issue with a task to replace the TODO with the old code?

Raising an issue to track this refactor sounds reasonable

github-merge-queue bot pushed a commit that referenced this pull request Jan 14, 2026
…rs Upgrade (#19801)

## Which issue does this PR close?

- Closes #19085

## Rationale for this change

Use idiomatic way of creating a fixed size binary null array.

## What changes are included in this PR?

Basically reverting the workaround from
#18903 as the issue has been
fixed in arrow-rs.

## Are these changes tested?

Yes, test introduced in #18903
.

## Are there any user-facing changes?

No
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ScalarValue::to_array_of_size Does Not Correctly Allocate a Values Buffer for ScalarValue::FixedSizeBinary

2 participants