-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Allocate a buffer of the correct length for ScalarValue::FixedSizeBinary in ScalarValue::to_array_of_size #18903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allocate a buffer of the correct length for ScalarValue::FixedSizeBinary in ScalarValue::to_array_of_size #18903
Conversation
…ary in ScalarValue::to_array_of_size
|
Hmm seems this change causes problems with /// Create a new [`FixedSizeBinaryArray`] from the provided parts, returning an error on failure
///
/// # Errors
///
/// * `size < 0`
/// * `values.len() / size != nulls.len()`
pub fn try_new(
size: i32,
values: Buffer,
nulls: Option<NullBuffer>,
) -> Result<Self, ArrowError> {
let data_type = DataType::FixedSizeBinary(size);
let s = size.to_usize().ok_or_else(|| {
ArrowError::InvalidArgumentError(format!("Size cannot be negative, got {size}"))
})?;
let len = values.len() / s; <--- crashI don't see any way to create a new We maybe could create a Any preferences on how we should proceed? As I am unsure whether we even want to fix this issue on the DataFusion side (or wait for an arrow-rs update), I'll wait another opinion. |
This does seem like a bug; for reference, pyarrow allows it: >>> import pyarrow as pa
>>> pa.binary(0)
FixedSizeBinaryType(fixed_size_binary[0])
>>> pa.array([], type=pa.binary(0))
<pyarrow.lib.FixedSizeBinaryArray object at 0x102030b80>
[]
>>> pa.array([], type=pa.binary(0)).type
FixedSizeBinaryType(fixed_size_binary[0])It would be nice to upstream a proper fix to arrow-rs and then can wait for it here; otherwise if we need a fix for DataFusion now we can do a workaround as you describe:
|
|
@Jefffrey Thanks for checking against pyarrow! Here is a PR that fixes the issue in arrow-rs. I also found out that we can create zero-sized item arrays using the builder API. I'll give that a try. |
…SizeBinaryArray in `ScalarValue::to_array_of_size`
|
Should be good to merge once the conflict is fixed fyi @tobixdev |
|
Thanks for letting me know, I didn't see it. I'll fix it today or tomorrow. 👍 |
|
@Jefffrey Should be good to go. 👍 |
|
Thanks @tobixdev |
…ary in ScalarValue::to_array_of_size (apache#18903) ## Which issue does this PR close? - Closes apache#18870. We should open a follow-up issue that tracks re-use `FixedSizeBinaryArray::new_null` after upgrading arrow-rs. With this PR, we could backport it to apache#18843 if we wish. Alternatively, we could also wait for a fix by arrow-rs. ## Rationale for this change As we will have to wait for another arrow-rs update to get the fix for the issue apache/arrow-rs#8901 , we have a temporaray fix that basically calls the same operations from DataFusion's code. Once the fix is in the arrow-rs codebase, we can re-use `FixedSizeBinaryArray::new_null`. ## What changes are included in this PR? - A test. (`assert_eq!(result.as_fixed_size_binary().values().len(), 10);` returned 0 before) - Directly allocate the values buffer in DataFusion. ## Are these changes tested? Yes ## Are there any user-facing changes? Yes, the buffer will now be allocated, zeroed, and set to the expected length. Same as in the arrow-rs fix.
…DF 51Branch (#19017) ## Which issue does this PR close? Backports #18903 to DF 51 branch ## Rationale for this change Fix the bug in a future 51.1.0 release (#18843) ## What changes are included in this PR? Same as in #18903 ## Are these changes tested? Yes. ## Are there any user-facing changes? Yes, fix the original bug and create a non-zero sized values buffer in the case that exhibits the bug.
|
@Jefffrey Thanks for the review! Should we track an issue for re-using |
Raising an issue to track this refactor sounds reasonable |
…rs Upgrade (#19801) ## Which issue does this PR close? - Closes #19085 ## Rationale for this change Use idiomatic way of creating a fixed size binary null array. ## What changes are included in this PR? Basically reverting the workaround from #18903 as the issue has been fixed in arrow-rs. ## Are these changes tested? Yes, test introduced in #18903 . ## Are there any user-facing changes? No
Which issue does this PR close?
ScalarValue::to_array_of_sizeDoes Not Correctly Allocate a Values Buffer forScalarValue::FixedSizeBinary#18870.We should open a follow-up issue that tracks re-use
FixedSizeBinaryArray::new_nullafter upgrading arrow-rs.With this PR, we could backport it to #18843 if we wish.
Alternatively, we could also wait for a fix by arrow-rs.
Rationale for this change
As we will have to wait for another arrow-rs update to get the fix for the issue apache/arrow-rs#8901 , we have a temporaray fix that basically calls the same operations from DataFusion's code. Once the fix is in the arrow-rs codebase, we can re-use
FixedSizeBinaryArray::new_null.What changes are included in this PR?
assert_eq!(result.as_fixed_size_binary().values().len(), 10);returned 0 before)Are these changes tested?
Yes
Are there any user-facing changes?
Yes, the buffer will now be allocated, zeroed, and set to the expected length. Same as in the arrow-rs fix.