-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Allocate a zeroed buffer for FixedSizeBinaryArray::null #8901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allocate a zeroed buffer for FixedSizeBinaryArray::null #8901
Conversation
|
|
||
| let null_array = FixedSizeBinaryArray::new_null(4, 3); | ||
| assert_eq!(null_array.len(), 3); | ||
| assert_eq!(null_array.values().len(), 12); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reported 0 before this change.
| Self { | ||
| data_type: DataType::FixedSizeBinary(size), | ||
| value_data: MutableBuffer::new(capacity).into(), | ||
| value_data: MutableBuffer::new_null(capacity_in_bytes * 8).into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the 8 come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MutableBuffer::new_null is in bits.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Self { | ||
| data_type: DataType::FixedSizeBinary(size), | ||
| value_data: MutableBuffer::new(capacity).into(), | ||
| value_data: MutableBuffer::new_null(capacity_in_bytes * 8).into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| value_data: MutableBuffer::new_null(capacity_in_bytes * 8).into(), | |
| // MutableBuffer::new_null is in bits. | |
| value_data: MutableBuffer::new_null(capacity_in_bytes * 8).into(), |
etseidl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so familiar with the arrow side of the house, but this seems like correct behavior. It matches what a primitive array would do, and the value_data bytes are already allocated. The only downside I can think of is the extra memset, but that would be done for primitive arrays as well IIUC.
|
I also think we need to prioritize correctness over performance (obviously 😆 ) -- there isn't any obvious way to make this faster that I could see 🤔 |
|
🚀 |
I think zeroing is not required for null values based on the arrow specification. But I guess not doing it has some foot guns and UB issues that we want to avoid. Curious if there is a take on that from arrow-rs. 🤔 Thanks for merging that! I'll make a follow-up PR that adds the small comment. |
Allocating anything in rust is going to zero out the bytes anyways. Unless there is a compelling performance benchmark that shows avoiding the (potentially) extra zeroing makes a non trivial difference I think we should opt for safe |
) # Which issue does this PR close? - Follow-up on #8901 # Rationale for this change It's non-obvious why the number "8" appears here. # What changes are included in this PR? Name the number such that it's more obvious that this is a conversion from bytes to bits. @alamb I can also include the [suggested comment](#8901 (comment)) if you prefer it. I thought the constant may have a lesser risk of becoming outdated without being noticed when changes to `MutableBuffer::new_null` happen. # Are these changes tested? - No behavior changes # Are there any user-facing changes? - No
…ary in ScalarValue::to_array_of_size (#18903) ## Which issue does this PR close? - Closes #18870. We should open a follow-up issue that tracks re-use `FixedSizeBinaryArray::new_null` after upgrading arrow-rs. With this PR, we could backport it to #18843 if we wish. Alternatively, we could also wait for a fix by arrow-rs. ## Rationale for this change As we will have to wait for another arrow-rs update to get the fix for the issue apache/arrow-rs#8901 , we have a temporaray fix that basically calls the same operations from DataFusion's code. Once the fix is in the arrow-rs codebase, we can re-use `FixedSizeBinaryArray::new_null`. ## What changes are included in this PR? - A test. (`assert_eq!(result.as_fixed_size_binary().values().len(), 10);` returned 0 before) - Directly allocate the values buffer in DataFusion. ## Are these changes tested? Yes ## Are there any user-facing changes? Yes, the buffer will now be allocated, zeroed, and set to the expected length. Same as in the arrow-rs fix.
…ary in ScalarValue::to_array_of_size (apache#18903) ## Which issue does this PR close? - Closes apache#18870. We should open a follow-up issue that tracks re-use `FixedSizeBinaryArray::new_null` after upgrading arrow-rs. With this PR, we could backport it to apache#18843 if we wish. Alternatively, we could also wait for a fix by arrow-rs. ## Rationale for this change As we will have to wait for another arrow-rs update to get the fix for the issue apache/arrow-rs#8901 , we have a temporaray fix that basically calls the same operations from DataFusion's code. Once the fix is in the arrow-rs codebase, we can re-use `FixedSizeBinaryArray::new_null`. ## What changes are included in this PR? - A test. (`assert_eq!(result.as_fixed_size_binary().values().len(), 10);` returned 0 before) - Directly allocate the values buffer in DataFusion. ## Are these changes tested? Yes ## Are there any user-facing changes? Yes, the buffer will now be allocated, zeroed, and set to the expected length. Same as in the arrow-rs fix.
Which issue does this PR close?
FixedSizeBinaryArray::new_nullDoes Not Properly Set the Length of the Values Buffer #8900 .Rationale for this change
This causes the values buffer to have the expected length after creating the null array.
What changes are included in this PR?
Use
MutableBuffer::new_nullinstead ofMutableBuffer::newAre these changes tested?
Yes, additional constructor test
Are there any user-facing changes?
Yes, the buffer will now be correctly initialized when calling
FixedSizeBinaryArray::new_null