fix: take_bytes should not reuse indices null buffer #6616

viirya · 2024-10-23T07:30:14Z

Which issue does this PR close?

Closes #6617.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

viirya · 2024-10-23T07:55:22Z

arrow-select/src/take.rs

+        }
+
+        let values = r.as_string::<i32>().iter().collect::<Vec<_>>();
+        assert_eq!(&values, &[None, Some("foo"), Some("foo"), Some("foo")])


Before this fix, the result will become None, None, None, None.

tustvold

Comet shouldn't be modifying a buffer that is not exclusively owned by it, this is UB and breaks all sorts of invariants

tustvold · 2024-10-23T08:42:00Z

arrow-select/src/take.rs

+        let null_slice = binding.data_ptr();
+        for i in 0..4 {
+            unsafe {
+                bit_util::unset_bit_raw(null_slice.as_ptr(), i);


I believe this is UB, you're mutating a pointer you don't have exclusive access to

This just simulates the case that the null buffer is modified after take is called.

Even don't consider Comet, I think for take semantics (it is not "clone"), it should not just slice input arrays (data or null).

This just simulates the case that the null buffer is modified after take is called.

Which is UB, you are mutating a buffer without exclusive access to it. The optimisation of reusing the buffer is perfectly sound because of the aliasing invariants of Rust, not only do we rely on this extensively within both arrow and DataFusion, but the compiler itself relies on it

viirya · 2024-10-23T16:06:40Z

Comet shouldn't be modifying a buffer that is not exclusively owned by it, this is UB and breaks all sorts of invariants

It is not about should or shouldn't. Comet reuses underlying buffers in the native reader. This is not exclusive to Comet.

tustvold · 2024-10-23T16:11:53Z

It is not about should or shouldn't. Comet reuses underlying buffers in the native reader. This is not exclusive to Comet.

Rust's memory model disallows aliasing buffers in this way - see here. What you describe contravenes this model and is therefore UB. If you cannot guarantee that comet's native reader won't reuse the buffer behind Rust's back, you have to copy the memory into a fresh allocation.

viirya · 2024-10-23T16:14:06Z

Good. Let's not talk about Comet.

Let's simply focus on the kernel behavior. I think for take semantics (it is not "clone"), it should not just slice input arrays (data or null).

tustvold · 2024-10-23T16:14:52Z

I think for take semantics (it is not "clone"), it should not just slice input arrays (data or null).

Why? It is a perfectly sound optimisation, there is no reason to perform a deep copy here? It is UB for anyone to mutate the buffer behind our back, so there is no reason to not just use it?

viirya · 2024-10-23T16:18:17Z

Why? Currently you have a kernel with inconsistent behavior. Isn't it enough? It makes confused for users and users might need to look into the implementation details to know what the inconsistency comes from.

For example, you tries to modify the indices array after calling take. But some cases you can do it because the array is not cloned, but some cases you cannot because the array is cloned.

I don't think this is good design for a kernel behavior.

tustvold · 2024-10-23T16:26:56Z

For example, you tries to modify the indices array after calling take. But some cases you can do it because the array is not cloned, but some cases you cannot because the array is cloned.
I don't think this is good design for a kernel behavior.

The contract of a kernel should be on the semantic value of the output, this leaves kernels free to implement physical layout optimisations such as this. I don't believe we ever document or articulate any contract on how various kernels should behave w.r.t their inputs, this would not only be very restrictive but extremely fragile. Take is far from the only kernel that will simply clone input buffers, especially if one considers types like DictionaryArray where the underlying dictionary may or may not be recomputed depending on heuristics, or any of the ViewArray types.

From the take kernel's perspective it has no way to know that you want to reuse the null buffer that it was given, so the correct thing is for it to not create a fresh allocation unnecessarily. If code then wants to try to reuse the buffer, it can try, falling back to performing the allocation if necessary. This is safe, sound and optimal.

In general the approach of arrow-rs and DataFusion is to only allocate when absolutely necessary

viirya · 2024-10-23T16:32:43Z

I don't agree but I respect.

tustvold · 2024-10-23T16:34:53Z

Perhaps @alamb might provide a second opinion.

FWIW you need to fix the aliasing bug, regardless of thoughts on what the take kernel is doing, what comet is doing is UB

viirya · 2024-10-23T16:36:01Z

The consistency of kernel behavior and design is higher than such minor optimization. For now, users need to dig into the details inside the kernel when calling, if they don't have any knowledge about this crate. I don't think this is a good design from a higher level point of view.

tustvold · 2024-10-23T16:45:48Z

The consistency of kernel behavior and design is higher than such minor optimization

Perhaps you could articulate what the policy is you expect to be? Should all kernels deep clone all inputs? Would this not have severe performance ramifications?

For now, users need to dig into the details inside the kernel when calling

Yes relying on behaviour beyond the documented contract of a kernel is a recipe for needing to dig into the details of said function. https://xkcd.com/1172/ comes to mind here.

I don't think this is a good design from a higher level point of view.

We may have to agree to disagree, but let's see what Andrew has to say

viirya · 2024-10-23T18:31:49Z

Perhaps you could articulate what the policy is you expect to be? Should all kernels deep clone all inputs? Would this not have severe performance ramifications?

For take kernel, the current behavior surprises me. Although we don't explicitly say it won't clone input array. But I image that for an user without any knowledge about its internals, it "takes" some elements from input array, so it means it "copies" values from the input and produce a "new" output.

This is the take kernel document:

Take elements by index from [Array], creating a new [Array] from those indexes.

I'm not sure for others, but for me "take" semantics is to have something separately, not share it. The current behavior is counterintuitive (at least to me).

Dandandan · 2024-10-23T20:33:14Z

FWIW my view: I agree with @tustvold many kernels try to reuse buffers / input arrays if possible to minimize allocations / copies and I would expect arrow-rs to do so if possible (for whatever kernel) whenever dealing with immutable buffers (behind Arc).

alamb · 2024-10-23T21:44:48Z

Consider the semantics of take kernel, its output array should not reuse input array. The current behavior looks incorrect.

I don't fully understand this assertion. Is the idea that since the docs say

Take elements by index from Array, creating a new Array from those indexes.

That implies the output array should be entirely new (aka not share anything with the source arrays)?

alamb · 2024-10-23T21:55:45Z

Instead of modifying the take kernel, I wonder if we could add some explicit API for comet to call that would force a null buffer copy (or maybe we help write a workaround)

Like could we have comet call some function like

/// Ensure that all underlying buffers are purely owned (not shared)
fn ensure_owned(array: ArrayRef) -> ArrayRef {
  // downcast DictionaryArray and look for slice / shared references
  ...
}

🤔

alamb · 2024-10-23T21:57:28Z

The consistency of kernel behavior and design is higher than such minor optimization. For now, users need to dig into the details inside the kernel when calling, if they don't have any knowledge about this crate. I don't think this is a good design from a higher level point of view.

I agree that having to dig into the internals of the implementation is not a good design. Maybe we should better document that the arrow-rs kernels / implementations all assume immutable buffers and have various optimizations that may reuse portions of the input

alamb · 2024-10-23T21:59:06Z

My biggest concern is not really about this particular PR, but more the general question of "what can be assumed about the output of a function given it got an ArrayRef in ?

Do we need to document / test the various kernels and operations that guarantee no part of the input array would be reused 🤔

tustvold · 2024-10-23T22:04:51Z

Do we need to document / test the various kernels and operations that guarantee no part of the input array would be reused 🤔
Maybe we should better document that the arrow-rs kernels / implementations all assume immutable buffers

We do have the beginnings of the docs here - https://docs.rs/arrow-array/latest/arrow_array/#low-level-api

But we could definitely document better that the array abstractions, and by extension the kernels that operate on them, are built around reference counted immutable buffers, and implementations should not make any assumptions about sole ownership of buffers. If codepaths require sole ownership they must use the safe APIs for doing so, such as into_mutable.

What can be assumed about the output of a function given it got an ArrayRef in

Tbh I am a little confused why one would make any assumptions here, but it certainly couldn't hurt to document that we avoid allocating whenever possible.

viirya · 2024-10-23T22:31:05Z

Take elements by index from Array, creating a new Array from those indexes.

That implies the output array should be entirely new (aka not share anything with the source arrays)?

Once you took something from someone, do you expect that you and someone both have it? 🙂

viirya · 2024-10-23T22:33:09Z

Let me clarify it. I am not against the behavior of sharing buffers between input/output on other kernels. But for take kernel, based on the "take" semantics, it is actually counter-intuitive (at least to me). Especially, it actually "takes" buffers for most cases.

tustvold · 2024-10-23T22:39:35Z

For a DictionaryArray would you expect it to create a deep clone of the dictionary values? What about a ViewArray where the whole design is to avoid copying the values buffers?

I think this discussion is getting a bit philosophical and not particularly productive, we fundamentally cannot change take to always perform a deep copy of the input without significantly regressing performance. It is a fundamental design of arrow-rs, and other arrow implementations, that buffers are immutable reference counted memory regions and kernels are free to take advantage of that.

Perhaps you might be willing to suggest some documentation improvements that would clarify this to avoid future confusion?

viirya · 2024-10-23T23:14:51Z

I will add some documentation to take kernel to clarify its behavior.

I am adding workaround in Comet codebase and will also have some custom unpack dictionary and take to prevent the issue. So it won't rely on arrow-rs kernel.

viirya · 2024-10-23T23:18:58Z

For a DictionaryArray would you expect it to create a deep clone of the dictionary values? What about a ViewArray where the whole design is to avoid copying the values buffers?

What are you trying to say? When unpacking dictionary, how do you not copy dictionary values?

If you are talking about take. When it works on dictionary array, it takes dictionary keys. But it doesn't share dictionary keys with original array. I don't think it has anything wrong.

viirya · 2024-10-24T00:58:30Z

Let me close this. I will open another PR for documenting the kernel. Thanks for your review.

viirya · 2024-10-24T01:59:35Z

I think this discussion is getting a bit philosophical and not particularly productive, we fundamentally cannot change take to always perform a deep copy of the input without significantly regressing performance.

Btw, note that take already does deep copy for other cases. It only clones the null buffer for a particular case. I don't think that such change brings significant regressing performance. If any, it already does. Not to mention it is null buffer, not the data buffer.

tustvold · 2024-10-24T09:06:35Z

Btw, note that take already does deep copy for other cases.

Nobody is disputing this, however, there are many cases where it reuses buffers and entire arrays. Changing this would significantly regress performance for those cases. Yes copying a null buffer as in this PR isn't a terrible regression, but the broader principle you're advocating for that take should never reuse its inputs would be terrible for performance

alamb · 2024-10-24T10:39:22Z

Thank you @viirya @Dandandan and @tustvold -- I hope we end up in a place that is good for Comet and good for other users of arrow as well

viirya · 2024-10-24T16:32:55Z

Nobody is disputing this, however, there are many cases where it reuses buffers and entire arrays. Changing this would significantly regress performance for those cases. Yes copying a null buffer as in this PR isn't a terrible regression, but the broader principle you're advocating for that take should never reuse its inputs would be terrible for performance

I don't really find "many" cases take reuses buffers and entire arrays. Would you mind point it out? 🙂

note: I don't know how you can take indexed values from buffers while reusing the buffers at the same time, except for special ones like dictionary value (which I already mentioned it is reasonable).

tustvold · 2024-10-24T16:55:56Z

From a quick scan

Not sure if that counts as "many", but it is very important for those cases that we don't clone.

Taking a step back though, it is kind of irrelevant whether the kernel reuses or doesn't reuse buffers, arrow is explicitly designed to encourage such reuse and we shouldn't be seeking to document or make commitments either way for any kernel. Not only does doing so limit our optimisations, it would be extremely fragile and hard to test, validate and maintain.

That being said expanding the docs to further clarify this point is likely a valuable addition.

Edit: as a general rule invariants should either be encoded in the type system or checked at runtime

viirya · 2024-10-24T18:52:18Z

Yea, these are the only cases and I believe that are not "many". Not to mention that they are reasonable cases as these arrays are designed to be shared/reused. I already mentioned this in previous comment.

That's why I don't understand that you claim that performance regression will be a problem for such a change. Except for above cases, the only cases reusing is happening on null buffer in take kernel. I doubt it will bring much extra cost on copying them.

With a negligible cost, making take behaves more close to the common semantics of "take" (philosophical? 🙂 ). I don't get why this patch gets string objection.

Anyway, as the workaround was already added to bypass the issue. And I don't think we can get a consensus. I respect your decision here.

tustvold · 2024-10-24T19:05:52Z

I don't get why this patch gets string objection.

You are regressing performance, albeit marginally, to achieve something that isn't part of any kernel's public contract, nor that would be maintained or tested going forwards. So it is hard for me to see this as anything but a net loss, it doesn't add anything and makes performance worse.

viirya · 2024-10-24T19:13:47Z

to achieve something that isn't part of any kernel's public contract

function name is not public contract? You design a function called "not_copy" but it copies?

tustvold · 2024-10-24T19:15:22Z

You design a function called "not_copy" but it copies
Not to mention that they are reasonable cases as these arrays are designed to be shared/reused.

These statements are contradictory.

I'm not going to engage further on this ticket

fix: take_bytes should not reuse indices null buffer

ac94c2b

github-actions bot added the arrow Changes to the arrow crate label Oct 23, 2024

viirya commented Oct 23, 2024

View reviewed changes

tustvold requested changes Oct 23, 2024

View reviewed changes

viirya mentioned this pull request Oct 23, 2024

fix: TopK operator should return correct results on dictionary column with nulls apache/datafusion-comet#1033

Merged

tustvold mentioned this pull request Oct 23, 2024

CometBuffer can potentially lead to concurrent modification of a held buffer (aka is "Unsound" in Rust terms) apache/datafusion-comet#1035

Closed

viirya closed this Oct 24, 2024

viirya deleted the fix_take_bytes branch October 24, 2024 19:49

fix: take_bytes should not reuse indices null buffer #6616

fix: take_bytes should not reuse indices null buffer #6616

Uh oh!

Conversation

viirya commented Oct 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

Uh oh!

viirya Oct 23, 2024

Choose a reason for hiding this comment

Uh oh!

tustvold left a comment

Choose a reason for hiding this comment

Uh oh!

tustvold Oct 23, 2024

Choose a reason for hiding this comment

Uh oh!

viirya Oct 23, 2024

Choose a reason for hiding this comment

Uh oh!

tustvold Oct 23, 2024

Choose a reason for hiding this comment

Uh oh!

viirya commented Oct 23, 2024

Uh oh!

tustvold commented Oct 23, 2024

Uh oh!

viirya commented Oct 23, 2024

Uh oh!

tustvold commented Oct 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

viirya commented Oct 23, 2024

Uh oh!

tustvold commented Oct 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

viirya commented Oct 23, 2024

Uh oh!

tustvold commented Oct 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

viirya commented Oct 23, 2024

Uh oh!

tustvold commented Oct 23, 2024

Uh oh!

viirya commented Oct 23, 2024

Uh oh!

Dandandan commented Oct 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alamb commented Oct 23, 2024

Uh oh!

alamb commented Oct 23, 2024

Uh oh!

alamb commented Oct 23, 2024

Uh oh!

alamb commented Oct 23, 2024

Uh oh!

tustvold commented Oct 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

viirya commented Oct 23, 2024

Uh oh!

viirya commented Oct 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tustvold commented Oct 23, 2024

Uh oh!

viirya commented Oct 23, 2024

Uh oh!

viirya commented Oct 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

viirya commented Oct 24, 2024

Uh oh!

viirya commented Oct 24, 2024

viirya commented Oct 23, 2024 •

edited

Loading

tustvold commented Oct 23, 2024 •

edited

Loading

tustvold commented Oct 23, 2024 •

edited

Loading

tustvold commented Oct 23, 2024 •

edited

Loading

Dandandan commented Oct 23, 2024 •

edited

Loading

tustvold commented Oct 23, 2024 •

edited

Loading

viirya commented Oct 23, 2024 •

edited

Loading

viirya commented Oct 23, 2024 •

edited

Loading

tustvold commented Oct 24, 2024 •

edited

Loading

alamb commented Oct 24, 2024 •

edited

Loading

viirya commented Oct 24, 2024 •

edited

Loading

tustvold commented Oct 24, 2024 •

edited

Loading

viirya commented Oct 24, 2024 •

edited

Loading