-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-9285: [C++] Detect unauthorized memory allocations in function kernels #12089
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename pull request title in the following format? or See also: |
|
@joosthooz I have following initial comments:
|
|
@joosthooz Need to change this PR title to: |
cpp/src/arrow/compute/exec.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious if hashing all the values-buffers pointers would suffice instead of storing them in a set. We would still need to traverse nested data structures to capture all the child buffers. This could be solved using a Visitor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hashing would be nicer, but it would also trigger the error if a kernel deletes a buffer it doesn't need anymore. And I don't think we care about that, just newly allocated ones. What do you think?
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not posting a more detailed review (for now!), these are only two high-level comments.
cpp/src/arrow/compute/exec.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how we want to handle this feature at all, but we definitely don't want to run those checks unconditionally, unless they're extremely trivial. Perhaps this can be enabled only on debug builds? (#ifndef _NDEBUG, for example)
cc @lidavidm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think debug-only is fine as this will be most useful for developers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The checks are now inside #ifndef NDEBUG.
cpp/src/arrow/compute/api_scalar.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should not be added in the public APIs. Instead, these functions should only exist in the corresponding test file, IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed them again, also from the function registry. I'm now using a different way of calling the function through an executor, instead of going through CallFunction
cpp/src/arrow/compute/exec.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: to reduce the performance overhead, std::unordered_set<Buffer*> would be better.
cpp/src/arrow/compute/exec.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExecutionError a Gandiva-specific error. Perhaps use Status::Invalid?
|
I think there's a problem with this approach: Besides, child buffers are never preallocated. So what happens is that you must check that preallocated buffers are not changed, but this doesn't prevent other buffers from being allocated as well. |
# Conflicts: # cpp/src/arrow/compute/kernels/CMakeLists.txt
|
Thank you for your contribution. Unfortunately, this pull request has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this PR will be closed in 14 days. Feel free to re-open this if it has been closed in error. If you do not have repository permissions to reopen the PR, please tag a maintainer. |
Targets https://issues.apache.org/jira/browse/ARROW-9285
If a kernel uses preallocated memory allocation (
MemAllocation::PREALLOCATE), it should not create new buffers. This code aims to check for this.Questions/TODO;
DCHECK_*, it is not possible to test whether a misbehaving kernel is detected. I added an example test with a misbehaving kernel, but for that to work the check should throw an Error that the test can then check for usingASSERT_RAISES_WITH_MESSAGEpreallocate_contiguous_, but there might be additional cases where this check could be useful? For example ifvalidity_preallocated_is true, we could check if there aren't any newly created validity buffers. But the currentAddBuffersToSet()code does not support this.(kernel_->mem_allocation == MemAllocation::PREALLOCATE)but I don't know if that is the right way to go.