-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-1568: [C++] Implement Drop Null Kernel for Arrays #10802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This PR is ready to review :) |
nirandaperera
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some comments on the general approach. Great work @aocsa 👏
westonpace
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Niranda that the record batch implementation probably won't work but I think this is a great start and looks like a very good addition. Thanks for making this!
|
I updated this PR base on last feedback comments. Main change, the way algorithm drop nulls now are based on filter operation and boolean filter array is obtained directly using the bitmap from input array. |
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, added a few comments!
How does "drop_null" work for nested types? I suppose it only takes into account the top-level validity? (if so, might be worth to test (if not yet done) and document explicitily)
|
I updated this PR addressing latest feedback comments. |
nirandaperera
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @aocsa 👍 I left some minor comments :-)
5b1eedb to
eb8f878
Compare
|
Thanks @nirandaperera. I updated the PR based on latest feedback. Nice to know more about the Arrow BitUtils :) |
westonpace
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a few very minor things. Most are fine for a follow up, remove the prints and I'll be good.
1313d61 to
96b3742
Compare
minor fix vc++/win minor fix last fix
minor windows compilation fix reuse Bitmap
b4d9285 to
c98ac6d
Compare
c98ac6d to
27c9cb4
Compare
|
Thanks again @pitrou, just a reminder this PR was updated. Main change related to DropNullTable
|
python/pyarrow/compute.py
Outdated
| return call_function('take', [data, indices], options, memory_pool) | ||
|
|
||
|
|
||
| def drop_null(data, memory_pool=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove this as the proper definition is auto-generated from the C++ metadata.
| } | ||
| auto drop_null_filter = | ||
| std::make_shared<BooleanArray>(batch->num_rows(), dst, nullptr, 0, 0); | ||
| if (drop_null_filter->null_count() == batch->num_rows()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition is never satisfied, will fix.
| auto schm = schema(fields); | ||
| std::vector<std::shared_ptr<ChunkedArray>> empty_table(fields.size()); | ||
| auto col_a = std::make_shared<NullArray>(size); | ||
| auto col_b = std::make_shared<NullArray>(size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the wrong column types. I will fix and add a validation.
|
Note the auto-generated Python docstring isn't entirely accurate (see the I'll open a separate JIRA. |
Ok, thanks @pitrou let me know in which jira ticket will be this to follow up the updates. |
Implement "drop null" kernels that return array without nulls
Note: This can be implemented as a arrow::compute::VectorFunction because the size of the array is changed, so this function is not valid in a SQL-like context