-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-34154: [Python] Add is_nan method to Array and Expression
#34184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
78ec952 to
36db249
Compare
|
Is this expected? 🔍 |
|
Looking at the tests in C++, I think it is expected: arrow/cpp/src/arrow/compute/kernels/scalar_validity_test.cc Lines 121 to 132 in 53752ad
arrow/cpp/src/arrow/compute/kernels/scalar_validity_test.cc Lines 229 to 244 in 53752ad
|
It's not super clear from the assert message, but so it is the "null" that is different (see my inline comment). As Alenka mentioned above, that is indeed expected (and tested on the C++ side) behaviour. In general, predicate functions (checking something, returning a boolean) propagate nulls (in the idea of "you don't know the value, so you also don't know if it is NaN or not, so whether it should be True or False). |
|
I learned something today. Thanks @AlenkaF for pointing that out. Updated the PR with also adding the option of returning |
|
Great, thanks for adding this! |
a56f3fb to
2229918
Compare
2229918 to
2793dd5
Compare
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good, added some comments on the docstrings.
You also still have a related doctest failure:
__________________ [doctest] pyarrow.lib.ChunkedArray.is_nan ___________________
331 ChunkedArray.is_nan(self)
332
333 Return boolean array indicating the nan values.
334
335 Examples
336 --------
337 >>> import pyarrow as pa
338 >>> import numpy as np
339 >>> arr = pa.chunked_array([[2, np.nan, 4], [4, None, 100]])
340 >>> arr.is_nan()
Differences (unified diff with -expected +actual):
@@ -1,12 +1,10 @@
-<pyarrow.lib.ChunkedArray object at ...>
+<pyarrow.lib.ChunkedArray object at 0x7f91d0ff3360>
[
[
false,
true,
- false
- ],
- [
false,
- None,
+ false,
+ null,
false
]
FYI you can check those locally as well: https://arrow.apache.org/docs/dev/developers/python.html#doctest
python/pyarrow/array.pxi
Outdated
| Returns | ||
| ------- | ||
| array : boolean Array, where null values are None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this "where null values are None" mean exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That a null value, will result in a Python None:
arrow/python/pyarrow/tests/test_compute.py
Lines 1639 to 1643 in df625f2
| def test_is_nan(): | |
| arr = pa.array([1, 2, 3, None, np.nan]) | |
| result = arr.is_nan() | |
| expected = pa.array([False, False, False, None, True]) | |
| assert result.equals(expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK, I didn't get that at first. Now, I don't think that is something we need to mention here in this docstring (that's true for all functions that return arrow data). And it's also only when converting to native Python objects that this happens, the actual result here is a pyarrow.array, and that doesn't use None (so the explanation would be relevant for a method like Array.to_pylist().
The reason you see it in the tests, is because we allow you to use None to specify a null value when constructing an array (but the None gets converted to a proper "null").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche Okay, that's fine with me as well. It was not directly obvious for me, but it makes sense now I'm aware of it.
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
is_nan expressionis_nan method to Array and Expression
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
|
Benchmark runs are scheduled for baseline = b679a96 and contender = 2ff4e3a. 2ff4e3a is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
is_nanfor Python #34154