BUG: Fix array_equal for numeric and non-numeric scalar types#27275
BUG: Fix array_equal for numeric and non-numeric scalar types#27275seberg merged 5 commits intonumpy:mainfrom
Conversation
numpy/_core/numeric.py
Outdated
| return False | ||
| # Shapes of a1, a2 and masks are guaranteed to be consistent by this point | ||
| return builtins.bool((a1[~a1nan] == a2[~a1nan]).all()) | ||
| return builtins.bool(asarray(a1[~a1nan] == a2[~a1nan]).all()) |
There was a problem hiding this comment.
I suspect this one might not be needed because we can only end up here with numeric types (the calls isnan(a1) and isnan(a2) must be valid).
There was a problem hiding this comment.
Agreed, all numeric types can be compared. So I think you can skip it if you like.
(That is part of it, the other is that the index result is always 1-D)
seberg
left a comment
There was a problem hiding this comment.
Thanks, I think this is the right thing for backporting.
Fixing the situation long-term, could be done in the == special path (not sure why we return a bool) and also by changing the ufunc to return 0-D arrays if any input is 0-D. A change I had prepared for a while.
Approving, it is fair with or without unsing that second asarray call.
numpy/_core/numeric.py
Outdated
| return False | ||
| # Shapes of a1, a2 and masks are guaranteed to be consistent by this point | ||
| return builtins.bool((a1[~a1nan] == a2[~a1nan]).all()) | ||
| return builtins.bool(asarray(a1[~a1nan] == a2[~a1nan]).all()) |
There was a problem hiding this comment.
Agreed, all numeric types can be compared. So I think you can skip it if you like.
(That is part of it, the other is that the index result is always 1-D)
numpy/_core/numeric.py
Outdated
| if a1 is a2: | ||
| return True | ||
| return builtins.bool((a1 == a2).all()) | ||
| return builtins.bool((asarray(a1 == a2)).all()) |
There was a problem hiding this comment.
Note this changes behaviour for MaskedArray, since it returns a masked array of bool, and here you strip the mask. In principle, not bad conceptually, since the underlying value is False if one array element is masked and the other is not, and True if both are masked. But definitely a change in API that might catch people by surprise: it will no longer be just (a1 == a2).all(), which I think is what most people would expect.
There was a problem hiding this comment.
p.s. I did not check the reasoning for the asarray - asanyarray would solve the problem with MaskedArray - I think that may be good enough to solve the problem here too? Alternatively, replace (...).all() with np.all(...) - that will deal with python bools fine.
There was a problem hiding this comment.
Hmmm, maybe np.all is best, then. The asarray was restoring old behavior, but hand't though about it potentially making a regression for masked (where removing it was maybe an improvement)...
There was a problem hiding this comment.
I have this vague feeling that I indeed removed to ensure MaskedArray worked. But it may have been assert_array_equal... Unfortunately, MaskedArray tests do not really cover the case with regular numpy functions much.
There was a problem hiding this comment.
Using asanyarray preserves the masked array and solves issue #27271, so I updated the PR to use that.
@mhvk Does that address your point? As far as I understand it, the problem with asarray is that it creates an unnecessary copy of the masked array (and hence uses more memory).
I am also looking into updating the scalar comparison to solve #27271 by making np.array(1) == np.array('s') return a numpy bool instead of a python bool.
There was a problem hiding this comment.
The problem is not the use of memory, but rather than np.asarray makes any array subclass into a regular ndarray. So, np.asanyarray solves that!
There was a problem hiding this comment.
On the second line of np.array_equal all input arrays are transformed with asarray, so I am not sure any masked arrays can arrive at the lines we are discussing now. We could change the second line into a1, a2 = asanyarray(a1), asanyarray(a2) so that masked arrays are handled.
To make it a bit more concrete here is a minimal example showing the behaviour of masked arrays.
import numpy as np
x = np.ma.array([1], mask=[1])
y = np.ma.array([1], mask=[1])
w = np.ma.array([0], mask=[1])
z = x == y
bool(z.all()), bool(np.asanyarray(z).all()), bool(np.asarray(z).all()) # False, False, True
np.array_equal(x, y) # what should this be?
np.array_equal(x, w) # what should this be?
On numpy 1.26 and 2.0 the output of np.array_equal(x, y) is True and the output of np.array_equal(x, w) is False. But since the input arrays are fully masked, one could argue that the output of both np.array_equal(x, y) and np.array_equal(x, w) should be the same (but I would not be sure whether the result should then be True or False).
There was a problem hiding this comment.
so I am not sure any masked arrays can arrive at the lines we are discussing now
Yes, you are right. But I guess asanyarray is just as well, anyway, it just shouldn't make a difference.
|
Because this is tagged for backport, please squash when merging. |
|
Thanks @eendebakpt let me put this in for backport, since I think for that it should be good. |
Backport of numpy#27275 Mitigates numpy#27271. The underlying issue (an array comparison returning a python bool instead of a numpy bool) is not addressed. The order of statements is slightly reordered, so that the if `a1 is a2:` check can be done before the calculation of `cannot_have_nan` Closes numpygh-27271
Backport of numpy#27275 Mitigates numpy#27271. The underlying issue (an array comparison returning a python bool instead of a numpy bool) is not addressed. The order of statements is slightly reordered, so that the if `a1 is a2:` check can be done before the calculation of `cannot_have_nan` Closes numpygh-27271
…27275) Mitigates numpy#27271. The underlying issue (an array comparison returning a python bool instead of a numpy bool) is not addressed. The order of statements is slightly reordered, so that the if a1 is a2: check can be done before the calculation of cannot_have_nan
…27275) Mitigates numpy#27271. The underlying issue (an array comparison returning a python bool instead of a numpy bool) is not addressed. The order of statements is slightly reordered, so that the if a1 is a2: check can be done before the calculation of cannot_have_nan
Mitigates #27271. The underlying issue (an array comparison returning a python bool instead of a numpy bool) is not addressed.
The order of statements is slightly reordered, so that the
if a1 is a2:check can be done before the calculation ofcannot_have_nanEDIT,NOTE(seberg): Should be backported to both 2.0.2 and 2.1
Closes gh-27271