BUG MaskedArray __eq__ wrong for masked scalar, multi-d recarray by mhvk · Pull Request #8590 · numpy/numpy

mhvk · 2017-02-09T03:24:11Z

In the process of trying to fix the "questionable behaviour in MaskedArray.__eq__" (#8589), it became clear that the code was more than a little buggy. E.g., ma == ma[0] failed if ma held a structured dtype; multi-d structured dtypes failed generally; and, more worryingly, a masked scalar comparison could be wrong:

np.ma.MaskedArray(1, mask=True) == 0
# True

It doesn't help to do tests on data filled with 0 if one doesn't consistently check the mask after... (for the rest, see the new test cases)
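As a minimal sketch of the behaviour the fix is aiming for (assuming a NumPy release that includes this change), comparing a fully masked scalar should give back the masked constant rather than a plain boolean:

```python
import numpy as np

# A 0-d masked array whose single value is masked.
x = np.ma.MaskedArray(1, mask=True)

# The underlying data (1) differs from 0, but since the value is masked
# the comparison should not report a definite True or False.
result = (x == 0)
print(result is np.ma.masked)
```

Checking identity against np.ma.masked avoids relying on the truthiness of the result, which is exactly what went wrong in the old code.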

mhvk · 2017-02-09T18:19:52Z

numpy/ma/core.py

I'm still wondering whether it would be better to use .view(type(self)) as long as that is not mvoid while the shape is that of an array.

Not sure what you mean here.

Oh, you mean in the following line?

Yes, indeed. In the old code, the result is unconditionally turned into type(self), which is a problem if one compares a MaskedArray with an mvoid (where the latter is a single element, which will always have precedence since it is a subclass of MaskedArray).

Possibly, it is better solved in mvoid...

Anyway, as written the code works, but let me know if there are other suggestions.

Yes, probably it can also be solved by defining _compare in mvoid, but I agree this is fine.

(also, not sure if the test above for check.dtype.names should be something like check.dtype == np.void to account for dtype('V4')... but maybe that is overthinking it since I suspect truly void types don't work well with masks anyway)

charris · 2017-02-10T02:40:20Z

numpy/ma/core.py

Could use a bit more explanation of the role of the mask.

Wasn't there before, but happy to do so (but will wait for possible other comments first).

charris · 2017-02-10T02:41:31Z

@ahaldane @shoyer Comment?

ahaldane · 2017-02-14T19:49:39Z

numpy/ma/core.py

I know this is the old behavior, but just want to note that any makes more sense to me than all.

With all, I think it causes the result of the comparison to depend on the data at a masked value, eg:

>>> m = np.ma.mvoid((1,2), mask=(0,1), dtype='i4,f4')
>>> n = np.ma.mvoid((1,0), dtype='i4,f4')
>>> o = np.ma.mvoid((1,1), dtype='i4,f4')
>>> (n == m), (m == n), (o == m)
(True, masked, False)

@ahaldane To be clear, is that the behaviour you want, or the behaviour that all implies?

That's the behavior I get with this PR as it is right now (I fetched + tested it).

I actually didn't work out why n == m is different from m == n yet.

Should m == n return masked or True? Either seems somewhat reasonable (but obviously both as they are right now is bad!)

Now that you ask, I have been going back and forth on it...

After some thought, the only consistent result I see is to return masked for all three cases, except that we need some extra code so that m == m, ie

>>> p = np.ma.mvoid((1,7), mask=(0,1), dtype='i4,f4')
>>> (m == p), (m == m)
True, True

which I think requires replacement of mask_or by mask_xor, and then use any instead of all. But possibly we should have m == m return masked, I'm still a little undecided.

Returning True seems worse to me because intuitively I think n and m don't represent equal values, and secondly because the code would have to be much more complicated: We would have to reimplement numpy's internal structure comparison (somewhere in the C code) inside of MaskedArray to ignore fields with masked values.

@ahaldane: I'm suggesting all semantics on the flattened view of the masked data (when such a thing can exist), not on the mask alone. I'm suggesting this because that's what the semantics of normal structured arrays are.

Flattening the data in your example, we get consistent behaviour

>>> o = np.ma.array([[1]], mask=[[True]])
>>> p = np.ma.array([[0]], mask=[[True]])
>>> (o == p).all(axis=-1)
masked_array(data = [--],
             mask = [ True],
       fill_value = True)

I'd expect your example to return a 1D array, not a scalar, but I assume that's just a mistake.

I think I'm with @eric-wieser here, but maybe best to postpone further discussion until I have the next version up; @ahaldane's example shows that this one is more broken than I had thought (though an improvement on what was there!). In the meantime, I also found that other parts of np.ma are relying on comparison of two masked items being returned as masked True...

By the way, I just noticed there are unused, half-implemented methods _get_recordmask and _set_recordmask defined for MaskedArray...

Isn't np.ma lovely...

A lot of work and exploration was put into it, which hopefully we can learn from in a future tidying/reimplementation. (incidentally, if more numpy members who know about maskedarrays are joining recently, maybe we can make a reimplementation happen soon)

With respect to this PR: I recognize that, as often happens, this PR might be snowballing now that the unit tests are failing upon further changes. I might be in favor of getting this PR through now as a partial fix (at least m == m[0] will work). What do you think?

ahaldane · 2017-02-14T20:02:56Z

Besides the any vs all examples, the rest looks good.

Also, congrats on becoming a numpy member @mhvk ! I'm very happy to have someone else who often works on MaskedArrays here.

mhvk · 2017-02-15T01:05:01Z

@ahaldane - hmm, that's a nasty problem you point out: ideally, we would ignore any masked values in the comparison, but that is not how it was done (or done in my PR). Needs some thought...

mhvk · 2017-02-15T18:07:32Z

@ahaldane - on the "snowballing" -- I think it is not too bad, but I agree that if I don't get to something better by the end of the week, we probably should take this as at least an initial step.

ahaldane · 2017-02-15T18:18:30Z

@mhvk all right, sounds good

mhvk · 2017-02-16T15:06:15Z

@ahaldane, @eric-wieser - I pushed an updated version of MaskedArray.__eq__ -- I think I got all the corner cases right now and sufficient comments that it should be clear what is the intent for whoever looks at this next... (in the process, I also got rid of the rather hacky mask.view(bool_), which would not work if the memory layout of the mask is non-contiguous.)

eric-wieser · 2017-02-16T15:11:02Z

numpy/ma/tests/test_core.py

I personally think that every single one of these tests should return np.masked

In fact, can any two mvoid objects return np.masked on comparison with this patch?

My argument would be that there are 3 cases here:

No matter what values the masked fields take, two objects are always equal

No matter what values the masked fields take, two objects are always not equal

The equality depends on the values of the masked fields

Since the first two of these cases are represented by True and False, surely we should use masked to represent the third case?
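The three cases above can be sketched in plain Python. This is only an illustration of the proposal, not what the PR implements; three_way_eq is a hypothetical helper, and None stands in for np.ma.masked:

```python
def three_way_eq(values_a, mask_a, values_b, mask_b):
    """Hypothetical helper: classify the equality of two records.

    Returns True or False when the answer cannot depend on masked fields,
    and None (standing in for np.ma.masked) when it can.
    """
    depends_on_masked = False
    for va, ma, vb, mb in zip(values_a, mask_a, values_b, mask_b):
        if ma or mb:
            # At least one side hides this field, so its value is unknown.
            depends_on_masked = True
        elif va != vb:
            # A field visible on both sides differs: never equal.
            return False
    return None if depends_on_masked else True


print(three_way_eq((1, 2), (0, 0), (1, 2), (0, 0)))  # case 1
print(three_way_eq((1, 2), (0, 1), (3, 2), (0, 0)))  # case 2
print(three_way_eq((1, 2), (0, 1), (1, 0), (0, 0)))  # case 3
```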

Yes, mvoid comparisons can yield masked, but only if all elements are masked (in at least one of the two):

m = mvoid((1, 2.), mask=(0, 1), dtype='i4,f4')
p = mvoid((1, 2.), mask=(1, 0), dtype=m.dtype)
m == p
# masked

I probably should add that to the test cases...

mhvk · 2017-02-16T15:41:04Z

@eric-wieser - replying in the main thread so things do not get lost: I was actually working by an analogy (I think) you made earlier, that the structured arrays should work the same way as would be the case if one tested equality over an axis. Now for that, masked elements are simply ignored:

np.all(np.ma.array([True, False], mask=(0,0)))
# False
np.all(np.ma.array([True, False], mask=(0,1)))
# True
np.all(np.ma.array([True, False], mask=(1,1)))
# masked

I think this is actually reasonably logical given the interpretation that masking elements means one should ignore them: if both arrays mask a given element, just don't care, if one does not mask it, it cannot possibly be equal to a masked one.

mhvk · 2017-02-16T23:27:53Z

Actually, @eric-wieser, your real example was:

m = np.ma.array((1,2), mask=(0,1))
n = np.ma.array((1,0))
m == n
# masked_array(data = [True --],
#              mask = [False  True],
#        fill_value = True)
(m == n).all()
# True

while with my PR,

m = np.ma.array((1,2), mask=(0,1), dtype='i4,i4')
n = np.ma.array((1,0), dtype=m.dtype)
m == n
#  False

so I'm not consistent.

Darn.

eric-wieser · 2017-02-16T23:38:17Z

@mhvk : Perhaps the test suite should try every permutation of mask / value, and compare structured vs unstructured comparison?

ahaldane · 2017-02-17T18:58:27Z

This is a little of a further-out idea, but I'd like to try it out on you.

Maybe we want masked structure comparison to be fundamentally different from plain structure comparison.

What I mean is, maybe instead of returning a single True/False value per element, comparison should return a new structure similar to the mask of a masked structured array, which preserves all fields but makes them booleans. So

>>> m = np.ma.array([(1,2)], mask=[(0,1)], dtype='i4,f4')
>>> n = np.ma.array([(1,1)],  dtype='i4,f4')
>>> m == n
masked_array(data = [(True, --)],
             mask = [(False,  True)],
       fill_value = ( True,  True),
            dtype = [('f0', '?'), ('f1', '?')])

Incidentally, plain structure comparison is itself somewhat poorly defined and gives you lots of deprecation warnings in various cases. It's conceivable we could also change plain structure comparisons to do a similar thing (just without the masks). However I haven't thought it through, and that might be too big a change.
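The field-by-field result proposed here can be approximated today with a small helper. This is a sketch only: fieldwise_eq is a hypothetical name, not an existing NumPy function, and it assumes both inputs share the same field names.

```python
import numpy as np


def fieldwise_eq(a, b):
    """Hypothetical helper: compare two structured masked arrays per field."""
    names = a.dtype.names
    out_dtype = [(name, '?') for name in names]
    out = np.ma.zeros(np.broadcast(a, b).shape, dtype=out_dtype)
    for name in names:
        # Each field is an ordinary (unstructured) masked comparison, so its
        # mask marks entries that were masked in either input.
        out[name] = (a[name] == b[name])
    return out


m = np.ma.array([(1, 2)], mask=[(0, 1)], dtype='i4,f4')
n = np.ma.array([(1, 1)], dtype='i4,f4')
result = fieldwise_eq(m, n)
print(result.mask)
```

The result keeps one boolean per field, so a caller can decide separately how to reduce over fields and how to treat masked ones.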

mhvk · 2017-02-18T21:51:04Z

@eric-wieser, @ahaldane - a new version, which now explicitly tests that the structured comparison gives the same result as doing .all() (or .any() for ne) over an axis. I think this makes by far the most sense given the present setup, but do note that it meant I had to change a test -- which apparently did not assume this (I think almost certainly an oversight, and writing tests to match the code). So, there is now a risk of regressions (though given how buggy this part was, I think a fairly small one).
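The equivalence described above can be checked with a rough sketch like the following. It is not the test added in the PR; the values, masks and helper names are chosen for illustration, and it assumes a NumPy release that contains this change. Data under a masked result is not compared, since only the mask is meaningful there.

```python
import itertools
import numpy as np

dt = np.dtype('i4,i4')
masks = [(0, 0), (0, 1), (1, 0), (1, 1)]
values = [(1, 2), (1, 0), (3, 2)]


def structured_eq(v1, m1, v2, m2):
    # One record with two fields per array.
    a = np.ma.array([v1], mask=[m1], dtype=dt)
    b = np.ma.array([v2], mask=[m2], dtype=dt)
    return a == b


def flat_eq(v1, m1, v2, m2):
    # Same numbers as a plain (1, 2) array, reduced over the last axis.
    a = np.ma.array([v1], mask=[m1])
    b = np.ma.array([v2], mask=[m2])
    return (a == b).all(axis=-1)


mismatches = []
for v1, v2 in itertools.product(values, repeat=2):
    for m1, m2 in itertools.product(masks, repeat=2):
        s = structured_eq(v1, m1, v2, m2)
        f = flat_eq(v1, m1, v2, m2)
        same_mask = np.array_equal(np.ma.getmaskarray(s),
                                   np.ma.getmaskarray(f))
        # Fill masked results with a common value before comparing data.
        same_data = np.array_equal(s.filled(False), f.filled(False))
        if not (same_mask and same_data):
            mismatches.append((v1, m1, v2, m2))

print(mismatches)
```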

mhvk · 2017-02-18T21:51:10Z

numpy/ma/tests/test_core.py

This is the test I had to change (as well as its equivalent for ne).

mhvk · 2017-02-18T21:53:11Z

@ahaldane - now that __eq__ is always guaranteed to return an array, I think it would indeed be most logical to let it return a structured bool array. I do fear, though, that that ship has sailed long ago -- it would simply break too much code, which relies on such structured array comparisons to work item by item. That said, possibly one should have a new function that does it the more logical way?

ahaldane · 2017-02-19T17:05:40Z

Ok, I like the behavior now and I am happy that the code seems quite elegant now compared to what was there before!

I'll merge in an hour or two, if there are no further comments.

(We can keep the "more logical" possible way for another time, if ever. It's such a niche case I don't think it's a priority)

mhvk · 2017-02-19T17:11:09Z

@ahaldane - yes, I liked as well that the code ended up becoming reasonably elegant! Note that I realised I had not updated the docstring for __ne__, so I pushed a last changed version...

eric-wieser · 2017-02-19T11:41:21Z

numpy/ma/core.py

This seems inconsistent with the duck-typing we seem to use elsewhere

eric-wieser · 2017-02-19T11:42:30Z

numpy/ma/core.py

I think this might be clearer as just sbroadcast, odatabroadcast = np.broadcast_arrays(self, odata, subok=True)

Yes, on second thought, I agree clarity beats speed here (I essentially wrote out what np.broadcast_arrays does, but skipping broadcasting odata).

@mhvk: I'm also suggesting you pass odatabroadcast rather than odata into filled, for clarity

In fact, I'm starting to think that this should be the default behaviour of filled, allowing filling with a larger fill-value than the dimensions

actually, that breaks tests, it turns out: broadcast_arrays is not guaranteed to return a new instance, and since I set the mask right after, I effectively change self. Not sure that is desired behaviour, but outside of the scope of this PR...
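A small sketch of the aliasing concern, using arrays whose shapes already agree (the array contents here are made up for illustration): the broadcast result still refers to the input's memory, so taking an explicit copy is one way to avoid modifying the original.

```python
import numpy as np

a = np.ma.array([0, 1], mask=[0, 1])
b = np.array([5, 6])

# Shapes already agree, so no real broadcasting is needed.
a_bc, b_bc = np.broadcast_arrays(a, b, subok=True)

# The result is not an independent copy of the input data...
shares = np.shares_memory(a_bc, a)
print(shares)

# ...so code that wants to alter the broadcast result without touching
# `a` can take an explicit copy first.
safe = a_bc.copy()
print(np.shares_memory(safe, a))
```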

And looking into this, I realise mask is not properly broadcast either (and never was): this fails

np.ma.array([0, 1], mask=[0, 1]) == np.ma.array([[0, 2]])

(it fails in __repr__ but would otherwise be bad too)
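For reference, a sketch of what one would expect once the mask is broadcast together with the data (assuming a NumPy release where that is the case):

```python
import numpy as np

a = np.ma.array([0, 1], mask=[0, 1])   # shape (2,)
b = np.ma.array([[0, 2]])              # shape (1, 2), nothing masked

result = (a == b)

# The mask should have the same shape as the broadcast result.
mask = np.ma.getmaskarray(result)
print(result.shape, mask.shape)
```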

@eric-wieser - as you'll see, I had to leave the broadcasting alone, since otherwise I had to make a copy. I agree with you that, ideally, filled would just do the right thing itself. Even better if the filled function could take a mask rather than just calling the method. But I think that's best left for another PR!

eric-wieser · 2017-02-19T11:43:43Z

numpy/ma/core.py

This is probably better as if np.ndim(check) == 0

Oh, unless this is deliberately trying to return 0d arrays and not scalars

eric-wieser · 2017-02-19T11:44:31Z

numpy/ma/core.py

There's a function somewhere within this file for picking the appropriate subclass of MaskedArray

mhvk · 2017-02-21T22:22:28Z

colons inserted...

eric-wieser · 2017-02-21T22:35:54Z

Great. I'll leave this a day in case @ahaldane has any more comments or the tests fail, then looks good to merge. Thanks for tolerating my hole-picking!

ahaldane · 2017-02-22T04:27:00Z

Nice catches @eric-wieser !

I just read through it again, and nothing sticks out to me. Feel free to merge, if you don't I'll do it tomorrow.

eric-wieser · 2017-02-22T15:11:23Z

numpy/ma/core.py

Are there any tests of this underlying bool data?

Yes, note how the tests all do something like test = (?? != ??) and then test both the values and the mask.

eric-wieser · 2017-02-22T15:13:35Z

I think this needs a release note, since we actually did change behaviour - it was a narrow and buggy enough case that probably no one is affected, but if they are, then it would be nice to have a release note to point at

mhvk · 2017-02-22T15:31:31Z

Yes, fine to add a quick note to the release: @charris - this is a bug fix mainly still, which I think should go in 1.12.1. But do I add a note to 1.12.1-notes.rst or to 1.13.0-notes.rst?

eric-wieser · 2017-02-22T15:47:06Z

numpy/ma/tests/test_core.py

If this is trying to test the underlying values, I think it needs a .data

You're presuming that numpy functions correctly account for subclasses.... ;-)

In [2]: test = np.ma.MaskedArray([True, True], mask=[False, True])
In [3]: assert_equal(test, [True, True])
In [4]: test = np.ma.MaskedArray([True, False], mask=[False, True])
In [5]: assert_equal(test, [True, False])

(they should, of course, but that's for another time....)

Wanna bet they don't?

In [2]: test = np.ma.MaskedArray([True, True], mask=[False, True])
In [3]: assert_equal(test, [True, True])  # correct, as before
In [4]: assert_equal(test, [True, False])  # uh oh...
In [5]: assert_equal(test, [True, np.array("oh dear", dtype=object)])  # oh geez

Grrrrrr.... OK, will change.
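A sketch of the more explicit style of check this exchange points towards: compare the plain .data and .mask arrays separately, so that values hidden under the mask are actually tested. The example array is illustrative.

```python
import numpy as np
from numpy.testing import assert_array_equal

test = np.ma.MaskedArray([True, False], mask=[False, True])

# Plain ndarrays are compared here, so masked entries are not skipped.
assert_array_equal(test.data, [True, False])
assert_array_equal(test.mask, [False, True])

# A wrong value under the mask is now caught.
try:
    assert_array_equal(test.data, [True, True])
except AssertionError:
    detected = True
else:
    detected = False
print(detected)
```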

charris · 2017-02-22T17:59:37Z

@mhvk When this showed up I was tempted to make it a backport, but because of the changes in tested behavior decided not to. I'd like to branch 1.13 by the end of next month, so the delay should not be excessive.

charris · 2017-02-22T18:05:02Z

As to where to make release note additions for maintenance releases, it varies ;) Generally it should start in current master, but I did it the other way round in the 1.12.0 release because of the numerous updates. As a more practical matter, the 1.12.1 notes don't yet exist in the 1.12.x branch.

mhvk · 2017-02-27T15:04:31Z

OK, added a changelog entry.

eric-wieser · 2017-02-27T16:09:30Z

doc/release/1.13.0-notes.rst

regular would be better as un-structured here, since regular falsely implies "not masked"

In the process of trying to fix the "questionable behaviour in `MaskedArray.__eq__`" (numpygh-8589), it became clear that the code was buggy. E.g., `ma == ma[0]` failed if `ma` held a structured dtype; multi-d structured dtypes failed generally; and, more worryingly, a masked scalar comparison could be wrong: `np.ma.MaskedArray(1, mask=True) == 0` yields True. This commit solves these problems, adding tests to prevent regression. In the process, it also ensures that the results for structured arrays always equals what one would get by logically combining the results over individual parts of the structure.

mhvk · 2017-02-27T16:25:40Z

OK, done. I cancelled the builds, since this passed already.

eric-wieser · 2017-02-27T17:22:56Z

Not sure I'm comfortable merging this without the builds... Problem with rewriting history is I have no way of knowing if you accidentally messed something up, because I can't just see the one-word diff you intended to make. I'm sure you didn't, but it'd be nice to have a passed test to prove it.

eric-wieser · 2017-02-28T00:00:05Z

Thanks for your patience, @mhvk!

mhvk · 2017-02-28T01:21:23Z

Thanks for all the comments, @eric-wieser -- the result is a much better new method than I had initially, one that does more at the same time as being smaller.

This was referenced Feb 9, 2017

Questionable behaviour in MaskedArray.__eq__ on flexible arrays #8589

Closed

MAINT: Use the same exception for all bad axis requests #8584

Merged

charris reviewed Feb 10, 2017

charris added 00 - Bug component: numpy.ma masked arrays labels Feb 10, 2017

ahaldane reviewed Feb 14, 2017

mhvk force-pushed the ma/eq_ne_axis_bug branch from 1be0dc1 to 6b84bdb Compare February 16, 2017 15:04

eric-wieser reviewed Feb 16, 2017

mhvk force-pushed the ma/eq_ne_axis_bug branch 3 times, most recently from 7c1847a to ed86a0b Compare February 18, 2017 21:49

mhvk force-pushed the ma/eq_ne_axis_bug branch from ed86a0b to 7f191b5 Compare February 18, 2017 22:03

mhvk force-pushed the ma/eq_ne_axis_bug branch from 7f191b5 to 400e0a6 Compare February 19, 2017 17:09

eric-wieser reviewed Feb 19, 2017

mhvk force-pushed the ma/eq_ne_axis_bug branch from c540303 to 04f83e5 Compare February 21, 2017 22:21

eric-wieser reviewed Feb 22, 2017

mhvk force-pushed the ma/eq_ne_axis_bug branch from 04f83e5 to 7e8bfd6 Compare February 22, 2017 16:40

eric-wieser mentioned this pull request Feb 24, 2017

Error when equating a masked and non-masked structured array #3840

Closed

eric-wieser added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Feb 26, 2017

mhvk force-pushed the ma/eq_ne_axis_bug branch from 7e8bfd6 to 3e6a69f Compare February 27, 2017 15:04

eric-wieser reviewed Feb 27, 2017

mhvk force-pushed the ma/eq_ne_axis_bug branch from 3e6a69f to f49708b Compare February 27, 2017 16:23

mhvk added 2 commits February 27, 2017 11:24

BUG: ensure masked array comparison with regular void works.

3435dd9

mhvk force-pushed the ma/eq_ne_axis_bug branch from f49708b to 3435dd9 Compare February 27, 2017 16:24

eric-wieser closed this Feb 27, 2017

eric-wieser reopened this Feb 27, 2017

eric-wieser merged commit 64111c5 into numpy:master Feb 28, 2017

homu mentioned this pull request Feb 28, 2017

MAINT: struct assignment "by field position", multi-field indices return views #6053

Merged

mhvk deleted the ma/eq_ne_axis_bug branch February 28, 2017 01:21

mhvk mentioned this pull request Nov 27, 2017

Bug in Table with unicode sandwich related to order of comparison astropy/astropy#6838

Closed
