Skip to content

Conversation

@abalkin
Copy link
Contributor

@abalkin abalkin commented Nov 11, 2013

When a masked array with nomask was passed to ma.compressed()
function, the result was the original masked array instead of
an ndarray.

Closes gh-4026.

numpy/ma/core.py Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right. May be weird, but I think the subclass of the data is conserved, so this needs to check explicitely for non-masked arrays. Also, it needs a .ravel(). And I am wondering if compressed should say somewhere in the docs that it is not necessary a copy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it needs a .ravel().

In this case we have one more bug:

>>> ma.compressed(ma.zeros((2,3))).shape
(2, 3)

(Checked with master and 1.7.1.)

Note that np.asanyarray() is just array(a, copy=False, subok=True). My intention was to just change subok to False, but it looks like I inadvertently changed copy to True.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be weird, but I think the subclass of the data is conserved, so this needs to check
explicitly for non-masked arrays.

@seberg - I don't understand this comment. Are you saying that if x is a MaskedArray with mask set to nomask and data being a subclass of ndarray this subclass should be preserved? I am not sure how this situation is possible with the current implementation of MaskedArray as a subclass of ndarray. Can you show a test case where you think my code produces a wrong result?

I am starting to think that it is best to rewrite ma.compressed() simply as

def compressed(x):
      return asarray(x).compressed()

and leave optimization of non-masked case for the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what you said is what I meant. I am pretty sure that this is currently possible (even if I find it a bit weird), but I never really used them, so...

There is already an optimization for that in the compressed attribute. I believe the main purpose here is to support non-masked arrays. So anything that is not a masked array must be .ravel()ed I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, missed the first comment. Yes, but it is documented to ravel, and the attribute does so, so that it does not ravel is a bug I think.

@abalkin
Copy link
Contributor Author

abalkin commented Nov 12, 2013

@seberg: I added a commit that reimplements ma.compressed() function simply as ma.asarray(x).compressed(). I believe this is ultimately correct behavior. I also added some tests that should help validate any future optimization. It may be best to just leave this implementation because as this bug and my first attempt to fix it show, it may not be easy to optimize the non-masked case correctly and the effort is better spent optimizing asarray() or the compressed() method.

numpy/ma/core.py Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets at least add an if isinstance(x, MaskedArray): to avoid the np.ma.asarray in that case. The conversion seems quite slow (though that special case could probably be also optimized in ma.asarray).

@abalkin
Copy link
Contributor Author

abalkin commented Nov 12, 2013

@seberg: Special casing isinstance(x, MaskedArray) gives 7x speed up, so it is definitely worth it. I achieve the same speedup by special casing this in ma.asarray(), but I feel that change deserves a separate issue. See gh-4043.

@seberg
Copy link
Member

seberg commented Nov 13, 2013

Hey, I think it should be good. Would be nice if you can squash those follow up commits into the first.

@Nodd
Copy link
Contributor

Nodd commented Nov 13, 2013

Is it necessary to check isinstance(x, MaskedArray) both in ma.compressed() and in ma.asarray() ? Fixing it in ma.asarray() should be sufficient or did I miss something ?

@seberg
Copy link
Member

seberg commented Nov 13, 2013

Yes, should be good enough.

When a masked array with nomask was passed to ma.compressed()
function, the result was the original masked array instead of
an ndarray.

Closes numpygh-4026.
@abalkin
Copy link
Contributor Author

abalkin commented Nov 13, 2013

@seberg - I combined my commits in one and fixed a case when input is an instance of a MaskedArray subtype that overrides .compressed(). New code calls overriding method in this case.

@Nodd - I would like to address ma.asarray() changes separately. See gh-4043. This is a much more used function than ma.compressed() and not making a copy may change behavior in some cases. At the very least gh-4043 is an ENH rather than a BUG, so it is good to keep the two issues separately.

@seberg
Copy link
Member

seberg commented Nov 13, 2013

Can't say I currently care about the fact that the check might as well be only in asarray. If you like you can undo this again in the other thing (and probably other occurances as well). Anyway, I think this is correct, and doubt that it is often called on non-masked arrays (which could be optimized), so merging.

seberg added a commit that referenced this pull request Nov 13, 2013
BUG: ma.compressed() function returns masked array
@seberg seberg merged commit e4d8ad8 into numpy:master Nov 13, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

np.ma.compressed() returns masked array if no elements are masked

3 participants