np.ma.masked singleton causes difficulties #5806

@ahaldane

Whenever a MaskedArray method returns a scalar value that should be masked, it returns a reference to the singleton np.ma.masked, an instance of MaskedConstant. I think the motivation was to be able to write code like 'if result is masked:'. However, returning a singleton causes some problems that make it difficult to use np.ma.MaskedArray as a 'drop in' replacement for ndarray.

Here is a summary of the problems I see, in the hope that they can be fixed.

1. Many operations involving masked are coerced to float

Probably the worst issue is that masked is of type float. This means the return value of a method may have a different type than the original array. This is especially bad for boolean arrays. For example, if arr is a boolean array and arr.all() or arr.any() returns masked, the following will fail because ~ is not defined for floats.

>>> a = np.ma.array([True, True], mask=[True, True])
>>> ~a.all()

It also means that certain series of operations on masked arrays will sometimes get cast to float when they wouldn't be with ndarrays.
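To make the type mismatch concrete, here is a minimal sketch that inspects what all() hands back for a fully masked boolean array. It assumes a reasonably recent NumPy; the variable names are only illustrative.

```python
import numpy as np

# A boolean array whose elements are all masked.
a = np.ma.array([True, True], mask=[True, True])
result = a.all()

# The reduction returns the shared singleton rather than a boolean scalar.
is_singleton = result is np.ma.masked

# The singleton carries a float dtype even though the input array is boolean.
input_dtype = a.dtype
singleton_dtype = np.ma.masked.dtype

print(is_singleton, input_dtype, singleton_dtype)
```

Any code that expects a boolean back from all() or any() therefore has to special-case the masked result.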

2. Overwriting masked causes strange results in completely separate code

A less serious problem arises if someone assigns to the return value of a MaskedArray method, which ends up assigning to the singleton. That then affects any code, anywhere, that involves the masked singleton. I came across this when one numpy unit test modified the singleton and another read it, so I got an error depending on the order in which the unit tests ran. The problem also arises in cases like marr2[:] = marr1.method() if the method returns masked. This means marr2 will get filled with arbitrary garbage, but maybe that's not a problem since those values will be masked garbage. (Although it was a somewhat confusing bug to fix.)
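A small sketch of the marr2[:] = marr1.method() case, under the assumption that marr1 is fully masked so that sum() returns the singleton. It only checks the resulting mask; what ends up in the underlying data may differ between NumPy versions, so the data is not inspected here.

```python
import numpy as np

# marr1 is fully masked, so its sum comes back as np.ma.masked.
marr1 = np.ma.array([1.0, 2.0], mask=[True, True])
marr2 = np.ma.zeros(3)

value = marr1.sum()
marr2[:] = value

# Every element of marr2 is now masked.
print(value is np.ma.masked, marr2.mask)
```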

3. Code acting on a return value of a MaskedArray method can fail (if masked was returned)

Consider some code of the form

>>> result = arr.sum()
>>> dosomething(result)

This might work fine most of the time, but fail in the (possibly rare) case that the sum returns the singleton. It might be that the operation is not allowed on np.ma.masked, or it might be that further use of np.ma.masked wouldn't work as before.
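One way callers can guard against this today is an explicit identity check before using the result. The helper below is a hypothetical illustration, not part of NumPy; its name and the default value are assumptions.

```python
import numpy as np

def sum_or_default(arr, default=0):
    """Hypothetical helper: return arr.sum(), or `default` if the sum is masked."""
    result = arr.sum()
    if result is np.ma.masked:
        # Every element was masked, so there is no meaningful sum.
        return default
    return result

partly_masked = np.ma.array([1, 2, 3], mask=[False, True, False])
fully_masked = np.ma.array([1, 2, 3], mask=[True, True, True])

partial_total = sum_or_default(partly_masked)  # sums the unmasked 1 and 3
empty_total = sum_or_default(fully_masked)     # falls back to the default

print(partial_total, empty_total)
```

The drawback is the one described above: every call site has to remember the check, which is exactly what makes MaskedArray hard to use as a drop-in replacement.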

Most cases of 'dosomething' I checked seem OK, but here are some that cause problems:

a) What if someone decides to remove the mask on a return value? E.g.
>>> result = arr.sum()
>>> result.mask = False

If arr.sum() happened to return the masked singleton, this would cause havoc. @rgommers suggested making .mask read-only, which sounds like a good idea to me, although it means the code will run fine for most arr but raise an error in the possibly rare case that the sum is fully masked.

b) writing to a scalar

Consider

>>> result[()] = 6

which would be fine for ndarrays, but raises an error for masked arrays if result is masked (though it's hard to imagine a case where someone would want to index a numpy scalar this way).

c) passing masked as the out parameter of a ufunc
>>> np.ma.log(inputarr, out=result)

I think (though there are other bugs involved here) that using a variable which might be np.ma.masked as the out parameter to a ufunc will cause problems.
