Skip to content

Conversation

@davidhassell
Copy link
Contributor

@github-actions github-actions bot added the array label Nov 4, 2022
Copy link
Contributor

@phobson phobson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. This looks good to me.

@phobson phobson self-requested a review November 4, 2022 21:16
@phobson
Copy link
Contributor

phobson commented Nov 4, 2022

I'm digging into this a little deeper, btw

@phobson
Copy link
Contributor

phobson commented Nov 5, 2022

So one things that's not clear to me is exactly what should happen. So I took a little survey of the behavior of assigning np.nan and np.ma.masked to int, float, and object arrays in numpy, dask[main] and dask[this PR]:

Floats

Library Array dtype np.nan np.ma.masked
numpy float array of nans array of 0. (float)
Dask Main float array of nans fully masked array, fill value = 1e20
Dask PR #9627 float array of nans fully masked array, fill value = 1e20

Ints

Library Array dtype np.nan np.ma.masked
numpy int cannot convert float NaN to integer array of 0 (int)
Dask Main int cannot convert float NaN to integer fully masked array, fill value = 999999
Dask PR #9627 int cannot convert float NaN to integer fully masked array, fill value = 999999

Objects

Library Array dtype np.nan np.ma.masked
numpy object array of nans array of 0.0 (obj)
Dask Main object array of nans error ("ufunc 'isnan' not supported for the input types...")
Dask PR #9627 object array of nans fully masked array, fill value = '?'

So, to me, it looks like this PR does what it aims to do. My only recommendation is that the test for this functionality go in a new test instead of being a new check in an existing test.

@davidhassell
Copy link
Contributor Author

Hi @phobson, thanks for looking at other cases that I hadn't considered in detail. I'll create a new test ...

x[...] = np.ma.masked
dx[...] = np.ma.masked

assert_eq(x.mask, da.ma.getmaskarray(dx))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that if I try to do assert_eq(x, dx) it fails, but I would have expected not to.

>>> x
masked_array(data=[--, --, --],
             mask=[ True,  True,  True],
       fill_value='?',
            dtype=object)
>>> dx 
dask.array<setitem, shape=(3,), dtype=object, chunksize=(2,), chunktype=numpy.MaskedArray>

>>>dx.compute() 
masked_array(data=[--, --, --],
             mask=[ True,  True,  True],
       fill_value='?',
            dtype=object)           

not sure why the assert_eq(x, dx) does not work.
But with assert_eq(x.mask, da.ma.getmaskarray(dx))
we are asserting the mask is the same, but not the resulting array like in all the other tests.

@jrbourbeau Do you have any thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi - I think it's because np.ma.allclose (https://github.com/dask/dask/blob/2022.10.2/dask/array/utils.py#L177)
treats masked elements as being equal to co-located non-masked elements:

>>> import numpy as np
>>> a = np.ma.arange(3)
>>> b = np.ma.arange(3)
>>> a[0] = np.ma.masked
>>> b[1] = np.ma.masked
>>> print(a)
[-- 1 2]
>>> print(b)
[0 -- 2]
>>> np.ma.allclose(a, b, masked_equal=True)
True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... I should also say that the test is checking that both arrays (np and dask) are masked everywhere, so unless we check the mask arrays themselves, we could get a false positive. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we should check that the arrays are the same. Hence me asking why trying assert_eq(x, dx) fails. We should find a way to check for this, ideally without having to compute the dask array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ncclementi, I'm not sure if I need to do anything, here. I think that assert_eq(x, dx) is behaving as expected, in that it mimics how numpy behaves - so it's less that it fails, rather it's not designed to answer the question we're asking. I think that the only way two wholly masked arrays can be safely compared is by comparing their masks, which assert_eq can do easily. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi again, just checking that if there's anything outstanding on this issue? It'd be nice to see it in a January release, if possible :). Thanks.

@mrocklin
Copy link
Member

mrocklin commented Jan 5, 2023

cc @dask/array anyone around who can review this masked array PR?

@davidhassell
Copy link
Contributor Author

Hello,

Just checking in again on this, as it is still needed. I really appreciate all of the Dask maintainers' work, and if you could find the time to have another look that would be great :)

Many thanks,
David

@github-actions
Copy link
Contributor

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     15 files  ± 0       15 suites  ±0   3h 23m 14s ⏱️ +24s
 13 120 tests + 1   12 185 ✅ + 1     930 💤 ±0  5 ❌ ±0 
162 456 runs  +13  142 364 ✅ +11  20 087 💤 +2  5 ❌ ±0 

For more details on these failures, see this check.

Results for commit 6b56774. ± Comparison against base commit 17f83a4.

Copy link
Collaborator

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, the asset_eq issue can be a follow up

@phofl phofl merged commit cc34543 into dask:main Apr 12, 2024
@phofl
Copy link
Collaborator

phofl commented Apr 12, 2024

thx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Failure in assignment of np.ma.masked to obect-type Array

5 participants