Skip to content

Conversation

@TAdeJong
Copy link
Contributor

@TAdeJong TAdeJong commented Aug 5, 2018

  • Tests added / passed
  • Passes flake8 dask

Fixes #3848, at least for the presented case, with the fix suggested by @jcrist.
I added a test to array/masked to test for this as well. I checked that this test fails for the old situation and passes for the new one, but I do not have the overview whether it is in the right location or even right format. Feedback is very much appreciated.

assert_eq(x, xx)

assert_eq(y, t)
assert isinstance(y.compute(), np.ma.masked_array)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are type checks not already part of assert_eq? If not, then maybe we should add that generally. It's nice to apply checks to assert_eq because then they get applied everywhere.

@mrocklin
Copy link
Member

mrocklin commented Aug 5, 2018

cc @pp-mo for review

@TAdeJong
Copy link
Contributor Author

TAdeJong commented Aug 5, 2018 via email

@TAdeJong
Copy link
Contributor Author

TAdeJong commented Aug 6, 2018

@mrocklin, cc @jcrist as he wrote all of test_masked.py.

This actually has a function assert_eq_ma() which performs the extra assert.
In principle we might want to assert_eq something like the following, to always check for masks:

    if hasattr(a, 'mask') or hasattr(b, 'mask'):
        assert hasattr(a, 'mask') and hasattr(b, 'mask')

But then we get into the trouble of defining when two arrays are equal: this check passes everywhere, except in test_masked.py, which uses it to check equality between masked and non-masked arrays.

Should we add a kwarg to assert_eq switching between these behaviours, or rewrite the existing tests to more explicitly test for either equality of arrays or equivalency of values? (I don't see how everywhere.)

I'd personally propose to keep this test specific to masked_test.py, although I think there are quite a few cases in there where we should replace assert_eq with assert_eq_ma.
If you agree I will update the merge accordingly.

@jcrist
Copy link
Member

jcrist commented Aug 6, 2018

One other issue with asserting types blindly in assert_eq is that sometimes numpy and dask differ on returning a numpy scalar or 0d array (or at least we used to, can't remember if that's still an issue).

I don't remember why I made a ma specific assert_eq function. Perhaps there was a reason, perhaps it was laziness.

Personally, I'd go with the smallest change to get this in for now, but if you want to try expanding assert_eq to test output type that'd be fine too.

assert_eq(dm, m)


def test_copy_deepcopy():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but I also might remove this test from ma, and expand the test_copy_mutate test in dask_array_core.py to test an array of another dtype. That feels more like the correct place, as its testing the copy method, which isn't ma specific.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TAdeJong are you ok to make this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrocklin: holiday without proper internet does not do wonders to my productivity. I will try to have a look this week as I think this is now straightforward, but my schedule is looking somewhat busy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enjoy vacation!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From @jcrist 's tone on this it seems like he's ok either way. I think it's also ok to keep this in this testing file mostly as a regression test. It is testing ma-specific functionality just to ensure that they stay masked arrays.

@mrocklin
Copy link
Member

Merging this at the end of the workday if there are no further comments.

@mrocklin mrocklin changed the title Fix 3848 Ensure copy preserves masked arrays Aug 20, 2018
@TAdeJong
Copy link
Contributor Author

Yes, let's just merge it like this if nobody has further comments, although I think there could be some more cleanup and hardening in test_masked.py for future changes, but let's leave that for another week, as that needs some thought and is not needed to fix the original issue.

@mrocklin mrocklin merged commit 19fb2b4 into dask:master Aug 21, 2018
@mrocklin
Copy link
Member

Thanks for the fix @TAdeJong . Merging!

@pp-mo
Copy link

pp-mo commented Aug 28, 2018

Just to confirm -- it does fix our problem.

Thanks for this @TAdeJong @mrocklin
Sorry for slowness -- away on holiday !

@mrocklin
Copy link
Member

Thanks for following up @pp-mo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dask.array deepcopy does not preserve masking since 0.18.2

4 participants