Skip to content

BUG: add type cast check for ediff1d#11805

Merged
charris merged 1 commit intonumpy:masterfrom
tylerjereddy:issue_11490
Aug 31, 2018
Merged

BUG: add type cast check for ediff1d#11805
charris merged 1 commit intonumpy:masterfrom
tylerjereddy:issue_11490

Conversation

@tylerjereddy
Copy link
Contributor

@tylerjereddy tylerjereddy commented Aug 23, 2018

Fixes #11490.

This is surprisingly tricky, because ediff1d is subject to certain constraints in the unit tests. And indeed, we'd need to be cautious with any kind of behavior change to fix that issue.

Consider test_ediff1d:

zero_elem = np.array([]) # this defaults to float64
assert_array_equal([0], ediff1d(zero_elem, to_begin=0))
# np.asanyarray([0]) and np.array([0]) are int64

If I apply type checking preservation on zero_elem -> returned value there it will fail because the prepend and expected test objects are both int64. Maybe int->float promotion should be allowed but the reverse (which happens in the linked issue) prohibited?

I thought an alternative solution might be to offload type promotion handling to np.append(), but that will fail unit tests because we do sometimes check for type passthrough on the input ary object:

assert(isinstance(np.ediff1d(np.matrix(1), to_begin=1), np.matrix))
# np.append(np.asanyarray(1), np.matrix(1)).dtype is 'int64' for example,
# although this gives a different result too
# same for: np.append(np.asanyarray(1), np.ediff1d(np.matrix(1))).dtype
# here the result is closer but an array instead of matrix

I wonder why ediff1d even has those to_begin and to_end arguments--is there actually an explicit intention to have this behave differently from i.e., append() as far as type handling goes? It looks like this may be cut out for matrix handling in particular maybe?

The docstring for ediff1d isn't so clear on matters of type handling / casting / promotion as far as I can tell, and the current unit tests don't necessarily steer me toward any single clear solution so maybe I need a bit of guidance here.

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to use can_cast rather that checking for type equality

@tylerjereddy
Copy link
Contributor Author

Thanks, I think can_cast actually allows all unit tests to pass too -- we'll see if the CI agrees.

I also drafted a unit test for some "problem scenarios."

@tylerjereddy tylerjereddy changed the title WIP, BUG: initial work on issue 11490. BUG: add type cast check for ediff1d Aug 23, 2018
@tylerjereddy
Copy link
Contributor Author

If this is ok, I suspect it may benefit from DOC update with a raises section; not sure if worth actually reporting the dtypes involved in the tracebacks as well.

Copy link
Member

@eric-wieser eric-wieser Aug 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit, but I'd combine these lines into

raise TypeError("dtype of to_begin must be compatible "
                "with input ary")

or if column alignment isn't your thing:

raise TypeError(
    "dtype of to_begin must be compatible "
    "with input ary")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way you could write this is

@pytest.mark.parametrize("kwargs", [
    dict(ary=np.array([1, 2, 3], dtype=np.int64), append=np.nan),
    dict(ary=np.array([1, 2, 3], dtype=np.int64), prepend=np.array([5, 7, 2], dtype=np.int64)),
    # etc
])
test_ediff1d_type_cast(self, kwargs):
    # catch exception as before
    ediff1d(**kwargs)

eric-wieser
eric-wieser previously approved these changes Aug 23, 2018
Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me

@eric-wieser
Copy link
Member

Does this break np.ediff1d(np.int16([1, 2, 3]), to_begin=0)?

@tylerjereddy
Copy link
Contributor Author

tylerjereddy commented Aug 23, 2018

@eric-wieser Yes, see below. That's maybe part of what I was blabbing on about at the start. The default typecasts that numpy uses on scalars in to_position or empty arrays in the ary position sometimes make me wonder about behavior a bit.

Should that be special cased for some reason?

In [3]: np.ediff1d(np.int16([1, 2, 3]), to_begin=0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-cf7c8875e3dc> in <module>()
----> 1 np.ediff1d(np.int16([1, 2, 3]), to_begin=0)

~/.local/lib/python3.6/site-packages/numpy-1.16.0.dev0+923f570-py3.6-linux-x86_64.egg/numpy/lib/arraysetops.py in ediff1d(ary, to_end, to_begin)
     99             msg = ("dtype of to_begin must be compatible "
    100                    "with input ary")
--> 101             raise TypeError(msg)
    102
    103         l_begin = len(to_begin)

TypeError: dtype of to_begin must be compatible with input ary

@eric-wieser eric-wieser dismissed their stale review August 23, 2018 05:31

Unsure what the right approach is given the above

@eric-wieser
Copy link
Member

eric-wieser commented Aug 23, 2018

What happens if you move the can_cast to before the array conversion? Or even just before the ravel seems to work for me (although in my opinion 0d arrays should not undergo special conversion rules)

I think the above should probably end up as a test too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to

to_end = np.asanyarray(to_end)
if not np.can_cast(to_end, dtype_req):
    ...
to_end = to_end.ravel()

should fix the to_begin=0 problem

@tylerjereddy
Copy link
Contributor Author

tylerjereddy commented Aug 23, 2018

Ok, I tried to address the revisions; added new unit test as suggested & one of the parametrized tests from my previous iteration needed adjustment because we can now cast from np.nan to float32 with the intermediate conversion to flattened np.asanyarray(np.nan) now occurring after the type check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this type of use of parametrize super hard to read vs a function that just calls assert_equal four times - but I suppose it produces better test output if one starts failing, so I'll begrudgingly accept it.

I'd be curious to see what others have to say about this style

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: A little inconsistent to spell the first of these np.int16([1, 2, 3]) yet the latter np.array([0, 1, 1], dtype=np.int16) - would be better to pick one style and stick with it. The second is probably more typical

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The test name should really indicate that these are invalid /unsafe / forbidden type casts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Might be nice to have a comment before each of these testcases explaining why they fail

@tylerjereddy
Copy link
Contributor Author

Ok, I tried to address the latest round of revisions.

@charris
Copy link
Member

charris commented Aug 26, 2018

The original version of this function is from 2005. I'm wondering if it does anything that diff and gradient do not do?

@tylerjereddy
Copy link
Contributor Author

@charris Are you suggesting deprecation of ediff1d? I saw a recent bug report so I tried to patch it; I had never even heard of this function before. I'm not sure why this function has it own prepend / append behavior, but I suppose that makes it different in that sense.

@charris
Copy link
Member

charris commented Aug 27, 2018

Yes, I am wondering if we should deprecate the function, it even looks out of place in arraysetops. Maybe inquire on the list? Doesn't mean it should not get fixed, though.

@tylerjereddy
Copy link
Contributor Author

tylerjereddy commented Aug 27, 2018

Ok, I followed up with the mailing list & linked a similar discussion on stack overflow about np.diff vs ediff1d.

@charris charris merged commit f17f229 into numpy:master Aug 31, 2018
@charris
Copy link
Member

charris commented Aug 31, 2018

Thanks Tyler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ediff1d handles data type errors badly

3 participants