BUG: add type cast check for ediff1d by tylerjereddy · Pull Request #11805 · numpy/numpy

tylerjereddy · 2018-08-23T00:33:26Z

This is surprisingly tricky, because ediff1d is subject to certain constraints in the unit tests. And indeed, we'd need to be cautious with any kind of behavior change to fix that issue.

Consider test_ediff1d:

zero_elem = np.array([]) # this defaults to float64
assert_array_equal([0], ediff1d(zero_elem, to_begin=0))
# np.asanyarray([0]) and np.array([0]) are int64

If I enforce dtype preservation from zero_elem to the returned value there, the test fails because the prepended value and the expected result are both int64. Maybe int->float promotion should be allowed but the reverse (which happens in the linked issue) prohibited?
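To make the dtype mismatch in that test concrete, here is a minimal sketch. It only inspects default dtypes and NumPy's promotion result; it is not the code from the PR, and the variable names are illustrative.

```python
import numpy as np

zero_elem = np.array([])        # an empty array defaults to float64
to_begin = np.asanyarray([0])   # Python ints become a platform integer dtype

# Promoting the two dtypes yields float64, so an integer to_begin fits
# a float64 ary without changing the dtype of the result.
promoted = np.result_type(zero_elem.dtype, to_begin.dtype)

# In the reverse direction, a float to_begin would force an int64 ary
# up to float64, which is the situation the linked issue is about.
reverse = np.result_type(np.int64, np.float64)

print(zero_elem.dtype, to_begin.dtype.kind, promoted, reverse)
```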

I thought an alternative solution might be to offload type promotion handling to np.append(), but that will fail unit tests because we do sometimes check for type passthrough on the input ary object:

assert(isinstance(np.ediff1d(np.matrix(1), to_begin=1), np.matrix))
# np.append(np.asanyarray(1), np.matrix(1)).dtype is 'int64' for example,
# although this gives a different result too
# same for: np.append(np.asanyarray(1), np.ediff1d(np.matrix(1))).dtype
# here the result is closer but an array instead of matrix

I wonder why ediff1d even has those to_begin and to_end arguments. Is there an explicit intention for it to behave differently from, e.g., append() as far as type handling goes? It looks like this may have been carved out for matrix handling in particular.

The docstring for ediff1d isn't so clear on matters of type handling / casting / promotion as far as I can tell, and the current unit tests don't necessarily steer me toward any single clear solution so maybe I need a bit of guidance here.

eric-wieser

Might be better to use can_cast rather than checking for type equality
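A rough sketch of the difference between the two checks, using explicit dtypes so the result does not depend on scalar handling. The helper names is_compatible_strict and is_compatible_castable are hypothetical and do not appear in the PR.

```python
import numpy as np

def is_compatible_strict(value_dtype, ary_dtype):
    # Hypothetical check: accept only identical dtypes.
    return np.dtype(value_dtype) == np.dtype(ary_dtype)

def is_compatible_castable(value_dtype, ary_dtype):
    # Hypothetical check: accept anything that casts safely into ary_dtype.
    return np.can_cast(value_dtype, ary_dtype)

# int16 values fit in an int64 array, but the dtypes are not equal.
strict_small_int = is_compatible_strict(np.int16, np.int64)
castable_small_int = is_compatible_castable(np.int16, np.int64)

# float64 values do not cast safely into an int64 array under either rule.
strict_float = is_compatible_strict(np.float64, np.int64)
castable_float = is_compatible_castable(np.float64, np.int64)

print(strict_small_int, castable_small_int, strict_float, castable_float)
```

The equality check rejects the harmless int16 case, while the can_cast check rejects only the lossy float-to-int case.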

tylerjereddy · 2018-08-23T02:28:51Z

Thanks, I think can_cast actually allows all unit tests to pass too -- we'll see if the CI agrees.

I also drafted a unit test for some "problem scenarios."

tylerjereddy · 2018-08-23T03:35:29Z

If this is ok, I suspect it may benefit from a DOC update with a Raises section; not sure if it's worth actually reporting the dtypes involved in the tracebacks as well.

eric-wieser · 2018-08-23T04:41:24Z

numpy/lib/arraysetops.py

A nit, but I'd combine these lines into

raise TypeError("dtype of to_begin must be compatible "
                "with input ary")

or if column alignment isn't your thing:

raise TypeError(
    "dtype of to_begin must be compatible "
    "with input ary")

eric-wieser · 2018-08-23T04:44:44Z

numpy/lib/tests/test_arraysetops.py

Another way you could write this is

@pytest.mark.parametrize("kwargs", [
    dict(ary=np.array([1, 2, 3], dtype=np.int64),
         to_end=np.nan),
    dict(ary=np.array([1, 2, 3], dtype=np.int64),
         to_begin=np.array([5, 7, 2], dtype=np.int64)),
    # etc
])
def test_ediff1d_type_cast(self, kwargs):
    # catch exception as before
    ediff1d(**kwargs)

eric-wieser

Generally looks good to me

eric-wieser · 2018-08-23T04:45:20Z

Does this break np.ediff1d(np.int16([1, 2, 3]), to_begin=0)?

tylerjereddy · 2018-08-23T04:51:10Z

@eric-wieser Yes, see below. That's maybe part of what I was blabbing on about at the start. The default typecasts that numpy uses on scalars in the to_begin / to_end positions, or on empty arrays in the ary position, sometimes make me wonder about the behavior a bit.

Should that be special cased for some reason?

In [3]: np.ediff1d(np.int16([1, 2, 3]), to_begin=0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-cf7c8875e3dc> in <module>()
----> 1 np.ediff1d(np.int16([1, 2, 3]), to_begin=0)

~/.local/lib/python3.6/site-packages/numpy-1.16.0.dev0+923f570-py3.6-linux-x86_64.egg/numpy/lib/arraysetops.py in ediff1d(ary, to_end, to_begin)
     99             msg = ("dtype of to_begin must be compatible "
    100                    "with input ary")
--> 101             raise TypeError(msg)
    102
    103         l_begin = len(to_begin)

TypeError: dtype of to_begin must be compatible with input ary

Unsure what the right approach is given the above

eric-wieser · 2018-08-23T05:37:45Z

What happens if you move the can_cast to before the array conversion? Or even just before the ravel seems to work for me (although in my opinion 0d arrays should not undergo special conversion rules)

I think the above should probably end up as a test too.

eric-wieser · 2018-08-23T06:36:13Z

numpy/lib/arraysetops.py

Changing to

to_end = np.asanyarray(to_end)
if not np.can_cast(to_end, dtype_req):
    ...
to_end = to_end.ravel()

should fix the to_begin=0 problem
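The ordering suggested above can be sketched as a standalone function. The name prepare_to_begin is hypothetical, and this is an illustration of the order of operations, not the merged implementation. How np.can_cast treats a 0-d input has changed across NumPy versions, so the sketch deliberately exercises only cases whose outcome does not depend on that.

```python
import numpy as np

def prepare_to_begin(ary, to_begin):
    """Hypothetical helper mirroring the suggested order of operations."""
    ary = np.asanyarray(ary).ravel()
    to_begin = np.asanyarray(to_begin)
    # Check castability before ravel(), while a scalar input is still 0-d.
    if not np.can_cast(to_begin, ary.dtype):
        raise TypeError("dtype of to_begin must be compatible "
                        "with input ary")
    return to_begin.ravel()

ary = np.array([1, 2, 3], dtype=np.int64)

# An int64 array cast into an int64 ary is accepted and flattened.
accepted = prepare_to_begin(ary, np.array([[5], [7]], dtype=np.int64))

# A float NaN cannot be cast safely into an int64 ary.
try:
    prepare_to_begin(ary, np.nan)
    rejected = False
except TypeError:
    rejected = True

print(accepted, rejected)
```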

tylerjereddy · 2018-08-23T19:00:51Z

Ok, I tried to address the revisions and added the new unit test as suggested. One of the parametrized tests from my previous iteration needed adjustment: because the flattening of np.asanyarray(np.nan) now happens after the type check, np.nan can now be cast to float32.

eric-wieser · 2018-08-26T02:47:17Z

numpy/lib/tests/test_arraysetops.py

I find this type of use of parametrize super hard to read vs a function that just calls assert_equal four times - but I suppose it produces better test output if one starts failing, so I'll begrudgingly accept it.

I'd be curious to see what others have to say about this style

eric-wieser · 2018-08-26T02:48:10Z

numpy/lib/tests/test_arraysetops.py

Nit: A little inconsistent to spell the first of these np.int16([1, 2, 3]) yet the latter np.array([0, 1, 1], dtype=np.int16) - would be better to pick one style and stick with it. The second is probably more typical

eric-wieser · 2018-08-26T02:48:52Z

numpy/lib/tests/test_arraysetops.py

Nit: The test name should really indicate that these are invalid / unsafe / forbidden type casts

eric-wieser · 2018-08-26T02:49:24Z

numpy/lib/tests/test_arraysetops.py

nit: Might be nice to have a comment before each of these testcases explaining why they fail

tylerjereddy · 2018-08-26T20:00:09Z

Ok, I tried to address the latest round of revisions.

charris · 2018-08-26T20:29:56Z

The original version of this function is from 2005. I'm wondering if it does anything that diff and gradient do not do?

tylerjereddy · 2018-08-27T17:16:38Z

@charris Are you suggesting deprecation of ediff1d? I saw a recent bug report so I tried to patch it; I had never even heard of this function before. I'm not sure why this function has its own prepend / append behavior, but I suppose that makes it different in that sense.
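For reference, a small sketch of how the to_begin behavior relates to np.diff. It assumes a plain 1-D integer input and reproduces the ediff1d result by concatenating onto the output of np.diff; it says nothing about subclass or dtype handling, which is where the two may differ.

```python
import numpy as np

x = np.array([1, 4, 9])

# ediff1d differences the flattened input, then prepends to_begin to the result.
via_ediff1d = np.ediff1d(x, to_begin=0)

# The same values from np.diff, with the leading element attached by hand.
via_diff = np.concatenate(([0], np.diff(x)))

print(via_ediff1d, via_diff)
```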

charris · 2018-08-27T17:22:41Z

Yes, I am wondering if we should deprecate the function; it even looks out of place in arraysetops. Maybe inquire on the list? Doesn't mean it should not get fixed, though.

tylerjereddy · 2018-08-27T17:30:10Z

Ok, I followed up with the mailing list & linked a similar discussion on Stack Overflow about np.diff vs ediff1d.

charris · 2018-08-31T16:19:14Z

Thanks Tyler.

tylerjereddy added the 25 - WIP label Aug 23, 2018

eric-wieser reviewed Aug 23, 2018

tylerjereddy force-pushed the issue_11490 branch from b4408a2 to 923f570 Compare August 23, 2018 02:23

tylerjereddy added 00 - Bug and removed 25 - WIP labels Aug 23, 2018

tylerjereddy changed the title ~~WIP, BUG: initial work on issue 11490.~~ BUG: add type cast check for ediff1d Aug 23, 2018

eric-wieser reviewed Aug 23, 2018

eric-wieser previously approved these changes Aug 23, 2018

eric-wieser reviewed Aug 23, 2018

tylerjereddy force-pushed the issue_11490 branch from 923f570 to a79a13d Compare August 23, 2018 18:58

eric-wieser approved these changes Aug 26, 2018

eric-wieser reviewed Aug 26, 2018

BUG: add type cast check to ediff1d

6171296

tylerjereddy force-pushed the issue_11490 branch from a79a13d to 6171296 Compare August 26, 2018 19:59

charris merged commit f17f229 into numpy:master Aug 31, 2018

mattip mentioned this pull request Jan 10, 2019

BUG: loosen kwargs requirements in ediff1d #12713

Merged

charris mentioned this pull request Jan 20, 2019

BUG: loosen kwargs requirements in ediff1d #12808

Merged

mattip mentioned this pull request Mar 11, 2019

ediff1d with np.nan in to_begin/to_end, behaviour and error messages #13103

Closed
