BUG: add type cast check for ediff1d#11805
Conversation
eric-wieser
left a comment
There was a problem hiding this comment.
Might be better to use can_cast rather that checking for type equality
b4408a2 to
923f570
Compare
|
Thanks, I think I also drafted a unit test for some "problem scenarios." |
|
If this is ok, I suspect it may benefit from DOC update with a raises section; not sure if worth actually reporting the dtypes involved in the tracebacks as well. |
numpy/lib/arraysetops.py
Outdated
There was a problem hiding this comment.
A nit, but I'd combine these lines into
raise TypeError("dtype of to_begin must be compatible "
"with input ary")or if column alignment isn't your thing:
raise TypeError(
"dtype of to_begin must be compatible "
"with input ary")
numpy/lib/tests/test_arraysetops.py
Outdated
There was a problem hiding this comment.
Another way you could write this is
@pytest.mark.parametrize("kwargs", [
dict(ary=np.array([1, 2, 3], dtype=np.int64), append=np.nan),
dict(ary=np.array([1, 2, 3], dtype=np.int64), prepend=np.array([5, 7, 2], dtype=np.int64)),
# etc
])
test_ediff1d_type_cast(self, kwargs):
# catch exception as before
ediff1d(**kwargs)
eric-wieser
left a comment
There was a problem hiding this comment.
Generally looks good to me
|
Does this break |
|
@eric-wieser Yes, see below. That's maybe part of what I was blabbing on about at the start. The default typecasts that numpy uses on scalars in to_position or empty arrays in the Should that be special cased for some reason? |
Unsure what the right approach is given the above
|
What happens if you move the I think the above should probably end up as a test too. |
numpy/lib/arraysetops.py
Outdated
There was a problem hiding this comment.
Changing to
to_end = np.asanyarray(to_end)
if not np.can_cast(to_end, dtype_req):
...
to_end = to_end.ravel()
should fix the to_begin=0 problem
923f570 to
a79a13d
Compare
|
Ok, I tried to address the revisions; added new unit test as suggested & one of the parametrized tests from my previous iteration needed adjustment because we can now cast from |
numpy/lib/tests/test_arraysetops.py
Outdated
There was a problem hiding this comment.
I find this type of use of parametrize super hard to read vs a function that just calls assert_equal four times - but I suppose it produces better test output if one starts failing, so I'll begrudgingly accept it.
I'd be curious to see what others have to say about this style
numpy/lib/tests/test_arraysetops.py
Outdated
There was a problem hiding this comment.
Nit: A little inconsistent to spell the first of these np.int16([1, 2, 3]) yet the latter np.array([0, 1, 1], dtype=np.int16) - would be better to pick one style and stick with it. The second is probably more typical
numpy/lib/tests/test_arraysetops.py
Outdated
There was a problem hiding this comment.
Nit: The test name should really indicate that these are invalid /unsafe / forbidden type casts
numpy/lib/tests/test_arraysetops.py
Outdated
There was a problem hiding this comment.
nit: Might be nice to have a comment before each of these testcases explaining why they fail
a79a13d to
6171296
Compare
|
Ok, I tried to address the latest round of revisions. |
|
The original version of this function is from 2005. I'm wondering if it does anything that |
|
@charris Are you suggesting deprecation of |
|
Yes, I am wondering if we should deprecate the function, it even looks out of place in |
|
Ok, I followed up with the mailing list & linked a similar discussion on stack overflow about np.diff vs ediff1d. |
|
Thanks Tyler. |
Fixes #11490.
This is surprisingly tricky, because
ediff1dis subject to certain constraints in the unit tests. And indeed, we'd need to be cautious with any kind of behavior change to fix that issue.Consider
test_ediff1d:If I apply type checking preservation on
zero_elem-> returned value there it will fail because the prepend and expected test objects are bothint64. Maybe int->float promotion should be allowed but the reverse (which happens in the linked issue) prohibited?I thought an alternative solution might be to offload type promotion handling to
np.append(), but that will fail unit tests because we do sometimes check for type passthrough on the inputaryobject:I wonder why
ediff1deven has thoseto_beginandto_endarguments--is there actually an explicit intention to have this behave differently from i.e.,append()as far as type handling goes? It looks like this may be cut out for matrix handling in particular maybe?The docstring for
ediff1disn't so clear on matters of type handling / casting / promotion as far as I can tell, and the current unit tests don't necessarily steer me toward any single clear solution so maybe I need a bit of guidance here.