-
-
Notifications
You must be signed in to change notification settings - Fork 12.2k
Description
As a result of #11490 (ediff1d handles data type errors badly), ediff1d grew some input checks.
These checks are probably a good thing overall, but are also a functional change where code that works OK now gives a TypeError and that causes regressions (pbcore being the one that I'm looking at right now).
Using a variant of ediff1d's own documentation to illustrate:
With numpy 1.12.1:
In [1]: import numpy as np
In [2]: x = np.array([1, 2, 4, 7, 0], dtype=np.int32)
In [3]: np.ediff1d(x, to_begin=-99, to_end=np.array([88, 99]))
Out[3]: array([-99, 1, 2, 3, -7, 88, 99], dtype=int32)
And with numpy 1.16.0rc2:
In [1]: import numpy as np
In [2]: x = np.array([1, 2, 4, 7, 0], dtype=np.int32)
In [3]: np.ediff1d(x, to_begin=-99, to_end=np.array([88, 99]))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-29beedef979f> in <module>()
----> 1 np.ediff1d(x, to_begin=-99, to_end=np.array([88, 99]))
/usr/lib/python3/dist-packages/numpy/lib/arraysetops.py in ediff1d(ary, to_end, to_begin)
120 to_end = np.asanyarray(to_end)
121 if not np.can_cast(to_end, dtype_req):
--> 122 raise TypeError("dtype of to_end must be compatible "
123 "with input ary")
124
TypeError: dtype of to_end must be compatible with input ary
In [4]: np.ediff1d(x, to_begin=-99, to_end=np.array([88, 99], dtype=np.int32))
Out[4]: array([-99, 1, 2, 3, -7, 88, 99], dtype=int32)
Quirkily, the way can_cast checks values for array scalars but not for arrays, also results in the following:
In [5]: np.ediff1d(x, to_begin=[-99], to_end=[99])
...
TypeError: dtype of to_begin must be compatible with input ary
In [6]: np.ediff1d(x, to_begin=-99, to_end=99)
Out[6]: array([-99, 1, 2, 3, -7, 99], dtype=int32)
These last two are both entirely safe and valid code even if they are going through an int64 to int32 cast; the value -99 is preserved in that transformation.
At RC2 stage, I assume that this change is not going to be reverted even if it does break some valid existing code. It would be handy to see this in the release notes and in the ediff1d documentation so that those dealing with failing tests have a useful pointer. The key information to include is:
- if single values are being passed to
to_beginorto_endthen pass them as values not lists or arrays - if the input array is a narrowed dtype, be explicit about the dtype of the prepended/appended data.
I'd be happy to give you a PR to do that if you want.
Numpy/Python version information:
1.12.1 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516]
vs
1.16.0rc2 3.7.2 (default, Jan 3 2019, 02:55:40)
[GCC 8.2.0]
(all from Debian packages)