Skip to content

Noting ediff1d casting changes in release notes, docs? #12711

@llimeht

Description

@llimeht

As a result of #11490 (ediff1d handles data type errors badly), ediff1d grew some input checks.

These checks are probably a good thing overall, but are also a functional change where code that works OK now gives a TypeError and that causes regressions (pbcore being the one that I'm looking at right now).

Using a variant of ediff1d's own documentation to illustrate:

With numpy 1.12.1:

In [1]: import numpy as np
In [2]: x = np.array([1, 2, 4, 7, 0], dtype=np.int32)
In [3]: np.ediff1d(x, to_begin=-99, to_end=np.array([88, 99]))
Out[3]: array([-99,   1,   2,   3,  -7,  88,  99], dtype=int32)

And with numpy 1.16.0rc2:

In [1]: import numpy as np
In [2]: x = np.array([1, 2, 4, 7, 0], dtype=np.int32)
In [3]: np.ediff1d(x, to_begin=-99, to_end=np.array([88, 99]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-29beedef979f> in <module>()
----> 1 np.ediff1d(x, to_begin=-99, to_end=np.array([88, 99]))
/usr/lib/python3/dist-packages/numpy/lib/arraysetops.py in ediff1d(ary, to_end, to_begin)
    120         to_end = np.asanyarray(to_end)
    121         if not np.can_cast(to_end, dtype_req):
--> 122             raise TypeError("dtype of to_end must be compatible "
    123                             "with input ary")
    124 
TypeError: dtype of to_end must be compatible with input ary

In [4]: np.ediff1d(x, to_begin=-99, to_end=np.array([88, 99], dtype=np.int32))
Out[4]: array([-99,   1,   2,   3,  -7,  88,  99], dtype=int32)

Quirkily, the way can_cast checks values for array scalars but not for arrays, also results in the following:

In [5]: np.ediff1d(x, to_begin=[-99], to_end=[99])
...
TypeError: dtype of to_begin must be compatible with input ary

In [6]: np.ediff1d(x, to_begin=-99, to_end=99)
Out[6]: array([-99,   1,   2,   3,  -7,  99], dtype=int32)

These last two are both entirely safe and valid code even if they are going through an int64 to int32 cast; the value -99 is preserved in that transformation.

At RC2 stage, I assume that this change is not going to be reverted even if it does break some valid existing code. It would be handy to see this in the release notes and in the ediff1d documentation so that those dealing with failing tests have a useful pointer. The key information to include is:

  • if single values are being passed to to_begin or to_end then pass them as values not lists or arrays
  • if the input array is a narrowed dtype, be explicit about the dtype of the prepended/appended data.

I'd be happy to give you a PR to do that if you want.

Numpy/Python version information:

1.12.1 3.5.3 (default, Sep 27 2018, 17:25:39) 
[GCC 6.3.0 20170516]

vs

1.16.0rc2 3.7.2 (default, Jan  3 2019, 02:55:40) 
[GCC 8.2.0]

(all from Debian packages)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions