Skip to content

DEP: Do not allow "abstract" dtype conversion/creation#15534

Merged
mattip merged 3 commits intonumpy:masterfrom
seberg:deprecate-abstract-scalar-types
Mar 5, 2020
Merged

DEP: Do not allow "abstract" dtype conversion/creation#15534
mattip merged 3 commits intonumpy:masterfrom
seberg:deprecate-abstract-scalar-types

Conversation

@seberg
Copy link
Member

@seberg seberg commented Feb 7, 2020

These dtypes do not really make sense as instances. We can (somewhat)
reasonably define np.dtype(np.int64) as the default (machine endianess)
int64. (Arguably, it is unclear that np.array(arr_of_>f8, dtype="f")
should return arr_of_<f8, but that would be very noisy!)

However, np.integer as equivalent to long, is not well defined.
Similarly, dtype=Decimal may be neat to spell dtype=object when you
intend to put Decimal objects into the array. But it is misleading,
since there is no special meaning to it at this time.

The biggest issue with it, is that arr.astype(np.floating) looks
like it will let float32 or float128 pass, but it will force a
float64 output! Arguably downcasting is a bug in this case.

A related issue is np.dtype("S") and especially "S0". The dtype "S"
does make sense for most or all places where dtype=... can be
passed. However, it is conceptionally different from other dtypes, since
it will not end up being attached to the array (unlike "S2" which
would be). The dtype "S" really means the type number/DType class
of String, and not a specific dtype instance.


I have a pretty strong opinion, that at least these super classes should be deprecated. I can be waved about the object case, although it seems very confusing to accept it.
The conversion of S0, and np.dtype("S") is another point, which I do not accept as correct, but requires a new converter with special handling for "S" and np.str_ to fix. (Honestly, I think the above example of np.array(np.array(..., dtype=">f8"), dtype=np.float64).dtype == "<float64" is unclear to be correct logic, but... it is convenient logic)

@seberg seberg added 07 - Deprecation component: numpy.dtype 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. labels Feb 7, 2020
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the most annoying part of the change. I think it is OK, basically... This only gets called when the dtype is was already fixed, in which case the dtype is not used, but only the converters is found.

Copy link
Member

@eric-wieser eric-wieser May 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out this is not ok, this has broken genfromtxt with dtype object:

Before:

>>> np.genfromtxt(io.StringIO("1"), dtype=object)
array(b'1', dtype=object)
>>> np.genfromtxt(io.StringIO("what"), dtype=object)
array(b'what', dtype=object)

After:

>>> np.genfromtxt(io.StringIO("1"), dtype=object)
array((1+0j), dtype=object)
>>> np.genfromtxt(io.StringIO("what"), dtype=object)
array(None, dtype=object)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, subtle, I guess string being the last entry is important, probably switching the order will be OK, but have to look...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed as #16189

@seberg seberg force-pushed the deprecate-abstract-scalar-types branch from 6d3511e to 13556f4 Compare February 7, 2020 01:03
@seberg seberg added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Feb 7, 2020
These dtypes do not really make sense as instances. We can (somewhat)
reasonably define np.dtype(np.int64) as the default (machine endianess)
int64. (Arguably, it is unclear that `np.array(arr_of_>f8, dtype="f")`
should return arr_of_<f8, but that would be very noisy!)

However, `np.integer` as equivalent to long, is not well defined.
Similarly, `dtype=Decimal` may be neat to spell `dtype=object` when you
intend to put Decimal objects into the array. But it is misleading,
since there is no special meaning to it at this time.

The biggest issue with it, is that `arr.astype(np.floating)` looks
like it will let float32 or float128 pass, but it will force a
float64 output! Arguably downcasting is a bug in this case.

A related issue is `np.dtype("S")` and especially "S0". The dtype "S"
does make sense for most or all places where `dtype=...` can be
passed. However, it is conceptionally different from other dtypes, since
it will not end up being attached to the array (unlike "S2" which
would be). The dtype "S" really means the type number/DType class
of String, and not a specific dtype instance.
@seberg seberg force-pushed the deprecate-abstract-scalar-types branch from 13556f4 to 1a1611a Compare February 7, 2020 04:10
@seberg seberg removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Feb 7, 2020
@seberg
Copy link
Member Author

seberg commented Feb 7, 2020

Just to note, I checked scipy for np.integer occurance, and it looks like at most one or two things where it is used incorrectly. NumPy had more. A few of the "object" changes, are arguably unintended, the author probably expected to get an array of Decimal not integer objects. So when it comes to libraries, I actually do not expect it to be painful at all. And when it comes to users: We do not have to change it tomorrow...

@seberg seberg force-pushed the deprecate-abstract-scalar-types branch from 714a4ee to 51fb5d7 Compare February 7, 2020 21:34
Add a comment that we can think about only allowing it for `dtype=...`
though...
@seberg seberg force-pushed the deprecate-abstract-scalar-types branch from 51fb5d7 to cefd352 Compare February 7, 2020 21:47
@seberg seberg added triage review Issue/PR to be discussed at the next triage meeting and removed triage review Issue/PR to be discussed at the next triage meeting labels Feb 27, 2020
@seberg
Copy link
Member Author

seberg commented Mar 4, 2020

We discussed this briefly today, and nobody seemed concerned. Also the mailing list discussion did not bring up any serious concerns about the warnings. And I do think we have general agreement that this is the correct behaviour.

@mattip
Copy link
Member

mattip commented Mar 4, 2020

LGTM. A little nit about wording in a c-comment.

Co-Authored-By: Matti Picus <matti.picus@gmail.com>
@seberg
Copy link
Member Author

seberg commented Mar 5, 2020

@mattip restarting it, the PyPy tests seem to fail fairly often right now? This time in:

numpy/core/tests/test_einsum.py ...........Fatal Python error: Illegal instruction

@mattip mattip merged commit ff4cfe7 into numpy:master Mar 5, 2020
@mattip
Copy link
Member

mattip commented Mar 5, 2020

Thanks @seberg. I filed an issue with PyPy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

07 - Deprecation 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. component: numpy.dtype

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants