DEP: Do not allow "abstract" dtype conversion/creation#15534
DEP: Do not allow "abstract" dtype conversion/creation#15534mattip merged 3 commits intonumpy:masterfrom
Conversation
numpy/lib/_iotools.py
Outdated
There was a problem hiding this comment.
This is the most annoying part of the change. I think it is OK, basically... This only gets called when the dtype is was already fixed, in which case the dtype is not used, but only the converters is found.
There was a problem hiding this comment.
Turns out this is not ok, this has broken genfromtxt with dtype object:
Before:
>>> np.genfromtxt(io.StringIO("1"), dtype=object)
array(b'1', dtype=object)
>>> np.genfromtxt(io.StringIO("what"), dtype=object)
array(b'what', dtype=object)After:
>>> np.genfromtxt(io.StringIO("1"), dtype=object)
array((1+0j), dtype=object)
>>> np.genfromtxt(io.StringIO("what"), dtype=object)
array(None, dtype=object)There was a problem hiding this comment.
Ouch, subtle, I guess string being the last entry is important, probably switching the order will be OK, but have to look...
6d3511e to
13556f4
Compare
These dtypes do not really make sense as instances. We can (somewhat)
reasonably define np.dtype(np.int64) as the default (machine endianess)
int64. (Arguably, it is unclear that `np.array(arr_of_>f8, dtype="f")`
should return arr_of_<f8, but that would be very noisy!)
However, `np.integer` as equivalent to long, is not well defined.
Similarly, `dtype=Decimal` may be neat to spell `dtype=object` when you
intend to put Decimal objects into the array. But it is misleading,
since there is no special meaning to it at this time.
The biggest issue with it, is that `arr.astype(np.floating)` looks
like it will let float32 or float128 pass, but it will force a
float64 output! Arguably downcasting is a bug in this case.
A related issue is `np.dtype("S")` and especially "S0". The dtype "S"
does make sense for most or all places where `dtype=...` can be
passed. However, it is conceptionally different from other dtypes, since
it will not end up being attached to the array (unlike "S2" which
would be). The dtype "S" really means the type number/DType class
of String, and not a specific dtype instance.
13556f4 to
1a1611a
Compare
|
Just to note, I checked scipy for |
714a4ee to
51fb5d7
Compare
Add a comment that we can think about only allowing it for `dtype=...` though...
51fb5d7 to
cefd352
Compare
|
We discussed this briefly today, and nobody seemed concerned. Also the mailing list discussion did not bring up any serious concerns about the warnings. And I do think we have general agreement that this is the correct behaviour. |
|
LGTM. A little nit about wording in a c-comment. |
Co-Authored-By: Matti Picus <matti.picus@gmail.com>
|
@mattip restarting it, the PyPy tests seem to fail fairly often right now? This time in: |
These dtypes do not really make sense as instances. We can (somewhat)
reasonably define np.dtype(np.int64) as the default (machine endianess)
int64. (Arguably, it is unclear that
np.array(arr_of_>f8, dtype="f")should return arr_of_<f8, but that would be very noisy!)
However,
np.integeras equivalent to long, is not well defined.Similarly,
dtype=Decimalmay be neat to spelldtype=objectwhen youintend to put Decimal objects into the array. But it is misleading,
since there is no special meaning to it at this time.
The biggest issue with it, is that
arr.astype(np.floating)lookslike it will let float32 or float128 pass, but it will force a
float64 output! Arguably downcasting is a bug in this case.
A related issue is
np.dtype("S")and especially "S0". The dtype "S"does make sense for most or all places where
dtype=...can bepassed. However, it is conceptionally different from other dtypes, since
it will not end up being attached to the array (unlike "S2" which
would be). The dtype "S" really means the type number/DType class
of String, and not a specific dtype instance.
I have a pretty strong opinion, that at least these super classes should be deprecated. I can be waved about the
objectcase, although it seems very confusing to accept it.The conversion of
S0, andnp.dtype("S")is another point, which I do not accept as correct, but requires a new converter with special handling for"S"andnp.str_to fix. (Honestly, I think the above example ofnp.array(np.array(..., dtype=">f8"), dtype=np.float64).dtype == "<float64"is unclear to be correct logic, but... it is convenient logic)