genfromtxt fails when a non-contiguous dtype is requested #19623
Description
Reproducing code example:
import numpy as np, io
# np.dtype([("a", int), ("b", float), ("c", int)])[["a", "c"]] constructs a non-contiguous dtype
# that contains only the a and c fields.
np.loadtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("c", int)])) # OK
np.loadtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("b", float), ("c", int)])[["a", "c"]]) # OK
np.genfromtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("c", int)])) # OK
np.genfromtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("b", float), ("c", int)])[["a", "c"]]) # fails
Error message:
/usr/lib/python3.9/site-packages/numpy/lib/npyio.py in genfromtxt(fname, dtype, comments, delimiter, skip_header, skip_footer, converters, missing_values, filling_values, usecols, names, excludelist, deletechars, replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose, invalid_raise, max_rows, encoding, like)
2220 else:
2221 rows = np.array(data, dtype=[('', _) for _ in dtype_flat])
-> 2222 output = rows.view(dtype)
2223 # Now, process the rowmasks the same way
2224 if usemask:
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
This occurs because genfromtxt handles dtypes by first "flattening" them (lifting nested fields to the toplevel, but also throwing away alignment info), constructing an array with that flattened dtype, and then .view()ing it with the original dtype. That last step fails: the flattened array is packed, whereas the requested dtype has a different memory layout, so the view cannot be done.
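The failing step can be reproduced without genfromtxt. The following is a minimal sketch, not the actual genfromtxt code: the names `full`, `sub`, `packed` and `rows` are illustrative, and fixed-width `np.int64`/`np.float64` are assumed so that the sizes are the same on every platform.

```python
import numpy as np

full = np.dtype([("a", np.int64), ("b", np.float64), ("c", np.int64)])
sub = full[["a", "c"]]  # keeps the original offsets, leaving a gap where "b" was
packed = np.dtype([("a", np.int64), ("c", np.int64)])  # same fields, no gap

print(sub.itemsize, packed.itemsize)

# Stand-in for the flattened array that genfromtxt builds from the parsed rows.
rows = np.array([(1, 3), (5, 7)], dtype=packed)

try:
    rows.view(sub)
    view_ok = True
except ValueError as exc:
    # The byte size of the packed rows is not a multiple of sub.itemsize.
    view_ok = False
    print(exc)
```

Since the two dtypes have different item sizes, reinterpreting the packed buffer in place cannot yield the requested layout; the data has to be copied field by field instead.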
(OTOH, loadtxt directly constructs an output with the right dtype by using a recursive "row packer" that constructs a nested list/tuple with the right shape.)
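The idea of a recursive row packer can be illustrated as follows. This is a simplified sketch under stated assumptions, not loadtxt's implementation: `pack` is a hypothetical helper, and it handles only nested structured fields (subarray dtypes are ignored).

```python
import numpy as np

def pack(dtype, flat):
    """Rebuild the nesting of `dtype` from an iterator over one row's flat values."""
    if dtype.names is None:
        # Scalar field: consume one value from the row.
        return next(flat)
    # Structured field: recurse into each subfield, in declaration order.
    return tuple(pack(dtype.fields[name][0], flat) for name in dtype.names)

nested = np.dtype([("p", [("x", np.int64), ("y", np.int64)]), ("z", np.float64)])
flat_rows = [[1, 2, 3.0], [4, 5, 6.0]]

# Each row becomes ((x, y), z), which np.array can place directly into `nested`.
arr = np.array([pack(nested, iter(row)) for row in flat_rows], dtype=nested)
```

Because the array is created with the final dtype and filled from nested tuples, no `.view()` is needed and the field offsets of the dtype do not matter.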
I was hoping to implement a similar "flat-dtype" optimization for loadtxt, so that'll likely involve alignment-handling code on both sides.
... or can we just claim that non-contiguous dtypes are not supported by loadtxt/genfromtxt? (I expect the use cases of having to loadtxt() into an array with a specific, non-contiguous alignment to be exceedingly rare; you can always do a copy later if needed, which should have negligible cost compared to the (rather slow) loadtxt.)
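The "copy later" route could look like the sketch below. It assumes the caller builds a packed equivalent of the requested dtype first; the names `target`, `packed`, `loaded` and `out` are illustrative, and fixed-width types are used so the layout is platform-independent.

```python
import io
import numpy as np

full = np.dtype([("a", np.int64), ("b", np.float64), ("c", np.int64)])
target = full[["a", "c"]]  # the non-contiguous dtype that was requested

# Packed dtype with the same field names and types, but no gaps.
packed = np.dtype([(name, target.fields[name][0]) for name in target.names])

# Parsing into the packed dtype works, as in the "OK" cases above.
loaded = np.genfromtxt(io.StringIO("1 3\n5 7"), dtype=packed)

# Copy field by field into an array that has the requested layout.
out = np.zeros(loaded.shape, dtype=target)
for name in target.names:
    out[name] = loaded[name]
```

The copy is a single pass per field over already-parsed data, so its cost should be small next to the text parsing itself.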
NumPy/Python version information:
1.21.1 3.9.6 (default, Jun 30 2021, 10:22:16)
[GCC 11.1.0]