genfromtxt fails when a non-contiguous dtype is requested #19623

@anntzer

Description

Reproducing code example:

import numpy as np, io
# np.dtype([("a", int), ("b", float), ("c", int)])[["a", "c"]] constructs a non-contiguous dtype
# that contains only the a and c fields.
np.loadtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("c", int)]))  # OK
np.loadtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("b", float), ("c", int)])[["a", "c"]])  # OK
np.genfromtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("c", int)]))  # OK
np.genfromtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("b", float), ("c", int)])[["a", "c"]])  # fails

Error message:

/usr/lib/python3.9/site-packages/numpy/lib/npyio.py in genfromtxt(fname, dtype, comments, delimiter, skip_header, skip_footer, converters, missing_values, filling_values, usecols, names, excludelist, deletechars, replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose, invalid_raise, max_rows, encoding, like)
   2220             else:
   2221                 rows = np.array(data, dtype=[('', _) for _ in dtype_flat])
-> 2222                 output = rows.view(dtype)
   2223             # Now, process the rowmasks the same way
   2224             if usemask:

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

This occurs because genfromtxt handles dtypes by first "flattening" them (lifting nested fields to the top level, but also throwing away alignment info), constructing an array with that flattened dtype, and then calling .view() on it with the original dtype. That last step fails: the flattened dtype is packed while the requested dtype is not, so the memory layout differs and the view cannot be made.
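The size mismatch can be reproduced without genfromtxt. The following is a minimal sketch, not the actual genfromtxt code path; it uses explicit `np.int64`/`np.float64` (an assumption, so the sizes are the same on every platform) and builds the flattened dtype by hand in the same `('', type)` style shown in the traceback.

```python
import numpy as np

# Explicit 8-byte types so the numbers below do not depend on the platform.
full = np.dtype([("a", np.int64), ("b", np.float64), ("c", np.int64)])
sub = full[["a", "c"]]  # keeps the offsets (0 and 16) and itemsize of `full`

# Hand-built stand-in for the "flattened" dtype: packed, no gap for "b".
flat = np.dtype([("", np.int64), ("", np.int64)])

print(sub.itemsize, flat.itemsize)  # 24 16

rows = np.array([(1, 3), (5, 7)], dtype=flat)  # 2 rows * 16 bytes = 32 bytes
try:
    rows.view(sub)  # 32 is not a multiple of 24
except ValueError as exc:
    view_error = exc
else:
    view_error = None
print(type(view_error).__name__)
```

Note that the view only raises when the total byte count is not a multiple of the larger item size; with three rows (48 bytes) it would succeed and silently reinterpret the data.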

(OTOH, loadtxt directly constructs an output with the right dtype by using a recursive "row packer" that constructs a nested list/tuple with the right shape.)
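To illustrate the idea of a recursive row packer, here is a simplified sketch. The helper name `pack_row` is hypothetical, this is not loadtxt's actual implementation, and it ignores subarray fields. It turns a flat sequence of values into the nested tuple that `np.array` expects for a given structured dtype, which is why the field offsets of the target dtype never matter.

```python
import numpy as np

def pack_row(dt, values):
    """Hypothetical helper: consume flat values and nest them to match `dt`."""
    if dt.names is None:
        # Scalar field: take the next flat value.
        return next(values)
    # Structured field: recurse into each sub-field in order.
    return tuple(pack_row(dt[name], values) for name in dt.names)

nested = np.dtype([("a", np.int64),
                   ("pos", [("x", np.float64), ("y", np.float64)])])
row = pack_row(nested, iter([1, 2.0, 3.0]))
print(row)  # (1, (2.0, 3.0))
arr = np.array([row], dtype=nested)

# The same approach works for a non-contiguous dtype, since values are
# assigned field by field rather than reinterpreted in memory.
full = np.dtype([("a", np.int64), ("b", np.float64), ("c", np.int64)])
sub = full[["a", "c"]]
arr_sub = np.array([pack_row(sub, iter([1, 3]))], dtype=sub)
```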

I was hoping to implement a similar "flat-dtype" optimization for loadtxt, so that'll likely involve alignment-handling code on both sides.

... or can we just claim that non-contiguous dtypes are not supported by loadtxt/genfromtxt? (I expect the use cases of having to loadtxt() into an array with a specific, non-contiguous alignment to be exceedingly rare; you can always do a copy later if needed, which should have negligible cost compared to the (rather slow) loadtxt.)
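The "copy later" route could look roughly like this. It is a sketch of one possible workaround, not an endorsed API pattern: read into a packed dtype with the same field names, then copy field by field into an array of the non-contiguous dtype. The names `target`, `packed`, and `out` are illustrative.

```python
import io
import numpy as np

full = np.dtype([("a", np.int64), ("b", np.float64), ("c", np.int64)])
target = full[["a", "c"]]  # the non-contiguous dtype we ultimately want

# Packed dtype with the same field names and types, but no gaps.
packed = np.dtype([(name, target[name]) for name in target.names])

data = np.genfromtxt(io.StringIO("1 3\n5 7"), dtype=packed)

# Copy field by field into the target layout.
out = np.empty(data.shape, dtype=target)
for name in target.names:
    out[name] = data[name]

print(out["a"], out["c"])
```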

NumPy/Python version information:

1.21.1 3.9.6 (default, Jun 30 2021, 10:22:16) 
[GCC 11.1.0]
