Converting lists of array-like objects to numpy arrays can be very slow #8562

@shoyer

When an array-like object is nested inside a list, NumPy seems to iterate over it instead of relying on __array__.

Consider:

import numpy as np

class ArrayLike(object):
  def __init__(self, array):
    self.array = array

  def __len__(self):
    return len(self.array)
  
  def __iter__(self):
    print('calling __iter__')
    return iter(self.array)
  
  def __getitem__(self, index):
    print('calling __getitem__ with index={}'.format(index))
    return self.array[index]

  def __array__(self, dtype=None):
    print('calling __array__')
    return np.asarray(self.array, dtype=dtype)

>>> a = ArrayLike(np.arange(3))

>>> np.array(a)
calling __array__
array([0, 1, 2])

>>> list(a)
calling __iter__
[0, 1, 2]

>>> np.array([a])
calling __array__
calling __iter__
calling __iter__
array([[0, 1, 2]])

So actually, it looks like NumPy calls __array__, and then __iter__ twice!

Is there a good reason for this behavior? If not, maybe we can fix it? Either way, a pointer to the relevant place in the codebase would be appreciated.

This causes trouble for xarray users (pydata/xarray#1247) because iterating over xarray objects creates new xarray objects, which takes ~100 us each. That adds up fast for large arrays.
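Until the behavior is understood, one possible workaround is to convert each element with np.asarray before building the outer array, so that NumPy only ever sees plain ndarrays inside the list. The sketch below is an illustration rather than a fix in NumPy itself: CountingArrayLike and to_array are hypothetical names, the class counts calls instead of printing them, and the copy=None parameter on __array__ is an assumption added for compatibility with newer NumPy releases.

```python
import numpy as np


class CountingArrayLike:
    """Hypothetical stand-in for ArrayLike that counts calls instead of printing."""

    def __init__(self, array):
        self.array = array
        self.iter_calls = 0
        self.array_calls = 0

    def __len__(self):
        return len(self.array)

    def __iter__(self):
        self.iter_calls += 1
        return iter(self.array)

    def __getitem__(self, index):
        return self.array[index]

    def __array__(self, dtype=None, copy=None):
        # copy is accepted for newer NumPy versions; it is ignored here.
        self.array_calls += 1
        return np.asarray(self.array, dtype=dtype)


def to_array(items):
    """Convert each item via np.asarray first, then build the outer array.

    Hypothetical helper: the outer np.array call only receives ndarrays,
    so it has no reason to iterate over the array-like wrappers.
    """
    return np.array([np.asarray(item) for item in items])


a = CountingArrayLike(np.arange(3))
result = to_array([a])
print(result.tolist(), "iter calls:", a.iter_calls)
```

The trade-off is that every caller has to remember to go through the helper; it does not change what np.array([a]) does when called directly.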
