1.20.0 RC: change in array coercion for scalar objects with __array_interface__ #17965

@jorisvandenbossche

The Shapely package defines a set of scalar geometry objects, but some of those objects also expose their underlying data through the array interface (__array_interface__).

Until now, np.array([..], dtype=object) coercion apparently did not check the elements of the list for an __array_interface__, so code like the following, which creates an object array of such elements, worked:

In [1]: from shapely.geometry import LineString

In [2]: line = LineString([(0,0), (1,1), (2,1)])

In [3]: np.array([line], dtype=object)
Out[3]: 
array([<shapely.geometry.linestring.LineString object at 0x7f0e4e41caf0>],
      dtype=object)

In [4]: np.__version__
Out[4]: '1.19.2'

but with the 1.20.0 RC, we now get the following:

In [3]: np.array([line], dtype=object)
Out[3]: 
array([[[0.0, 0.0],
        [1.0, 1.0],
        [2.0, 1.0]]], dtype=object)

In [4]: np.__version__
Out[4]: '1.20.0rc1'

This may well be an intentional change (https://numpy.org/devdocs/release/1.20.0-notes.html#array-coercion-restructure, #16200), but I still wanted to raise it so that you are at least aware of it (the release notes say "We are not aware of any such case").

This can be reproduced without shapely with the following example:

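import numpy as np

class Line:
    """Minimal stand-in for a Shapely geometry that exposes __array_interface__."""
    def __init__(self, coords):
        self._coords = np.asarray(coords)

    @property
    def __array_interface__(self):
        # Expose the underlying coordinate buffer to NumPy
        return self._coords.__array_interface__

line = Line([(0,0), (1,1), (2,1)])
np.array([line], dtype=object)  # 1.19.2: shape (1,); 1.20.0rc1: shape (1, 3, 2)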
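A small, hypothetical check (not part of NumPy or Shapely) can tell which coercion behaviour the installed NumPy uses, by looking at the shape of the resulting object array: (1,) means the element was kept as a single object, as with 1.19.2 above, while (1, 3, 2) means its coordinates were unpacked, as with 1.20.0rc1. The Line class mirrors the reproducer; warnings are silenced on the assumption that some NumPy versions may warn on this call.

```python
import warnings

import numpy as np


class Line:
    """Mirrors the reproducer: a scalar-like object exposing __array_interface__."""

    def __init__(self, coords):
        self._coords = np.asarray(coords, dtype=float)

    @property
    def __array_interface__(self):
        # Hand NumPy the interface of the underlying coordinate array
        return self._coords.__array_interface__


def object_coercion_shape():
    """Hypothetical helper: shape of np.array([line], dtype=object)."""
    line = Line([(0, 0), (1, 1), (2, 1)])
    with warnings.catch_warnings():
        # Assumption: some NumPy versions may emit a warning here; only the
        # resulting shape matters for this check.
        warnings.simplefilter("ignore")
        result = np.array([line], dtype=object)
    return result.shape


shape = object_coercion_shape()
if shape == (1,):
    print("geometry kept as a single object element (1.19.2 behaviour)")
else:
    print("geometry unpacked into its coordinates, shape", shape)
```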

Although nothing is directly broken for Shapely itself (no failing tests with the 1.20.0 RC), everybody who puts Shapely geometries into arrays will run into problems. GeoPandas, specifically, does this a lot.

Some notes:

  • I know that having an __array_interface__ on an object that doesn't pretend to be an array is probably bad practice (although it is still a way to expose its data to other libraries using numpy). This is long-standing Shapely behaviour, but one we are actually planning to deprecate and remove in the near future (among other reasons, precisely to play better with numpy).
    We are actively working on a "Shapely 2.0" that will remove this array interface (see this Shapely 2.0 RFC section about this aspect). However, a final release is at least a few months away.

  • In addition to having an __array_interface__, a subset of the Shapely geometries are also iterable. For this reason, in GeoPandas we have already been fairly consistently using this pattern to create numpy arrays:

    arr = np.empty(len(geometries), dtype=object)
    arr[:] = geometries  # geometries: a list of Shapely geometries
    

    With this pattern, the issue described here can be avoided. However, we haven't done this everywhere in GeoPandas yet (e.g. in places where we were sure we only had simple geometries such as points, which are not iterable), so we still see some failures.
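The pre-allocation pattern above can be wrapped in a small helper. The sketch below is hypothetical (to_object_array is not a GeoPandas or Shapely function) and uses a Line-like stand-in for a Shapely geometry. Because np.empty fixes the target shape to (len(items),), each element is stored as a single object, whether it exposes __array_interface__ or is iterable.

```python
import numpy as np


class Line:
    """Hypothetical stand-in for a Shapely geometry: exposes __array_interface__."""

    def __init__(self, coords):
        self._coords = np.asarray(coords, dtype=float)

    @property
    def __array_interface__(self):
        return self._coords.__array_interface__


def to_object_array(items):
    """Build a 1-D object array holding each item of the list `items` as-is.

    Pre-allocating with np.empty fixes the target shape to (len(items),), so
    each element ends up as a single object in the array, even if it is
    iterable or exposes __array_interface__.
    """
    arr = np.empty(len(items), dtype=object)
    arr[:] = items
    return arr


line = Line([(0, 0), (1, 1), (2, 1)])
coords = [(0, 0), (1, 1)]  # an iterable element is also kept intact
arr = to_object_array([line, coords])
print(arr.shape)
```

Checking arr[0] is line confirms that the original object, not a copy of its coordinates, ended up in the array.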
