PROPOSAL: function to return all parts of a multi-geometry as ndarray #128

@brendan-ward

Extracting the individual geometries from a multi geometry / collection is a fairly common task.

In shapely this is available via the geoms attribute on multi / collection geometry objects.

There doesn't appear to be any counterpart in GEOS, nor does there need to be: GEOS already gives us the count of parts and an accessor that takes an index (plus, it would be hard to consume an array of geometries from GEOS).

Right now, we have to do this in a Python loop, e.g.,

parts = [pygeos.get_geometry(g, i) for i in range(0, pygeos.get_num_geometries(g))]

This is certainly not optimal for large numbers of geometries.

There are a number of cases where we would also want to do this for all geometries in an ndarray of source geometries.

A standard ufunc approach doesn't work for this: a ufunc's output shape is fixed by the shape of its inputs, whereas here the number of output geometries depends on the data and is greater than or equal to the number of inputs. It also doesn't look like the generalized ufunc expressions get us there either, though I haven't been able to fully get my head around those. Is there a way to do this as a ufunc?
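To make the shape problem concrete, here is a minimal pure-Python sketch of a two-pass approach (count, allocate, fill) that a C implementation might follow. It is only an illustration under stated assumptions: multi-geometries are modelled as plain tuples of part labels, `num_parts` and `part_at` are hypothetical stand-ins for pygeos.get_num_geometries and pygeos.get_geometry, and `flatten_parts` is not an existing pygeos function.

```python
from itertools import accumulate


def num_parts(geom):
    # Hypothetical stand-in for pygeos.get_num_geometries on one geometry.
    return len(geom)


def part_at(geom, i):
    # Hypothetical stand-in for pygeos.get_geometry(geom, i).
    return geom[i]


def flatten_parts(geoms):
    # Pass 1: one count per input. This step is elementwise, so it has
    # the same length as the input and would fit a regular ufunc.
    counts = [num_parts(g) for g in geoms]
    # Start offset of each input's parts within the flat output.
    offsets = [0] + list(accumulate(counts))
    # The output length is only known after inspecting the data.
    out = [None] * offsets[-1]
    # Pass 2: fill the preallocated output.
    for geom, start, n in zip(geoms, offsets, counts):
        for i in range(n):
            out[start + i] = part_at(geom, i)
    return counts, out


# Tuples of labels stand in for multi-geometries and their parts.
geoms = [("a0", "a1"), ("b0",), ("c0", "c1", "c2")]
counts, parts = flatten_parts(geoms)
print(counts)  # [2, 1, 3]
print(parts)   # ['a0', 'a1', 'b0', 'c0', 'c1', 'c2']
```

The counting pass maps one input to one output, which is why it works as a ufunc; the fill pass does not, because its output length is the sum of the counts.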

Instead, I think we could expose a new function implemented in C that provides this functionality.

Given a single geometry or a 1D array_like of geometries, it would return a 1D array of all parts:

>>> geom = pygeos.Geometry("MULTIPOLYGON (((0 0, 0 10, 10 10, 0 0)), ((1 1, 1 10, 10 10, 1 1)))")
>>> pygeos.get_parts(geom)
array([<pygeos.Geometry POLYGON ((0 0, 0 10, 10 10, 0 0))>, <pygeos.Geometry POLYGON ((1 1, 1 10, 10 10, 1 1))>])

For the 1D input case, we may want to know the indexes of the input geometry as well as the parts in order to relate back to the inputs (e.g., for other attributes within a DataFrame).

In that case, we may want to limit pygeos.get_parts() to the single geometry case, and provide pygeos.get_parts_bulk() (better name welcome!! Maybe get_parts_indexed?), which returns an array of shape (2, n) where the first row holds indexes into the source array and the second row holds the part geometries.
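As a rough sketch of what the indexed variant could return and how the indexes relate parts back to other attributes, consider the following. Everything here is an assumption for illustration: `get_parts_indexed` is just the name floated above, multi-geometries are modelled as tuples of part labels, and the result is two parallel lists rather than a (2, n) array so the example needs nothing beyond the standard library.

```python
def get_parts_indexed(geoms):
    # Hypothetical function: for every part, record the position of the
    # source geometry it came from alongside the part itself.
    indexes, parts = [], []
    for idx, geom in enumerate(geoms):
        # Iterating the tuple stands in for calling
        # pygeos.get_geometry(geom, i) for each i.
        for part in geom:
            indexes.append(idx)
            parts.append(part)
    return indexes, parts


# One attribute value per source geometry, e.g. a DataFrame column.
names = ["lakes", "island", "parcels"]
geoms = [("a0", "a1"), ("b0",), ("c0", "c1", "c2")]

indexes, parts = get_parts_indexed(geoms)
# Repeat each source attribute once per part using the index list.
part_names = [names[i] for i in indexes]

print(indexes)     # [0, 0, 1, 2, 2, 2]
print(part_names)  # ['lakes', 'lakes', 'island', 'parcels', 'parcels', 'parcels']
```

Returning the source index next to each part means the caller can repeat any per-geometry attribute without recomputing the part counts.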

This could leverage some of the existing code used in the ufuncs, so the new code here would mostly be wrapping a bit of C as Python functions.
