Extracting the individual geometries from a multi geometry / collection is a fairly common task.
In shapely this is available through the geoms attribute of a multi / collection geometry object.
There doesn't appear to be any counterpart in GEOS, nor does there need to be: GEOS gives us the count of parts and an accessor that takes an index (plus, it would be hard to consume an array of geometries from GEOS).
Right now, we have to do this in a Python loop, e.g.,
parts = [pygeos.get_geometry(g, i) for i in range(0, pygeos.get_num_geometries(g))]
Which is certainly not optimal for large numbers of geometries.
There are a number of cases where we would also want to do this for all geometries in an ndarray of source geometries.
A standard ufunc approach doesn't work for this, because the output length depends on the number of parts in each geometry rather than on the input shape (it is typically greater than or equal to the input length). It also doesn't look like the generalized ufunc expressions get us there either, though I haven't been able to fully get my head around those. Is there a way to do this as a ufunc?
Instead, I think we could expose a new function implemented in C that provides this functionality.
Given a single geometry or a 1D array_like of geometries, it would return a 1D array of all parts:
>>> geom = pygeos.Geometry("MULTIPOLYGON (((0 0, 0 10, 10 10, 0 0)), ((1 1, 1 10, 10 10, 1 1)))")
>>> pygeos.get_parts(geom)
array([<pygeos.Geometry POLYGON ((0 0, 0 10, 10 10, 0 0))>, <pygeos.Geometry POLYGON ((1 1, 1 10, 10 10, 1 1))>])
For the 1D input case, we may also want the index of the source geometry for each part, so that parts can be related back to the inputs (e.g., to carry over other attributes within a DataFrame).
In that case, we may want to limit pygeos.get_parts() to the single geometry case, and provide pygeos.get_parts_bulk() (better name welcome!! Maybe get_parts_indexed?) which returns an array of shape (2,n), where the first row holds indexes into the source array and the second row holds the geometries.
This could leverage some of the existing code used in the ufuncs, so the new code here would mostly be wrapping a bit of C as Python functions.
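To make the intended output of the indexed variant concrete, here is a minimal pure-Python sketch of the semantics only, not the proposed C implementation. It uses plain lists as stand-ins for multi geometries: len(collection) plays the role of pygeos.get_num_geometries, and iterating a list plays the role of calling pygeos.get_geometry(g, i) for each i. The name get_parts_indexed is the hypothetical one floated above, and it returns two parallel lists rather than a (2,n) array.

```python
def get_parts_indexed(collections):
    """Illustrative semantics only: flatten collections and track source indexes.

    `collections` is a sequence of lists; each list stands in for one
    multi geometry / collection (an assumption made for this sketch).
    """
    # First pass: count the parts so the output size is known up front,
    # the way a C implementation could size its output arrays.
    total = sum(len(collection) for collection in collections)
    indexes = [0] * total
    parts = [None] * total

    # Second pass: copy each part and record which input it came from.
    pos = 0
    for source_index, collection in enumerate(collections):
        for part in collection:
            indexes[pos] = source_index
            parts[pos] = part
            pos += 1
    return indexes, parts


# Three inputs: two parts, one part, and an empty collection.
source = [["POLYGON A", "POLYGON B"], ["POINT C"], []]
indexes, parts = get_parts_indexed(source)
```

Here indexes comes out as [0, 0, 1] and parts as the three part labels in order. The empty third input contributes nothing, which is one reason the output length cannot be derived from the input shape alone.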