Skip to content

API: exposing CoordinateSequence in Shapely 2.0? #984

@jorisvandenbossche

Description

@jorisvandenbossche

One of the questions I run into while doing the Geometry classes refactor (subclassing PyGEOS geometry -> #983), is what to do with CoordinateSequence.

So right now, if you access the .coords of a Shapely Geometry you get a CoordinateSequence object (at least for Point, LineString and LinearRing, the geometry types that are directly backed by a coordinate sequence).

Some examples usages that are possible now:

>>> from shapely.geometry import LineString  
>>> geom = LineString([(0, 0), (0, 1), (1, 1)])
>>> print(geom) 
LINESTRING (0 0, 0 1, 1 1)

>>> geom.coords   
<shapely.coords.CoordinateSequence at 0x7f8663bd5438>

# slicing gives a list of tuples (this is done often in the tests)
>>> geom.coords[:]   
[(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# accessing single element gives (x, y) or (x, y, z) tuple
>>> geom.coords[0] 
(0.0, 0.0)

# iterating over coordinate tuples
>>> for x, y in geom.coords: 
...     print(x, y) 
0.0 0.0
0.0 1.0
1.0 1.0

# the xy attribute gives two 1D arrays (from the stdlib array module)
>>> geom.coords.xy
(array('d', [0.0, 0.0, 1.0]), array('d', [0.0, 1.0, 1.0]))

# convert to a 2D numpy array
>>> np.asarray(geom.coords) 
array([[0., 0.],
       [0., 1.],
       [1., 1.]])

Apart from the exact API (eg tuples and not numpy arrays), the current implementation also ensures you can lazily iterate over the coordinates or get a specific coordinate pair (instead of needing to convert all coordiantes of the full Geometry to a list/array first).

For Shapely 2.0, I think there are some different options (from most to least compatible with current shapely):

  1. We also implement a CoordinateSequence extension type that maps to GEOSCoordSequence (like pygeos.Geometry wraps a GEOSGeometry), so that we can keep the exact same interface and the lazy conversion aspect.
    However, this also gives quite some additional code complexity (implementing yet another extension type in C, although the code itself will be quite similar as the Geometry extension type).

  2. We keep the python CoordinateSequence class for the return value of the .coords property to mimic the current API, but we let it hold the full coordinates array instead of a GEOCoordSequence object (so when accessing geom.coords, we get all the coordinates of the geometry as a numpy array, and put that in a CoordinateSequence wrapper class). This can preserve the API, but loses the lazy aspect.

  3. We simply return the numpy array of coordinates (what you would get now with np.asarray(geom.coords)) instead of having the CoordinateSequence class. Behavioural consequence: we need to deprecate the xy attribute (but you can of course slice the 2D array to get 2 1D arrays), and accessing coordinates or iterating over the coords will now give a numpy array instead of a tuple for coordinate pairs (eg np.array([0., 0.]) instead of (0.0, 0.0)).
    Potential option 3b) would be to convert the numpy array to a list of tuples, and return that from .coords. This preserves the behaviour of getting a tuple for coordinate pairs in getitem/iteration. But on the other hand, makes getting an array of coordinates of a large Geometry less efficient.

Personally, I am not sure how important the lazy aspect is (the conversion to a numpy array of coordinates will also be more efficient with pygeos).
In principle, we could also implement a "get nth coordinate pair" function in C that works directly on a supported Geometry instead of a CoordinateSequence, and that way we could keep a lazy iteration also in option 2, while still have the efficient conversion to a full numpy array of coordinates and without needing a CoordinateSequence extension type.

For now, in my refactor of the Geometry classes (#983), I implemented option 2 as this was the easiest to do while getting existing tests passing.

cc @caspervdw @sgillies @mwtoews

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions