Refactor Geometry classes to subclass the C extension type#983
Conversation
| self._ndim = self.__p__._ndim | ||
| self._cseq = lgeos.GEOSGeom_getCoordSeq(self.__p__._geom) | ||
| def __init__(self, coords): | ||
| self._coords = coords |
There was a problem hiding this comment.
What exactly to do with this CoordinateSequence class is something still to be discussed, and I opened #984 for that
There was a problem hiding this comment.
I have been reading up on #984 and I tend to go with this implementation for this PR (making self._coords a ndarray directly)
However I think that sticking a full duplicate of coords to the object by default is a bit too much. Also _coords won't be initialized properly in all construction paths.
Within this approach (I actually like the lazy CoordinateSequency object approach better) I think it is best to use a pattern like this:
@property
def coords(self):
try:
coords = getattr(self, "_coords")
except AttributeError:
coords = pygeos.get_coordinates(self)
coords.flags.writeable = False
self._coords = coords
return coords
There was a problem hiding this comment.
However I think that sticking a full duplicate of coords to the object by default is a bit too much.
Note that this CoordinateSequence is not constructed by default when creating a geometry object. It will only be created once a user accesses the geom.coords attribute.
Now, with the current code, each time the user does geom.coords, it will create a new CoordinateSequence object containing a new array, so making this "cached" is certainly a good idea (as well as making the array not writable)
| else: | ||
| self._geom, self._ndim = geos_geometrycollection_from_py(geoms) | ||
| # TODO better empty constructor | ||
| return pygeos.from_wkt("GEOMETRYCOLLECTION EMPTY") |
There was a problem hiding this comment.
For all those constructors, I now use WKT to create an empty geometry, but see pygeos/pygeos#149 to discuss if we need to have a better way for this
There was a problem hiding this comment.
pygeos.geometrycollections([]) gives you the empty geometry collection also
| with pytest.raises(ValueError) as exc: | ||
| wkt = shape(geojson).wkt | ||
| assert exc.match("Inconsistent coordinate dimensionality") | ||
| assert exc.match("Inconsistent coordinate dimensionality|Input operand 0 does not have enough dimensions") |
There was a problem hiding this comment.
We get different error message in several places (we should see to what extent we want to preserve them, sometimes they are more informative though)
| from tests.conftest import shapely20_todo | ||
|
|
||
|
|
||
| @shapely20_todo # logging is not yet implemented |
There was a problem hiding this comment.
This is something we need to check in pygeos how this can be possible, if it's a feature we want to preserve
There was a problem hiding this comment.
I'd say all error messages from GEOS (like the one below) should be converted to exception messages, similar to #991 (comment) . I'd assume it would be difficult to capture an error message in pygeos, but also emit the same error message to the logger handler below. I wouldn't expect many other folks to be relying on these log capture methods.
tests/test_persist.py
Outdated
| q = pickle.loads(data) | ||
| self.assertTrue(q.equals(p)) | ||
|
|
||
| @shapely20_todo # big_endian is not yet implemented |
There was a problem hiding this comment.
In pygeos right now we didn't implement this keyword, but have a byte_order keyword instead (matches more closely the terminology of GEOS). The default is to use the machine's endianness, and one can use the keyword to override this.
| data = dumps(geom1, HIGHEST_PROTOCOL) | ||
| geom2 = loads(data) | ||
| assert geom2.has_z == geom1.has_z | ||
| # TODO(shapely-2.0) LinearRing type not preserved in roundtrip |
There was a problem hiding this comment.
We were just discussing this in PyGEOS as well -> pygeos/pygeos#190
There we thought that it might be OK enough to get a LineString back from pickling a LinearRing. However, having the pickling on the python side (still using WKB), we could more easily track the original class as well and ensure it is correctly preserved.
However, having it in python compared to in the base C extension type has a performance penalty.
| (459, 415), (399, 381), (519, 311), (520, 242), (519, 173), | ||
| (399, 450), (339, 207)] | ||
| self.assertRaises(TopologicalError, Polygon(p1).within, Polygon(p2)) | ||
| self.assertRaises(pygeos.GEOSException, Polygon(p1).within, Polygon(p2)) |
There was a problem hiding this comment.
Currently, shapely has a bunch of custom error classes in https://github.com/Toblerity/Shapely/blob/master/shapely/errors.py. But in PYGEOS, for now we just have a single GEOSException for errors that are directly raised from one of the GEOS functions.
There was a problem hiding this comment.
Raising pygeos exceptions is just an intermediate state of the API then?
There was a problem hiding this comment.
Yes, the fact that it is a "pygeos" exception is intermediate state. But basically everywhere you see pygeos you need to mentally replace it with shapely (as in one of the next steps we will move pygeos into shapely and replace the pygeos imports with internal shapely imports).
So the "real" question here is if we are, long term, fine with having only a generic shapely.GEOSException, or if we want to have more fine grained error classes (TopologicalError, WKBReadingError, WKTReadingError).
Note that this is only about the class of the exception, also the generic GEOSException error class will still have the custom, informative error messages.
There was a problem hiding this comment.
Opened #991 to have a dedicated place to discuss this topic.
|
Gentle ping to get some feedback here. |
|
Thanks @jorisvandenbossche this looks great so far. I've built this PR locally, and the majority of the tests indeed pass. I'll drop a few review comments when I get spare moments... |
|
I won't be able to try this out until the weekend. Will report back then. |
caspervdw
left a comment
There was a problem hiding this comment.
@jorisvandenbossche Thanks for this enormous effort of combining pygeos and shapely. I left some remarks.
| @@ -838,32 +765,19 @@ class GeometrySequence(object): | |||
| _ndim = None | |||
There was a problem hiding this comment.
I believe that these class-level attributes are not used?
There was a problem hiding this comment.
__p__ is used below, but I think this would be a good time to rename this private variable to _parent
There was a problem hiding this comment.
Specifically for this GeometrySequence class, we also still need to see what to do with it (keep it or not?)
| return pygeos.get_geometry(self.__p__, i) | ||
|
|
||
|
|
||
| class EmptyGeometry(BaseGeometry): |
There was a problem hiding this comment.
The term EmptyGeometry doesn't really cover the fact that GEOS has 'typed' empty geometries.
As `GeometryCollection([]) gives this exact same result, I think this class should best be deprecated.
| ob._ndim = 2 | ||
| ob._is_empty = False | ||
| return ob | ||
| ob = _DummyGeometry(g) |
There was a problem hiding this comment.
This seems backwards to me.
In my opinion we should never pass these pointers around as python int objects. Do you think we can remove the geom_factory completely from shapely 2.0?
There was a problem hiding this comment.
Yes, this is certainly only temporary. I don't refactor the full of Shapely in a single PR, and thus in other parts of shapely, geom_factory is still used together with ctypes. So those changes here with the DummyGeometry hack is just to ensure geom_factory keeps working. But the end goal of Shapely 2.0 for sure means that all usage of geom_factory is eliminated.
This was an old hack to run the tests with the older library and compare similar WKT, so yes we can update the WKT and remove it. |
|
Skipped the problematic tests on Appveyor, and updated the WKT/WKB related tests (to no longer be skipped).
For now I added |
|
Indeed, already removed it ;) |
|
OK, going to merge this, so we can further work in separate follow-up PRs. Additional comments on the changes here are of course still welcome as well, and I will open issues to track the currently remaining discussion items. Thanks all for the review! |
|
I created some more follow-up issues based on the review discussions in this PR:
And other outstanding issues for which I already created issues earlier:
|
.. and use the pygeos functions for all the Geometry methods/properties.
This is a first step for integrating PyGEOS into Shapely (#962):
geom_factory(self.impl['difference'](self, other))topygeos.difference(self, other)). For other functionality in most other modules (eg shapely.ops etc), this is still left as is and keeps using the ctypes interface (this can be cleaned up later in a separate PR)cc @caspervdw