Skip to content

POC Point subclass#104

Closed
caspervdw wants to merge 3 commits intopygeos:masterfrom
caspervdw:poc-subclass
Closed

POC Point subclass#104
caspervdw wants to merge 3 commits intopygeos:masterfrom
caspervdw:poc-subclass

Conversation

@caspervdw
Copy link
Copy Markdown
Member

This makes pygeos.Geometry subclassable and uses tp_base to quickly check if an object is a Geometry subclass.

char get_geom(GeometryObject *obj, GEOSGeometry **out) {
if (!PyObject_IsInstance((PyObject *) obj, (PyObject *) &GeometryType)) {
PyTypeObject *type = ((PyObject *)obj)->ob_type;
if ((type != &GeometryType) & (type->tp_base != &GeometryType)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change actually needed?
Because I experimented with this before (adding the TPFLAGS_BASETYPE), and I seemed to remember the ufuncs worked on the subclasses without any additional change (at least in python, "isinstance" works for subclasses as well, but don't know if the semantics of the C API version is exactly the same).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyObject_IsInstance needs the GIL while the new typecheck doesn't (it operates on structs)

Looking further into the source, it seems that the "exact type match" (which we always have in current master) is catched very soon in the PyObject_IsInstance (here and here) so that's why we probably got away with it in the first place.

@caspervdw
Copy link
Copy Markdown
Member Author

This now also creates the Points from the ufuncs. I didn't see any performance degradation (but more testing is needed still)

@caspervdw
Copy link
Copy Markdown
Member Author

caspervdw commented Mar 14, 2020

Some better benchmarks:

import pygeos
import numpy as np
c = np.random.random((10000, 2))
%timeit pygeos.points(c)
2.72 ms ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # master
3.06 ms ± 29.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # PR
g = box(0, 0, 1, 1)
%timeit pygeos.intersection(g, p)
139 ms ± 419 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  # master
139 ms ± 464 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  # PR

So, you can measure the difference, but it is only 0.033 nanomicroseconds of additional overhead per geometry construction. I think we an take that performance hit, as the benefit to the pygeos API would be great.

@jorisvandenbossche
Copy link
Copy Markdown
Member

This is really cool. Will try to test it a bit more myself as well the coming days!

@jorisvandenbossche
Copy link
Copy Markdown
Member

I also did some similar timings with my laptop tuned for performance checking with pyperf, and with the basic pygeos.points as you did above (which should have little other overhead), I can confirm similar results:
For me, it gives around 33% overhead (338ms instead of 253ms for 1 million points, but again, constructing points is very cheap, in actual operations when calculating eg an intersection, the relative cost will be much smaller). For me this gives 85ns overhead per geometry (compared to your 33ns, the 0.033ns you noted should be 0.033µs).

So I agree that, although measurable, this limited slowdown is certainly something that seems worth it.

@jorisvandenbossche
Copy link
Copy Markdown
Member

And as additional test, I quickly added a LineString subclass, to test something more "costly". Creating linestrings of each 100 coordinates pairs from a numpy array, I get a similar overhead per geometry (~67ns), but now this means only a relative performance hit of 1.3%

@caspervdw
Copy link
Copy Markdown
Member Author

Ok so let’s do this! I think we should copy as much of the shapely API as we can.

Maybe 1 PR per type?

@jorisvandenbossche
Copy link
Copy Markdown
Member

I was experimenting a bit more (also in Shapely), and ran into the following issue: if we have an additional class in the hierarchy (which we will want, as a large amount of the properties and methods are shared accross all geometry types), you get errors.

Eg dummy example:

class BaseGeometry(Geometry):
    @property
    def type_id(self):
        return get_type_id(self)


class Point(BaseGeometry):
    @property
    def x(self):
        return get_x(self)

    @property
    def y(self):
        return get_y(self)


class LineString(BaseGeometry):
    @property
    def num_points(self):
        return get_num_points(self)


lib.registry[0] = Point
lib.registry[1] = LineString

Then the geometry class recognition does not work:

In [2]: p = pygeos.points(0, 1) 

In [3]: p.x  
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-7d725ae8e4e9> in <module>
----> 1 p.x

~/scipy/repos/pygeos/pygeos/geometry.py in x(self)
     47     @property
     48     def x(self):
---> 49         return get_x(self)
     50 
     51     @property

~/scipy/repos/pygeos/pygeos/geometry.py in get_x(point)
    239     nan
    240     """
--> 241     return lib.get_x(point)
    242 
    243 

TypeError: One of the arguments is of incorrect type. Please provide only Geometry objects.

I suppose some check with tp_base needs to adjusted for this, but didn't yet look in detail at it.

@jorisvandenbossche
Copy link
Copy Markdown
Member

jorisvandenbossche commented Apr 18, 2020

Found this in the cython generated code:

int __Pyx_InBases(PyTypeObject *a, PyTypeObject *b) {
    while (a) {
        a = a->tp_base;
        if (a == b)
            return 1;
    }
    return b == &PyBaseObject_Type;
}

so using this logic instead of type->tp_base != &GeometryType in get_geom fixes it (if we know the exact class hierarchy, we can of course also simplify it more)

@jorisvandenbossche
Copy link
Copy Markdown
Member

Ok so let’s do this! I think we should copy as much of the shapely API as we can.
Maybe 1 PR per type?

I am not sure what is best, but I am currently experimenting with directly doing this in Shapely (https://github.com/Toblerity/Shapely/compare/master...jorisvandenbossche:geometry-subclasses?expand=1).
We need to decide in Shapely how we want to approach this (start somewhat fresh inside the repo, as you proposed), or more gradually. But in any case, I would personally mostly focus on Shapely when it comes to the subclasses.

@caspervdw
Copy link
Copy Markdown
Member Author

Work is resuming in #182

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants