Skip to content

Release the GIL within the ufuncs where possible#80

Closed
jorisvandenbossche wants to merge 1 commit intopygeos:masterfrom
jorisvandenbossche:nogil
Closed

Release the GIL within the ufuncs where possible#80
jorisvandenbossche wants to merge 1 commit intopygeos:masterfrom
jorisvandenbossche:nogil

Conversation

@jorisvandenbossche
Copy link
Copy Markdown
Member

xref #7

@caspervdw I cleaned up my previous experiment with releasing the GIL (and added handling of None), and pushed it here. This is only doing it for the "YY_d" (eg distance) for testing the approach.

Tested with:

import numpy as np
import pygeos

N = 1_000_000
arr = pygeos.points(np.random.rand(N, 2))
arr[np.random.randint(0, N, size=100)] = None
poly = pygeos.box(0, 0, 0.2, 0.2)

res1 = pygeos.distance(poly, arr)

and it doesn't give a noticeable slowdown in a single thread. Testing it with dask with multi-threading, then it gives a speed-up of around 3x on my 4-core laptop:

import dask.array
arr_dask = dask.array.from_array(arr, chunks=100_000)

res2 = arr_dask.map_blocks(pygeos.distance, poly, dtype=float)
assert np.allclose(res1, res2.compute(), equal_nan=True)
%timeit res2.compute()

Now, some things / questions about the approach:

  • I do a first pass to check the data, and then in a second nogil pass I get the pointers out without checking.
    A problem might be that in the meantime, those objects could have changes, if the array gets modified while the ufunc is running.
  • An alternative is probably what I did in Release the GIL where possible #7 (comment), which is copying the pointers into a new array in a first pass and looping over that in a second nogil pass. However, with the other approach I wanted to avoid the copying of the pointers. Also, in this approach, we will probably need to increase the reference count on all objects to avoid they get released while the nogil loop is running in case someone modifies the array.
  • I still need to check how this works with the geos context and geos exceptions being raised (does each function needs its own context? xref Don't use a global geos context handle #31). To test this, I need to find a code example where GEOS actually raises an error. In principle the GEOSDistance function "Return 0 on exception", but not sure how to trigger this with an example (we also don't have tests for this, only in the IO tests, but for those it is more difficult to release the gil).

Since for both approaches, the fact that the array might get modified while the ufunc is running is problematic, I am wondering if we can do something about that? Like making the array non writeable when passing to the ufunc (but, how does that work with slices and views? How does that work if the same array is passed to multiple ufuncs? ..)


/* Check if the input is a GeometryObject or Py_None. Returns 0 on error, 1 on success. */
char check_geom(GeometryObject *obj) {
if (!PyObject_IsInstance((PyObject *) obj, (PyObject *) &GeometryType)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we would simplify this check to an exact check

if (Py_TYPE(obj) != (PyTypeObject *) GeometryType)

We might be able to do this with the GIL released?

get_geom_nogil(*(GeometryObject **)ip1, &in1);
get_geom_nogil(*(GeometryObject **)ip2, &in2);

if ((in1 == 0) | (in2 == 0)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer == NULL here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants