[Done] Separating Missing (NaG) and Empty#36

Merged
caspervdw merged 8 commits into master from missingvalues
Sep 9, 2019

@caspervdw
Member

@caspervdw caspervdw commented Sep 2, 2019

Closes #28, closes #29

This is some work separating the concepts of missing geometries (not-a-geometry or NaG) values and empty geometries. The idea is to keep a single NaG object (like None) that has its GEOSGeometry pointer set to NULL.

All geometry-returning functions propagate the NaG; predicates return False for any input NaG; integer-returning functions return -1 for NaG.

I added two additional functions to identify NaG values: is_geometry and is_null.

Further there should be functions that create arrays of empty geometries.

@caspervdw caspervdw changed the title WIP On separating Missing and Empty [Done] Separating Missing (NaG) and Empty Sep 2, 2019
@jorisvandenbossche
Member

Cool! I will try to take a look at it in the coming days. I am currently at the EuroSciPy conference, so a bit busy, but I will sit down here with one of the numpy developers who is interested in seeing the dtype stuff :-)

Member

@jorisvandenbossche jorisvandenbossche left a comment

Added a bunch of suggestions, but this is great work!

One thing I was wondering, you now use a GeometryObject where the pointer is a NULL pointer, but alternatively, we could also directly use a NULL pointer in the array (instead of a pointer to a GeometryObject with NULL pointer in it).
I think in practice this would give None in the python interface (so None instead of the custom pygeos.NaG in the user interface). Just a thought, not convinced myself this is necessarily better.

> I added two additional functions to identify NaG values: is_geometry and is_null

In addition to those, it would also be useful to expose a fast ufunc that just checks that the input contains only "valid objects" (in the context of a pygeos array: a geometry or a NaG). So basically is_geometry(a) | is_null(a), but as a single ufunc this can be more efficient.



def is_null(geometry, **kwargs):
    """Returns True if the object is not a geometry (NaG)
Member

Thought: we could also give this the more generic name is_missing, to avoid a specific choice between NA and NULL (e.g. in pandas we are nowadays standardizing on NA, at least in the method names).

@caspervdw
Member Author

Thanks for the remarks, I made some changes:

  • Dropped the NaG and NaN support in favour of only None
  • is_null is now replaced by is_missing (which is just np.equal(arr, None) under the hood)
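As a rough illustration of the second bullet, the sketch below applies `np.equal` to an object array that mixes geometries with `None`. `FakeGeometry` is a hypothetical placeholder class, not the pygeos `Geometry` type, and the snippet does not claim to reproduce how `is_missing` is wired up in the library.

```python
import numpy as np


class FakeGeometry:
    """Hypothetical placeholder; not the pygeos Geometry type."""

    def __init__(self, wkt):
        self.wkt = wkt


# An object array holding a geometry, a missing value, and an empty geometry.
arr = np.array(
    [FakeGeometry("POINT (0 0)"), None, FakeGeometry("POLYGON EMPTY")],
    dtype=object,
)

# Elementwise comparison against None marks only the missing entry.
missing = np.equal(arr, None)
print(missing.tolist())
```

Only the `None` entry is flagged; the empty polygon is a geometry and so does not count as missing.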

@caspervdw
Member Author

> Additionally to those, it would also be useful to have a fast ufunc exposed to just check that the input only contains "valid objects" (in pygeos array's context, so a geometry or a NaG) as well. So basically a is_geometry(a) | is_null(a), but as single ufunc this can be more efficient.

Maybe good to wait to see how this numpy-dtype extension thing plays out?

@caspervdw caspervdw added this to the 0.5 milestone Sep 8, 2019
@jorisvandenbossche
Member

> Maybe good to wait to see how this numpy-dtype extension thing plays out?

In the short term, it unfortunately won't play out. See the discussion in #34. It will need changes in numpy to be feasible (but these are changes they would welcome and want to plan).

<pygeos.Geometry LINESTRING (0 0, 1 0, 1 1, 0 1, 0 0)>
>>> boundary(Geometry("MULTIPOINT (0 0, 1 2)"))
<pygeos.NaG>
>>> boundary(Geometry("MULTIPOINT (0 0, 1 2)")) is None
Member

An alternative could be to use print(..), which then actually shows None (but both are of course workarounds for the interpreter not echoing a None result).
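The behaviour under discussion can be checked with the standard `doctest` module. The sketch below is independent of pygeos: it runs a small docstring in which a bare `None` result expects no output, while `print(value)` and `value is None` both produce a visible line. The `DOC` string and the test name are made up for illustration.

```python
import doctest

# A bare None result prints nothing in the interpreter, so the example
# on the second line expects no output at all.
DOC = """
>>> value = None
>>> value
>>> print(value)
None
>>> value is None
True
"""

parser = doctest.DocTestParser()
test = parser.get_doctest(DOC, globs={}, name="none_repr", filename=None, lineno=0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed, results.attempted)
```

Both workarounds give the reader something to look at in the docstring; the `is None` form additionally documents the exact return value.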

@caspervdw
Member Author

> > Maybe good to wait to see how this numpy-dtype extension thing plays out?
>
> On the short term, it unfortunately won't play out. See the discussion in #34. It will need changes in numpy to be feasible (but changes they would welcome / want to plan).

Oh, I wasn't following the discussion. So I implemented three ufuncs: is_missing, is_geometry, and is_valid_input.
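How the three checks relate can be sketched with scalar pure-Python functions. The function names mirror the ones in the comment, but the bodies are assumptions based on this conversation (missing means `None`, valid input means geometry or `None`); they are not the actual ufunc implementations, and `FakeGeometry` is a hypothetical placeholder class.

```python
class FakeGeometry:
    """Hypothetical placeholder; not the pygeos Geometry type."""

    def __init__(self, wkt):
        self.wkt = wkt


def is_missing(obj):
    """True only for the missing value (assumed to be None)."""
    return obj is None


def is_geometry(obj):
    """True only for geometry objects, including empty ones."""
    return isinstance(obj, FakeGeometry)


def is_valid_input(obj):
    """Single check equivalent to is_geometry(obj) or is_missing(obj)."""
    return obj is None or isinstance(obj, FakeGeometry)


values = [FakeGeometry("POINT (1 1)"), None, "not a geometry", 3.5]

# One row per value: (is_missing, is_geometry, is_valid_input)
table = [(is_missing(v), is_geometry(v), is_valid_input(v)) for v in values]
for row in table:
    print(row)
```

The combined check does one test per element instead of computing two intermediate results and OR-ing them, which is the efficiency argument made earlier in the thread.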

@caspervdw caspervdw merged commit 3d9e246 into master Sep 9, 2019
@caspervdw caspervdw deleted the missingvalues branch September 9, 2019 11:16
jorisvandenbossche pushed a commit to jorisvandenbossche/shapely that referenced this pull request Nov 29, 2021
[Done] Separating Missing (NaG) and Empty
Development

Successfully merging this pull request may close these issues.

Handling of empty geometries
Handling of missing geometries

2 participants