Integrate tiered spatial join with existing SqlQueryRunner #1325
rfecher merged 2 commits into locationtech:master
Conversation
Over the weekend I had a mental breakthrough regarding how I wanted to refactor and connect all the pieces for the Spark project API. I've made good progress toward that goal and almost have everything back to a working state, with a few exceptions. I'm trying to figure out how best to apply IndexStrategies to the new GeoWaveIndexedRDD class, but I'm running into issues with serialization.
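A common cause of this kind of Spark serialization issue is a non-serializable object captured in a closure. One standard fix, sketched below, is to hold the strategy in a `transient` field and rebuild it lazily from a small serializable recipe after deserialization on each executor. The class and field names here are illustrative, not from the PR:

```java
import java.io.Serializable;

// Illustrative sketch (not the actual GeoWaveIndexedRDD code): assume the
// index strategy itself is not Serializable, so we keep it transient and
// rebuild it on the worker from a serializable configuration string.
public class IndexedRddHelper implements Serializable {
    private static final long serialVersionUID = 1L;

    // Not shipped with the closure; rebuilt lazily after deserialization.
    private transient Object strategy;
    // Small, serializable "recipe" for reconstructing the strategy.
    private final String strategyConfig;

    public IndexedRddHelper(String strategyConfig) {
        this.strategyConfig = strategyConfig;
    }

    public Object getStrategy() {
        if (strategy == null) {
            strategy = buildStrategy(strategyConfig);
        }
        return strategy;
    }

    private static Object buildStrategy(String config) {
        // Placeholder for the real index-strategy factory.
        return "strategy:" + config;
    }
}
```

This avoids serializing the heavyweight strategy object entirely; only the configuration travels with the task closure.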
107b190 to b877e8b
rfecher left a comment
minor change to consolidate code
    return hasLat && hasLon;
}

public static boolean isSpatial(
can we just delegate isSpatial(PrimaryIndex) to call isSpatial(NumericIndexStrategy), with one extra null check for the index, instead of duplicating the code?
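A minimal sketch of the consolidation being suggested, using stand-in classes since the real GeoWave type definitions aren't shown in this thread:

```java
// Hypothetical sketch of the suggested delegation: the PrimaryIndex
// overload only null-checks and forwards, rather than re-implementing
// the lat/lon field checks. The nested classes are stand-ins for the
// real GeoWave types, for illustration only.
public class IndexUtils {
    public static class NumericIndexStrategy {
        private final boolean hasLat, hasLon;
        public NumericIndexStrategy(boolean hasLat, boolean hasLon) {
            this.hasLat = hasLat;
            this.hasLon = hasLon;
        }
        boolean hasLat() { return hasLat; }
        boolean hasLon() { return hasLon; }
    }

    public static class PrimaryIndex {
        private final NumericIndexStrategy strategy;
        public PrimaryIndex(NumericIndexStrategy strategy) {
            this.strategy = strategy;
        }
        NumericIndexStrategy getIndexStrategy() { return strategy; }
    }

    public static boolean isSpatial(NumericIndexStrategy strategy) {
        if (strategy == null) {
            return false;
        }
        return strategy.hasLat() && strategy.hasLon();
    }

    // Delegates instead of duplicating the checks; the only extra logic
    // is the null check on the index itself.
    public static boolean isSpatial(PrimaryIndex index) {
        return index != null && isSpatial(index.getIndexStrategy());
    }
}
```

The same delegation pattern would apply to the isSpatialTemporal pair below.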
    return hasTime && hasLat && hasLon;
}

public static boolean isSpatialTemporal(
Ended up pulling out the logic I was working on for handling compound conditionals, after realizing how limited the "working" logic was: it couldn't hope to properly support compound conditional statements with parenthesized ordering applied to the logic, etc.
Ultimately, this integration method is fairly limited and doesn't work exactly as I theorized, because of how Spark plans the sql query execution from the string statement. I was hoping to take a query like this:
"select hail.* from hail, tornado where GeomIntersects(hail.geom,tornado.geom)"
perform the necessary spatial join beforehand (which I can successfully do), take the results and overwrite the original views with the new joined frames, then remove that where clause and feed any remaining sql commands back to the original query executor:
"select hail.* from hail, tornado"
Instead, the conditional must remain in the sql statement. I believe this means Spark will attempt to regenerate the output hashes for the join between the tables, but the generated pairs will already exist on the machine from the earlier join, so it should cache most if not all matching pairs.
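The rewrite originally hoped for above can be sketched as a simple string transformation. This is illustrative only and handles just the single-predicate shape from the example; a robust version would need Spark's actual SQL parser, since regex cannot cope with nested or compound conditions:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of stripping a lone spatial predicate out of the
// WHERE clause so the remaining SQL could be handed back to the normal
// query executor. Only covers the trivial one-predicate case.
public class QueryRewrite {
    private static final Pattern WHERE_GEOM = Pattern.compile(
            "\\s+where\\s+GeomIntersects\\([^)]*\\)\\s*$",
            Pattern.CASE_INSENSITIVE);

    public static String stripSpatialWhere(String sql) {
        Matcher m = WHERE_GEOM.matcher(sql);
        // Leave the query untouched if no simple spatial predicate matches.
        return m.find() ? m.replaceFirst("") : sql;
    }
}
```

As the paragraph above notes, this approach was abandoned: the predicate has to stay in the statement, and the earlier join only helps via caching.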
Performing this has also put another potential issue/shortcoming of the spatial join logic on my radar: how to handle join operations with NOT logic present in the condition.
"select hail.* from hail, tornado where !GeomIntersects(hail.geom,tornado.geom)"
or
"select hail.* from hail, tornado where GeomIntersects(hail.geom,tornado.geom) == false"
or
"select hail.* from hail, tornado where GeomIntersects(hail.geom,tornado.geom) != true" (or the equivalent <> operator)
Attempting to perform this logic will almost certainly produce incorrect results, because the join currently assumes you want to check that the predicate's return is 'true'. This query breaks the join logic because it requires excluding anything that successfully passes the filter from the final results.
I was thinking of a potential easy workaround: an option that signifies a negative predicate test. I would run the normal algorithm, but when joining the results back to the original sets to get the final features, I would instead do a subtraction of the RDDs:
leftResults = leftOriginalRDD.subtractByKey(combinedResults)
Assuming combinedResults contains only the rowKeys you want to remove, this logic should be sound, but I'm not sure it would work with all predicates. Then there are predicates like GeomDisjoint, which technically needs a negative predicate test but should still check whether the predicate itself returns 'true'. I could use some assistance thinking about my API and how to better tie some of these operation-dependent ends together without creating a maintenance/extension nightmare for anyone adding new GeomPredicates.
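The proposed subtraction can be modeled without Spark as a toy sketch over plain maps, just to check the semantics: `subtractByKey` keeps exactly the left-side rows whose keys did NOT appear in the matched set, which is the behavior the negative predicate test needs, assuming combinedResults really does hold only the keys to remove:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model of leftOriginalRDD.subtractByKey(combinedResults) using
// plain collections (no Spark). Keys present in matchedKeys are the
// rows that passed the spatial predicate; the negative-predicate path
// keeps everything else.
public class NegativeJoinSketch {
    public static <K, V> Map<K, V> subtractByKey(
            Map<K, V> left, Set<K> matchedKeys) {
        Map<K, V> result = new HashMap<>();
        for (Map.Entry<K, V> e : left.entrySet()) {
            if (!matchedKeys.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

Note this model also shows why GeomDisjoint is awkward: it wants the subtraction step (disjoint means "did not intersect"), yet the predicate evaluation itself still tests for 'true', so "negative test" and "negative predicate" are two independent axes in the API.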