I think the way the internals of STRtree are implemented now, the tree is based on the bounding box of each geometry, which may be a collection of multiple parts. This means that there is sometimes a wide disparity between the bounding box of that geometry and bounding boxes of its individual parts. This results in a larger number of false-positive hits from the tree for a given query, which slows down predicate evaluation.
The solution in most cases is to "explode" the multi-part geometries into their respective parts before adding to the tree, but this then means that you have to manage the overhead of that second set of geometries, and deduplicate the data after a spatial join - if your goal was simply to do a spatial join between the multi-part geometries and a set of source geometries.
I think we can optimize this a bit by always exploding multi-part geometries before insert to the tree, but retaining the original index and set of geometries. Then in the internals of querying the tree, we would deduplicate back to the original indexes of the tree geometries and return those.
We could make it an optional parameter on the tree to do this optimization.
I think the way the internals of
STRtreeare implemented now, the tree is based on the bounding box of each geometry, which may be a collection of multiple parts. This means that there is sometimes a wide disparity between the bounding box of that geometry and bounding boxes of its individual parts. This results in a larger number of false-positive hits from the tree for a given query, which slows down predicate evaluation.The solution in most cases is to "explode" the multi-part geometries into their respective parts before adding to the tree, but this then means that you have to manage the overhead of that second set of geometries, and deduplicate the data after a spatial join - if your goal was simply to do a spatial join between the multi-part geometries and a set of source geometries.
I think we can optimize this a bit by always exploding multi-part geometries before insert to the tree, but retaining the original index and set of geometries. Then in the internals of querying the tree, we would deduplicate back to the original indexes of the tree geometries and return those.
We could make it an optional parameter on the tree to do this optimization.