Skip to content

PROPOSAL: optimize STRtree for geometry collections / multipart geometries #112

@brendan-ward

Description

@brendan-ward

I think the way the internals of STRtree are implemented now, the tree is based on the bounding box of each geometry, which may be a collection of multiple parts. This means that there is sometimes a wide disparity between the bounding box of that geometry and bounding boxes of its individual parts. This results in a larger number of false-positive hits from the tree for a given query, which slows down predicate evaluation.

The solution in most cases is to "explode" the multi-part geometries into their respective parts before adding to the tree, but this then means that you have to manage the overhead of that second set of geometries, and deduplicate the data after a spatial join - if your goal was simply to do a spatial join between the multi-part geometries and a set of source geometries.

I think we can optimize this a bit by always exploding multi-part geometries before insert to the tree, but retaining the original index and set of geometries. Then in the internals of querying the tree, we would deduplicate back to the original indexes of the tree geometries and return those.

We could make it an optional parameter on the tree to do this optimization.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions