Extract centroid from doc values for ST_CENTROID_AGG over geo_shape and cartesian_shape#142528
Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Hi @craigtaverner, I've created a changelog YAML for you. |
So a single query that has both ST_EXTENT_AGG and ST_CENTROID_AGG will benefit from docs-values
3e71867 to
aff89c2
Compare
|
Hi @craigtaverner, I've updated the changelog YAML for you. |
|
|
||
| private final FieldExtractPreference withDocValues; | ||
| private final FieldExtractPreference withSpatialBounds; | ||
| private final FieldExtractPreference withSpatialCentroid; |
There was a problem hiding this comment.
Could you do this using BlockLoaderFunctionConfig instead?
There was a problem hiding this comment.
The FieldExtractPreference is used by the planner to make requests to the block-loader, while it the BlockLoaderFunctionConfig seems to support fusing functions into block-loaders which seems, at least to me, to not be quite the same thing. I noticed that the FieldExtractPreference is being passed into the supportsLoaderConfig, so it appears the two concepts are both supposed to exist and interact to some extent, but are not the same thing.
ncordon
left a comment
There was a problem hiding this comment.
I've left some questions and comments
...in/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java
Show resolved
Hide resolved
...in/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java
Show resolved
Hide resolved
...gin/spatial/src/main/java/org/elasticsearch/xpack/spatial/index/mapper/ShapeFieldMapper.java
Outdated
Show resolved
Hide resolved
.../esql/src/test/java/org/elasticsearch/xpack/esql/planner/TestPhysicalOperationProviders.java
Outdated
Show resolved
Hide resolved
...main/java/org/elasticsearch/xpack/spatial/index/mapper/GeoShapeWithDocValuesFieldMapper.java
Show resolved
Hide resolved
| case EvalExec evalExec -> { | ||
| centroidAttributes.removeAll(evalExec.references()); | ||
| boundsAttributes.removeAll(evalExec.references()); | ||
| } | ||
| case FilterExec filterExec -> { | ||
| centroidAttributes.removeAll(filterExec.condition().references()); | ||
| boundsAttributes.removeAll(filterExec.condition().references()); | ||
| } |
There was a problem hiding this comment.
Could we have a default case UnaryExec instead of FilterExec and EvalExec here? If the shape appears in any non aggregation we should remove the reference.
That seems a bit more future proof in case we added extra ESQL commands that should disallow loading from doc values in the future?
There was a problem hiding this comment.
The code path differences between these two make it a little tricky. I think I'd like to consider that in followup work. Let's think of counter-examples, and then act on that. What plan node could contain a spatial function that is not covered by EvalExec and FilterExec?
There was a problem hiding this comment.
When I reviewed the pr I tried to come up with such example and I couldn't. And neither could Cursor (which was throwing me false negatives), but you know the language much better than I do.
There was a problem hiding this comment.
I'm not able to come up with other commands, but am wondering about non-spatial aggregations that support spatial types, like VALUES. I will investigate a little further if there are edge cases there, and fix them if there are, otherwise this PR is ready to merge.
...c/internalClusterTest/java/org/elasticsearch/xpack/esql/spatial/SpatialPushDownTestCase.java
Show resolved
Hide resolved
...arch/compute/aggregation/spatial/SpatialExtentCartesianShapeCombinedDocValuesAggregator.java
Outdated
Show resolved
Hide resolved
...arch/compute/aggregation/spatial/SpatialExtentCartesianShapeCombinedDocValuesAggregator.java
Outdated
Show resolved
Hide resolved
...arch/compute/aggregation/spatial/SpatialExtentCartesianShapeCombinedDocValuesAggregator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Optimizes ES|QL ST_CENTROID_AGG over geo_shape and cartesian_shape by extracting centroid information directly from shape doc-values (and supporting a combined bounds+centroid extraction path so ST_EXTENT_AGG and ST_CENTROID_AGG can share a single doc-values load).
Changes:
- Add shape BlockLoaders to extract centroid-only and combined bounds+centroid data from doc-values.
- Add an ES|QL physical optimizer rule to plan centroid/extent doc-values extraction (including combined extraction when both aggs use the same field).
- Introduce new compute aggregators/suppliers for centroid/extent when reading the combined bounds+centroid doc-values format, plus expand tests/specs accordingly.
Reviewed changes
Copilot reviewed 22 out of 34 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| x-pack/plugin/spatial/src/main/java/org/elasticsearch/xpack/spatial/index/mapper/ShapeFieldMapper.java | Adds cartesian shape centroid and combined bounds+centroid BlockLoaders. |
| x-pack/plugin/spatial/src/main/java/org/elasticsearch/xpack/spatial/index/mapper/GeoShapeWithDocValuesFieldMapper.java | Adds geo shape centroid and combined bounds+centroid BlockLoaders. |
| x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/planner/TestPhysicalOperationProviders.java | Updates test block building/copying to account for centroid/combined extraction using double blocks. |
| x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java | Adds optimizer tests for centroid extraction and combined bounds+centroid extraction on shapes. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/PlannerUtils.java | Maps shape element types to DOUBLE for centroid/combined extraction preferences. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/FieldExtractExec.java | Tracks centroid-extractable shape attributes and returns combined extract preference when both bounds+centroid are needed. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/SpatialShapeDocValuesExtraction.java | New optimizer rule that enables centroid/extent doc-values extraction for shape aggs (including combined mode). |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/SpatialShapeBoundsExtraction.java | Removes the old bounds-only shape extraction rule (superseded by combined rule). |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizer.java | Registers the new combined shape doc-values extraction rule. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/SpatialExtent.java | Adds suppliers for combined bounds+centroid doc-values input for shape extents. |
| x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/SpatialCentroid.java | Adds suppliers for centroid-only and combined bounds+centroid doc-values input for shape centroids. |
| x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/spatial/SpatialPushDownTestCase.java | Centralizes centroid-matching helpers for pushdown tests. |
| x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/spatial/SpatialPushDownShapeTestCase.java | Adds integration tests validating centroid results for shapes with/without doc-values across shards. |
| x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/spatial/SpatialPushDownPointsTestCase.java | Removes duplicated centroid-matching helpers in favor of base test case methods. |
| x-pack/plugin/esql/qa/testFixtures/src/main/resources/spatial_shapes.csv-spec | Updates/extends spec coverage for centroid + combined centroid/extent outputs and quantized cartesian expectations. |
| x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/spatial/SpatialExtentGeoShapeCombinedDocValuesAggregator.java | New aggregator that reads bounds from the combined bounds+centroid double format for geo shapes. |
| x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/spatial/SpatialExtentCartesianShapeCombinedDocValuesAggregator.java | New aggregator that reads bounds from the combined bounds+centroid double format for cartesian shapes. |
| x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeDocValuesAggregator.java | New aggregator that reads centroid-only [x,y,weight,type] double format for shapes. |
| x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeCombinedDocValuesAggregator.java | New aggregator that reads centroid from the tail of the combined bounds+centroid double format. |
| x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialExtentGeoShapeCombinedDocValuesGroupingAggregatorFunction.java | Generated grouping aggregator wiring for geo combined extent. |
| x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialExtentGeoShapeCombinedDocValuesAggregatorFunctionSupplier.java | Generated supplier for geo combined extent. |
| x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialExtentGeoShapeCombinedDocValuesAggregatorFunction.java | Generated non-grouping aggregator wiring for geo combined extent. |
| x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialExtentCartesianShapeCombinedDocValuesGroupingAggregatorFunction.java | Generated grouping aggregator wiring for cartesian combined extent. |
| x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialExtentCartesianShapeCombinedDocValuesAggregatorFunctionSupplier.java | Generated supplier for cartesian combined extent. |
| x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialExtentCartesianShapeCombinedDocValuesAggregatorFunction.java | Generated non-grouping aggregator wiring for cartesian combined extent. |
| x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeDocValuesGroupingAggregatorFunction.java | Generated grouping aggregator wiring for centroid-only shape centroid. |
| x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeDocValuesAggregatorFunctionSupplier.java | Generated supplier for centroid-only shape centroid. |
| x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeDocValuesAggregatorFunction.java | Generated non-grouping aggregator wiring for centroid-only shape centroid. |
| x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeCombinedDocValuesGroupingAggregatorFunction.java | Generated grouping aggregator wiring for combined-format shape centroid. |
| x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeCombinedDocValuesAggregatorFunctionSupplier.java | Generated supplier for combined-format shape centroid. |
| x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeCombinedDocValuesAggregatorFunction.java | Generated non-grouping aggregator wiring for combined-format shape centroid. |
| server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java | Adds extraction preferences for centroid and combined bounds+centroid and helper transitions between preferences. |
| server/src/main/java/org/elasticsearch/index/mapper/AbstractShapeGeometryFieldMapper.java | Implements centroid and combined bounds+centroid doc-values BlockLoaders for shape fields. |
| docs/changelog/142528.yaml | Adds changelog entry for the ES |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...in/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java
Outdated
Show resolved
Hide resolved
...gin/spatial/src/main/java/org/elasticsearch/xpack/spatial/index/mapper/ShapeFieldMapper.java
Outdated
Show resolved
Hide resolved
...arch/compute/aggregation/spatial/SpatialExtentCartesianShapeCombinedDocValuesAggregator.java
Show resolved
Hide resolved
.../esql/src/test/java/org/elasticsearch/xpack/esql/planner/TestPhysicalOperationProviders.java
Outdated
Show resolved
Hide resolved
And added a test for mixed point/shape doc-values pushdown
Cursor got maxY/minY mixed up.
| /** | ||
| * Support for ST_CENTROID_AGG aggregation on geo_shape and cartesian_shape fields. | ||
| */ | ||
| ST_CENTROID_AGG_SHAPES, |
There was a problem hiding this comment.
Shouldn't we always add a new capability last? We are also altering the number for ST_CENTROID_AGG_SHAPES here
There was a problem hiding this comment.
Actually we should never add the new capability to the end of the enum. If we do add to the end, that maximizes the chances of PR conflicts with PRs written by developers that forgot that we should never add to the end of the list. We should instead add to a part of the list close to similar features, so they are easier to see together, and if there is a conflict it is most likely with a PR the same developer wrote as part of similar work.
There was a problem hiding this comment.
The enum ordinal is not used for anything, so it does not matter that we change the order. The enum name is the only relevant part.
…nd cartesian_shape (elastic#142528) The the PR at elastic#141657 we added support for ST_CENTROID_AGG over geo_shape and cartesian_shape. However, this loads from source, which is inefficient since these centroids are actually saved in the doc-values representations. This PR switches to load from doc-values. One thing to keep in mind is that we already load bounds from doc-values for ST_EXTENT_AGG, and it is completely reasonable for the same query to do both ST_EXTENT_AGG and ST_CENTROID_AGG, in which case we want to load both centroid and bounds at the same time. The solution expands on the existing code for loading bounds into a multi-value like block. For bounds we were loading either 4 or 6 integers (for cartesian or geo extents). Now we have the following combinations: * Geo-Bounds as 6 integers: top, bottom, negLeft, negRight, posLeft, posRight * Cartesian-Bounds as 4 integers: min-x, max-x, max-y, min-y * Centroid as 4 doubles: x, y, weight, shape-type * Geo-Bounds and centroid as 10 doubles: top, bottom, negLeft, negRight, posLeft, posRight, x, y, weight, shape-type * Cartesian-Bounds and centroid as 8 doubles: min-x, max-x, max-y, min-y, x, y, weight, shape-type So we moved from 2 combinations to 6, and since centroid weights were stored in doc-values as double, while centroid coordinately and extents were stored as quantized ints, we also moved the conversion from int to double to the block-loader rather than the aggregator. Since the doc-values header does not include more useful data, it is unlikely we will expand from this combination to anything more complex in future. Fixes elastic#142640
The the PR at #141657 we added support for ST_CENTROID_AGG over geo_shape and cartesian_shape. However, this loads from source, which is inefficient since these centroids are actually saved in the doc-values representations. This PR switches to load from doc-values.
One thing to keep in mind is that we already load bounds from doc-values for ST_EXTENT_AGG, and it is completely reasonable for the same query to do both ST_EXTENT_AGG and ST_CENTROID_AGG, in which case we want to load both centroid and bounds at the same time.
The solution expands on the existing code for loading bounds into a multi-value like block. For bounds we were loading either 4 or 6 integers (for cartesian or geo extents). Now we have the following combinations:
So we moved from 2 combinations to 6, and since centroid weights were stored in doc-values as double, while centroid coordinately and extents were stored as quantized ints, we also moved the conversion from int to double to the block-loader rather than the aggregator.
Since the doc-values header does not include more useful data, it is unlikely we will expand from this combination to anything more complex in future.
Fixes #142640