Skip to content

Extract centroid from doc values for ST_CENTROID_AGG over geo_shape and cartesian_shape#142528

Merged
elasticsearchmachine merged 30 commits intoelastic:mainfrom
craigtaverner:extract_centroid_doc_values
Mar 17, 2026
Merged

Extract centroid from doc values for ST_CENTROID_AGG over geo_shape and cartesian_shape#142528
elasticsearchmachine merged 30 commits intoelastic:mainfrom
craigtaverner:extract_centroid_doc_values

Conversation

@craigtaverner
Copy link
Copy Markdown
Contributor

@craigtaverner craigtaverner commented Feb 16, 2026

The the PR at #141657 we added support for ST_CENTROID_AGG over geo_shape and cartesian_shape. However, this loads from source, which is inefficient since these centroids are actually saved in the doc-values representations. This PR switches to load from doc-values.

One thing to keep in mind is that we already load bounds from doc-values for ST_EXTENT_AGG, and it is completely reasonable for the same query to do both ST_EXTENT_AGG and ST_CENTROID_AGG, in which case we want to load both centroid and bounds at the same time.

The solution expands on the existing code for loading bounds into a multi-value like block. For bounds we were loading either 4 or 6 integers (for cartesian or geo extents). Now we have the following combinations:

  • Geo-Bounds as 6 integers: top, bottom, negLeft, negRight, posLeft, posRight
  • Cartesian-Bounds as 4 integers: min-x, max-x, max-y, min-y
  • Centroid as 4 doubles: x, y, weight, shape-type
  • Geo-Bounds and centroid as 10 doubles: top, bottom, negLeft, negRight, posLeft, posRight, x, y, weight, shape-type
  • Cartesian-Bounds and centroid as 8 doubles: min-x, max-x, max-y, min-y, x, y, weight, shape-type

So we moved from 2 combinations to 6, and since centroid weights were stored in doc-values as double, while centroid coordinately and extents were stored as quantized ints, we also moved the conversion from int to double to the block-loader rather than the aggregator.

Since the doc-values header does not include more useful data, it is unlikely we will expand from this combination to anything more complex in future.

Fixes #142640

@craigtaverner craigtaverner added >enhancement :Analytics/Geo Indexing, search aggregations of geo points and shapes Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Feb 16, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @craigtaverner, I've created a changelog YAML for you.

@craigtaverner craigtaverner force-pushed the extract_centroid_doc_values branch from 3e71867 to aff89c2 Compare February 16, 2026 09:46
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @craigtaverner, I've updated the changelog YAML for you.


private final FieldExtractPreference withDocValues;
private final FieldExtractPreference withSpatialBounds;
private final FieldExtractPreference withSpatialCentroid;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you do this using BlockLoaderFunctionConfig instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FieldExtractPreference is used by the planner to make requests to the block-loader, while it the BlockLoaderFunctionConfig seems to support fusing functions into block-loaders which seems, at least to me, to not be quite the same thing. I noticed that the FieldExtractPreference is being passed into the supportsLoaderConfig, so it appears the two concepts are both supposed to exist and interact to some extent, but are not the same thing.

Copy link
Copy Markdown
Member

@ncordon ncordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some questions and comments

Comment on lines +119 to +126
case EvalExec evalExec -> {
centroidAttributes.removeAll(evalExec.references());
boundsAttributes.removeAll(evalExec.references());
}
case FilterExec filterExec -> {
centroidAttributes.removeAll(filterExec.condition().references());
boundsAttributes.removeAll(filterExec.condition().references());
}
Copy link
Copy Markdown
Member

@ncordon ncordon Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a default case UnaryExec instead of FilterExec and EvalExec here? If the shape appears in any non aggregation we should remove the reference.

That seems a bit more future proof in case we added extra ESQL commands that should disallow loading from doc values in the future?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code path differences between these two make it a little tricky. I think I'd like to consider that in followup work. Let's think of counter-examples, and then act on that. What plan node could contain a spatial function that is not covered by EvalExec and FilterExec?

Copy link
Copy Markdown
Member

@ncordon ncordon Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I reviewed the pr I tried to come up with such example and I couldn't. And neither could Cursor (which was throwing me false negatives), but you know the language much better than I do.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not able to come up with other commands, but am wondering about non-spatial aggregations that support spatial types, like VALUES. I will investigate a little further if there are edge cases there, and fix them if there are, otherwise this PR is ready to merge.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Optimizes ES|QL ST_CENTROID_AGG over geo_shape and cartesian_shape by extracting centroid information directly from shape doc-values (and supporting a combined bounds+centroid extraction path so ST_EXTENT_AGG and ST_CENTROID_AGG can share a single doc-values load).

Changes:

  • Add shape BlockLoaders to extract centroid-only and combined bounds+centroid data from doc-values.
  • Add an ES|QL physical optimizer rule to plan centroid/extent doc-values extraction (including combined extraction when both aggs use the same field).
  • Introduce new compute aggregators/suppliers for centroid/extent when reading the combined bounds+centroid doc-values format, plus expand tests/specs accordingly.

Reviewed changes

Copilot reviewed 22 out of 34 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
x-pack/plugin/spatial/src/main/java/org/elasticsearch/xpack/spatial/index/mapper/ShapeFieldMapper.java Adds cartesian shape centroid and combined bounds+centroid BlockLoaders.
x-pack/plugin/spatial/src/main/java/org/elasticsearch/xpack/spatial/index/mapper/GeoShapeWithDocValuesFieldMapper.java Adds geo shape centroid and combined bounds+centroid BlockLoaders.
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/planner/TestPhysicalOperationProviders.java Updates test block building/copying to account for centroid/combined extraction using double blocks.
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java Adds optimizer tests for centroid extraction and combined bounds+centroid extraction on shapes.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/PlannerUtils.java Maps shape element types to DOUBLE for centroid/combined extraction preferences.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/FieldExtractExec.java Tracks centroid-extractable shape attributes and returns combined extract preference when both bounds+centroid are needed.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/SpatialShapeDocValuesExtraction.java New optimizer rule that enables centroid/extent doc-values extraction for shape aggs (including combined mode).
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/SpatialShapeBoundsExtraction.java Removes the old bounds-only shape extraction rule (superseded by combined rule).
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizer.java Registers the new combined shape doc-values extraction rule.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/SpatialExtent.java Adds suppliers for combined bounds+centroid doc-values input for shape extents.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/SpatialCentroid.java Adds suppliers for centroid-only and combined bounds+centroid doc-values input for shape centroids.
x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/spatial/SpatialPushDownTestCase.java Centralizes centroid-matching helpers for pushdown tests.
x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/spatial/SpatialPushDownShapeTestCase.java Adds integration tests validating centroid results for shapes with/without doc-values across shards.
x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/spatial/SpatialPushDownPointsTestCase.java Removes duplicated centroid-matching helpers in favor of base test case methods.
x-pack/plugin/esql/qa/testFixtures/src/main/resources/spatial_shapes.csv-spec Updates/extends spec coverage for centroid + combined centroid/extent outputs and quantized cartesian expectations.
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/spatial/SpatialExtentGeoShapeCombinedDocValuesAggregator.java New aggregator that reads bounds from the combined bounds+centroid double format for geo shapes.
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/spatial/SpatialExtentCartesianShapeCombinedDocValuesAggregator.java New aggregator that reads bounds from the combined bounds+centroid double format for cartesian shapes.
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeDocValuesAggregator.java New aggregator that reads centroid-only [x,y,weight,type] double format for shapes.
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeCombinedDocValuesAggregator.java New aggregator that reads centroid from the tail of the combined bounds+centroid double format.
x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialExtentGeoShapeCombinedDocValuesGroupingAggregatorFunction.java Generated grouping aggregator wiring for geo combined extent.
x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialExtentGeoShapeCombinedDocValuesAggregatorFunctionSupplier.java Generated supplier for geo combined extent.
x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialExtentGeoShapeCombinedDocValuesAggregatorFunction.java Generated non-grouping aggregator wiring for geo combined extent.
x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialExtentCartesianShapeCombinedDocValuesGroupingAggregatorFunction.java Generated grouping aggregator wiring for cartesian combined extent.
x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialExtentCartesianShapeCombinedDocValuesAggregatorFunctionSupplier.java Generated supplier for cartesian combined extent.
x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialExtentCartesianShapeCombinedDocValuesAggregatorFunction.java Generated non-grouping aggregator wiring for cartesian combined extent.
x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeDocValuesGroupingAggregatorFunction.java Generated grouping aggregator wiring for centroid-only shape centroid.
x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeDocValuesAggregatorFunctionSupplier.java Generated supplier for centroid-only shape centroid.
x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeDocValuesAggregatorFunction.java Generated non-grouping aggregator wiring for centroid-only shape centroid.
x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeCombinedDocValuesGroupingAggregatorFunction.java Generated grouping aggregator wiring for combined-format shape centroid.
x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeCombinedDocValuesAggregatorFunctionSupplier.java Generated supplier for combined-format shape centroid.
x-pack/plugin/esql/compute/src/main/generated/org/elasticsearch/compute/aggregation/spatial/SpatialCentroidShapeCombinedDocValuesAggregatorFunction.java Generated non-grouping aggregator wiring for combined-format shape centroid.
server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java Adds extraction preferences for centroid and combined bounds+centroid and helper transitions between preferences.
server/src/main/java/org/elasticsearch/index/mapper/AbstractShapeGeometryFieldMapper.java Implements centroid and combined bounds+centroid doc-values BlockLoaders for shape fields.
docs/changelog/142528.yaml Adds changelog entry for the ES

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +446 to +449
/**
* Support for ST_CENTROID_AGG aggregation on geo_shape and cartesian_shape fields.
*/
ST_CENTROID_AGG_SHAPES,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we always add a new capability last? We are also altering the number for ST_CENTROID_AGG_SHAPES here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we should never add the new capability to the end of the enum. If we do add to the end, that maximizes the chances of PR conflicts with PRs written by developers that forgot that we should never add to the end of the list. We should instead add to a part of the list close to similar features, so they are easier to see together, and if there is a conflict it is most likely with a PR the same developer wrote as part of similar work.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enum ordinal is not used for anything, so it does not matter that we change the order. The enum name is the only relevant part.

@craigtaverner craigtaverner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Mar 17, 2026
@elasticsearchmachine elasticsearchmachine merged commit 2997cac into elastic:main Mar 17, 2026
36 checks passed
@craigtaverner craigtaverner deleted the extract_centroid_doc_values branch March 17, 2026 12:56
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
…nd cartesian_shape (elastic#142528)

The the PR at elastic#141657 we
added support for ST_CENTROID_AGG over geo_shape and cartesian_shape.
However, this loads from source, which is inefficient since these
centroids are actually saved in the doc-values representations. This PR
switches to load from doc-values.

One thing to keep in mind is that we already load bounds from doc-values
for ST_EXTENT_AGG, and it is completely reasonable for the same query to
do both ST_EXTENT_AGG and ST_CENTROID_AGG, in which case we want to load
both centroid and bounds at the same time.

The solution expands on the existing code for loading bounds into a
multi-value like block. For bounds we were loading either 4 or 6
integers (for cartesian or geo extents). Now we have the following
combinations: * Geo-Bounds as 6 integers: top, bottom, negLeft,
negRight, posLeft, posRight * Cartesian-Bounds as 4 integers: min-x,
max-x, max-y, min-y * Centroid as 4 doubles: x, y, weight, shape-type *
Geo-Bounds and centroid as 10 doubles: top, bottom, negLeft, negRight,
posLeft, posRight, x, y, weight, shape-type * Cartesian-Bounds and
centroid as 8 doubles: min-x, max-x, max-y, min-y, x, y, weight,
shape-type

So we moved from 2 combinations to 6, and since centroid weights were
stored in doc-values as double, while centroid coordinately and extents
were stored as quantized ints, we also moved the conversion from int to
double to the block-loader rather than the aggregator.

Since the doc-values header does not include more useful data, it is
unlikely we will expand from this combination to anything more complex
in future.

Fixes elastic#142640
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL :Analytics/Geo Indexing, search aggregations of geo points and shapes auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize ST_CENTROID_AGG over shapes using doc-values

5 participants