Skip to content

Centroid aggregation for cartesian points and shapes#89216

Merged
craigtaverner merged 10 commits intoelastic:mainfrom
craigtaverner:more_cartesian2
Sep 28, 2022
Merged

Centroid aggregation for cartesian points and shapes#89216
craigtaverner merged 10 commits intoelastic:mainfrom
craigtaverner:more_cartesian2

Conversation

@craigtaverner
Copy link
Copy Markdown
Contributor

@craigtaverner craigtaverner commented Aug 9, 2022

Based on the groundwork to Refactor GeoPoint and GeoShape with generics this PR provides support for centroid aggregations over Cartesian point and shape fields.

Closes #90156

@craigtaverner craigtaverner added :Analytics/Geo Indexing, search aggregations of geo points and shapes v8.5.0 labels Aug 9, 2022
@craigtaverner craigtaverner force-pushed the more_cartesian2 branch 6 times, most recently from 3fc7c78 to f0d1ec0 Compare August 17, 2022 14:18
@craigtaverner craigtaverner force-pushed the more_cartesian2 branch 3 times, most recently from dd3dcd0 to d954cdd Compare September 7, 2022 09:58
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really like that CartesianShapeValuesSource returns a CartesianShapeValues. There are methods in ShapeValues that are geo specific and should only be present in GeoShapeValues. There is something that feels not right at the moment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous PR was supposed to make sure that ShapeValues had no geo-specific code, but I can already see that I left at least CoordinateEncoder.Geo there, so I'll review this and fix it. I suspect there will be more cases where the previous PR did not quite complete the generalization of the code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice also the use of LatLonGeometry and LatLonGeometryRelationVisitor. However, despite their names, those two classes are not geo-specific and look like they work perfectly well for cartesian data too. Perhaps they should be renamed?

Copy link
Copy Markdown
Contributor

@iverase iverase Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest problem is not in ShapeValues but ShapeValue. For example the method # relate(LatLonGeometry latLonGeometry) is for geo.

I think GeoshapeValue and Cartesianshapevalue are meant to support different things and should not be represented with the same ShapeValue.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that is what I meant above. The method relate(LatLonGeometry) does not have anything geo-specific in it. All that code, and the code it calls is specific to the Lucene Component2D mechanism and is identical between geo and cartesian. The only mistake here is that the classes are incorrectly named LatLon..., while they have nothing to do with geo, and are used for both geo and cartesian data. They all work directly on the Geometry and Component2D interfaces, both of which are used for both geo and cartesian data.

So I think I can fix this by renaming the classes prefixed with LatLon something else. Then it will be more clear.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have still not fixed the incorrectly named LatLon* classes that have nothing to do with Geo, and are used by both geo and cartesian. That could be done anytime. It is tech debt that precedes this work.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am getting confused, I think you are not working on top of the latest main, still exposing ShapeValues on geo / cartesian classes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed. But I've fixed that now. Reverted the commit that moved back to GeoShapeValues.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That commit was based on the assumption that ShapeValues has geo-specific code in it. And that is still not true. I remember when I approved that before I said that I wanted to re-evaluate my assumptions and reconsider that work once we did the Cartesian PR. Now this is the Cartesian PR and it looks to me like the generalization is in fact a very good idea. We make a whole stack of code common between Cartesian and Geo. And this is all enabled by the Lucene Component2D interface which is what actually makes this stack common.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you assumptions are wrong, please don't expose ShapeValues as being the same for geo and cartesian, they are not.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this is not used anywhere, maybe we should remove it until we decide to have runtime support

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I also remember we discussed with the painless team the option of building a completely different fields API for script for this, so none of the prototype code here is of value. I created an issue to discuss that at #90182

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3fa5603

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This come from an unused class, let's remove everything? we can revisit later.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I also remember we discussed with the painless team the option of building a completely different fields API for script for this, so none of the prototype code here is of value. I created an issue to discuss that at #90182

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3fa5603

@craigtaverner craigtaverner changed the title More cartesian point and shape support, especially aggs (take 2) Centroid aggregation for cartesian points and shapes Sep 20, 2022
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I did that in the same commit that removed the script classes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is something that is not right here because ShapeValues.ShapeValue contains geo specific code. For example this means that CartesianShapeValue supports calling #relate(LatLonGeometry) which is not right. I think we ought to move the geo specific methods out of ShapeValue and only leave the generic ones?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no geo-specific code in ShapeValues or in ShapeValues.ShapeValue. I double checked and the only mistake is in class naming. For some reason classes are named LatLon* when they have no geo-specific behaviour. I commented on this above, and also pushed a new commit that further reduces the code duplication in ShapeValues.

If you can find any geo-specific method in ShapeValues or in ShapeValues.ShapeValue, please let me know, because I cannot find them. It was made generic in the previous PR, there is nothing geo-specific left.

@csoulios csoulios added v8.6.0 and removed v8.5.0 labels Sep 21, 2022
@craigtaverner craigtaverner marked this pull request as ready for review September 22, 2022 10:06
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 22, 2022
Copy link
Copy Markdown
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, you just reverted the changes we did before. I don't want ShapeValue to be used indistinctly between geo and cartesian, they are different things.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree here, please do not remove the generic. GeoShapeValuesSource should return GeoShapeValues.

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @craigtaverner, I've created a changelog YAML for you.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can rename this file to 10_centroid so it can contain geo and cartesian implementations?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to 20_centroid.yml in 2710f73

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, this method sees weird here, I am ok to address it in a follow up.

@craigtaverner craigtaverner force-pushed the more_cartesian2 branch 3 times, most recently from eb84fcd to a081ded Compare September 27, 2022 15:13
Earlier work generalized GeoShapeValues so that all geo-specific code could
be injected, and there was no need for objects of type GeoShapeValues.
However, this advantage was not taken, and instances of GeoShapeValues, and
now also CartesianShapeValues, were being created with different behaviors
leading to a lot of duplicate code. This unnecessary duplication is now entirely
removed for the three common cases:

* Empty ShapeValues
* ShapeValues using BinaryDocValues (in ShapeValueSourceType classes)
* ShapeValues wrapping other ShapeValues and replacing missing values

Bring back GeoShapeValues

We found a way to bring back GeoShapeValues (and the new CartesianShapeValues)
minimizing changes to the Geo-code while still achieving the removal of duplicated code.
However, this is largely a duplicate of geo-centroid docs since they are
essentially identical behaviour. We should consider merging them.
This can be re-introduced in another PR
Work on isAggregatable caused a minor logic conflict.
When that work was done, Point and Shape were not aggregatable,
but now they are.
Copy link
Copy Markdown
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Geo Indexing, search aggregations of geo points and shapes >enhancement release highlight Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.6.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Centroid aggregation for Cartesian points and shapes

4 participants