Centroid aggregation for cartesian points and shapes#89216
Centroid aggregation for cartesian points and shapes#89216craigtaverner merged 10 commits intoelastic:mainfrom
Conversation
fb6a81e to
4f8269b
Compare
.../java/org/elasticsearch/xpack/spatial/search/aggregations/support/ShapeValuesSourceType.java
Outdated
Show resolved
Hide resolved
...in/java/org/elasticsearch/xpack/spatial/search/aggregations/support/ShapeValuesProvider.java
Outdated
Show resolved
Hide resolved
3fc7c78 to
f0d1ec0
Compare
dd3dcd0 to
d954cdd
Compare
There was a problem hiding this comment.
I would really like that CartesianShapeValuesSource returns a CartesianShapeValues. There are methods in ShapeValues that are geo specific and should only be present in GeoShapeValues. There is something that feels not right at the moment.
There was a problem hiding this comment.
The previous PR was supposed to make sure that ShapeValues had no geo-specific code, but I can already see that I left at least CoordinateEncoder.Geo there, so I'll review this and fix it. I suspect there will be more cases where the previous PR did not quite complete the generalization of the code.
There was a problem hiding this comment.
I notice also the use of LatLonGeometry and LatLonGeometryRelationVisitor. However, despite their names, those two classes are not geo-specific and look like they work perfectly well for cartesian data too. Perhaps they should be renamed?
There was a problem hiding this comment.
The biggest problem is not in ShapeValues but ShapeValue. For example the method # relate(LatLonGeometry latLonGeometry) is for geo.
I think GeoshapeValue and Cartesianshapevalue are meant to support different things and should not be represented with the same ShapeValue.
There was a problem hiding this comment.
But that is what I meant above. The method relate(LatLonGeometry) does not have anything geo-specific in it. All that code, and the code it calls is specific to the Lucene Component2D mechanism and is identical between geo and cartesian. The only mistake here is that the classes are incorrectly named LatLon..., while they have nothing to do with geo, and are used for both geo and cartesian data. They all work directly on the Geometry and Component2D interfaces, both of which are used for both geo and cartesian data.
So I think I can fix this by renaming the classes prefixed with LatLon something else. Then it will be more clear.
There was a problem hiding this comment.
I have still not fixed the incorrectly named LatLon* classes that have nothing to do with Geo, and are used by both geo and cartesian. That could be done anytime. It is tech debt that precedes this work.
There was a problem hiding this comment.
I am getting confused, I think you are not working on top of the latest main, still exposing ShapeValues on geo / cartesian classes
There was a problem hiding this comment.
Yes, indeed. But I've fixed that now. Reverted the commit that moved back to GeoShapeValues.
There was a problem hiding this comment.
That commit was based on the assumption that ShapeValues has geo-specific code in it. And that is still not true. I remember when I approved that before I said that I wanted to re-evaluate my assumptions and reconsider that work once we did the Cartesian PR. Now this is the Cartesian PR and it looks to me like the generalization is in fact a very good idea. We make a whole stack of code common between Cartesian and Geo. And this is all enabled by the Lucene Component2D interface which is what actually makes this stack common.
There was a problem hiding this comment.
I think you assumptions are wrong, please don't expose ShapeValues as being the same for geo and cartesian, they are not.
There was a problem hiding this comment.
I see this is not used anywhere, maybe we should remove it until we decide to have runtime support
There was a problem hiding this comment.
Agreed. I also remember we discussed with the painless team the option of building a completely different fields API for script for this, so none of the prototype code here is of value. I created an issue to discuss that at #90182
There was a problem hiding this comment.
This come from an unused class, let's remove everything? we can revisit later.
There was a problem hiding this comment.
Agreed. I also remember we discussed with the painless team the option of building a completely different fields API for script for this, so none of the prototype code here is of value. I created an issue to discuss that at #90182
There was a problem hiding this comment.
I believe I did that in the same commit that removed the script classes.
There was a problem hiding this comment.
There is something that is not right here because ShapeValues.ShapeValue contains geo specific code. For example this means that CartesianShapeValue supports calling #relate(LatLonGeometry) which is not right. I think we ought to move the geo specific methods out of ShapeValue and only leave the generic ones?
There was a problem hiding this comment.
There is no geo-specific code in ShapeValues or in ShapeValues.ShapeValue. I double checked and the only mistake is in class naming. For some reason classes are named LatLon* when they have no geo-specific behaviour. I commented on this above, and also pushed a new commit that further reduces the code duplication in ShapeValues.
If you can find any geo-specific method in ShapeValues or in ShapeValues.ShapeValue, please let me know, because I cannot find them. It was made generic in the previous PR, there is nothing geo-specific left.
3fa5603 to
dd17a50
Compare
678c347 to
9c07b41
Compare
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
iverase
left a comment
There was a problem hiding this comment.
Sorry, you just reverted the changes we did before. I don't want ShapeValue to be used indistinctly between geo and cartesian, they are different things.
There was a problem hiding this comment.
I disagree here, please do not remove the generic. GeoShapeValuesSource should return GeoShapeValues.
|
Hi @craigtaverner, I've created a changelog YAML for you. |
06bb685 to
8496596
Compare
There was a problem hiding this comment.
Maybe we can rename this file to 10_centroid so it can contain geo and cartesian implementations?
There was a problem hiding this comment.
Agree, this method sees weird here, I am ok to address it in a follow up.
eb84fcd to
a081ded
Compare
Earlier work generalized GeoShapeValues so that all geo-specific code could be injected, and there was no need for objects of type GeoShapeValues. However, this advantage was not taken, and instances of GeoShapeValues, and now also CartesianShapeValues, were being created with different behaviors leading to a lot of duplicate code. This unnecessary duplication is now entirely removed for the three common cases: * Empty ShapeValues * ShapeValues using BinaryDocValues (in ShapeValueSourceType classes) * ShapeValues wrapping other ShapeValues and replacing missing values Bring back GeoShapeValues We found a way to bring back GeoShapeValues (and the new CartesianShapeValues) minimizing changes to the Geo-code while still achieving the removal of duplicated code.
However, this is largely a duplicate of geo-centroid docs since they are essentially identical behaviour. We should consider merging them.
This can be re-introduced in another PR
Work on isAggregatable caused a minor logic conflict. When that work was done, Point and Shape were not aggregatable, but now they are.
1e6dc72 to
bc74445
Compare
Based on the groundwork to Refactor GeoPoint and GeoShape with generics this PR provides support for centroid aggregations over
Cartesianpointandshapefields.Closes #90156