Make Geo Context Parsing More Strict#32412
Conversation
Currently, if geo context is represented by something other than geo_point or an object with lat and lon fields, the parsing of it as a geo context can result in ignoring the context altogether, returning confusing errors such as number_format_exception or trying to parse the number specifying as long-encoded hash code. It would also fail if the geo_point was stored. This commit makes the parsing more strict and will fail the indexing if the context is not specified correctly. It also adds support for stored geo_points. Closes elastic#32202
|
Pinging @elastic/es-search-aggs |
| } else { | ||
| // Does this object exist? If it does and we didn't find what we need - we need to warn user | ||
| for (IndexableField field : document.getFields()) { | ||
| if (field.name().startsWith(fieldName + ".")) { |
There was a problem hiding this comment.
I know the "dots in field names" discussion has been a long running one. Do we not yet have a more general/graceful way of throwing these exceptions? This is more of a question out of my own curiosity and not intended to hold up the PR.
There was a problem hiding this comment.
I am not sure we should do that, can we restrict the geo context to geo fields and validate this assumption from the mapping (context.mapperService) directly ? If the field has the correct type in the mapping we don't have to check why it's missing.
There was a problem hiding this comment.
We can definitely remove that and clean this up even more, but this will be much more breaking change, since today you can have this sudo geo object and it works, if we will remove this in 7, then working indices with this, created in 6 are going to break. I am not sure we can do that. We will probably have to disable it for indices in 7 but I don't think we can fully remove it until 8.
There was a problem hiding this comment.
I'd treat this case like the geohash string version, this is not documented nor tested so we should remove it. The documentation is clear, you can use a path that references a geo_point field or set the point directly in the context.
| geohashes.add(stringEncode(spare.getLon(), spare.getLat(), precision)); | ||
| } | ||
| } | ||
| } else { |
There was a problem hiding this comment.
the code above seems useless in 7 (even in 6.x ?) since we have a LatLonPoint and a LatLonDocValuesField ?
There was a problem hiding this comment.
The way I understood the code above is if you want to represent lat and lon as an object, but don't want to index it as an actual geopoint you could do that.
| } else { | ||
| // Does this object exist? If it does and we didn't find what we need - we need to warn user | ||
| for (IndexableField field : document.getFields()) { | ||
| if (field.name().startsWith(fieldName + ".")) { |
There was a problem hiding this comment.
I am not sure we should do that, can we restrict the geo context to geo fields and validate this assumption from the mapping (context.mapperService) directly ? If the field has the correct type in the mapping we don't have to check why it's missing.
|
@jimczi I pushed some changes. Not really sure if I understood your idea correctly. But it might be easier to collaborate on the code. Could you take another look? |
|
Discussed this a bit more with @nik9000. It looks like there might be a way to move this check into mapping phase instead of indexing phase. I will give it a try. |
|
I think I figured it out, but since it's complete rewrite with a completely new approach and no shared history I am going to open another PR. Closing this one for now. |
Currently, if geo context is represented by something other than geo_point or an object with lat and lon fields, the parsing of it as a geo context can result in ignoring the context altogether, returning confusing errors such as number_format_exception or trying to parse the number specifying as long-encoded hash code. It would also fail if the geo_point was stored. This commit makes the mapping parsing more strict and will fail during mapping update or index creation if the geo context doesn't point to a geo_point field. Supersedes #32412 Closes #32202
Currently, if geo context is represented by something other than
geo_point or an object with lat and lon fields, the parsing of it
as a geo context can result in ignoring the context altogether,
returning confusing errors such as number_format_exception or trying
to parse the number specifying as long-encoded hash code. It would also
fail if the geo_point was stored.
This commit makes the parsing more strict and will fail the indexing
if the context is not specified correctly. It also adds support for
stored geo_points and removes support for long geo_hashes in numeric
and string types since it doesn't seem to be intentional.
In case of objects the new change will accept objects that contain
lat and lon fields:
It will reject objects that contain any other fields without lat and
lon:
And it will ignore empty objects if other contexts for this suggestion are
provided:
Closes #32202