Skip to content

Synthetic _source: ignore_malformed in geo_point#90777

Closed
nik9000 wants to merge 3 commits intoelastic:mainfrom
nik9000:synthetic_source_geo_point_ignore_malformed
Closed

Synthetic _source: ignore_malformed in geo_point#90777
nik9000 wants to merge 3 commits intoelastic:mainfrom
nik9000:synthetic_source_geo_point_ignore_malformed

Conversation

@nik9000
Copy link
Copy Markdown
Member

@nik9000 nik9000 commented Oct 10, 2022

This adds support for geo_point fields configured with ignore_malformed for synthetic _source. geo_point fields are the first that support object or array valued malformed fields. To support that we copy complex fields while we read them.

@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Oct 10, 2022

I'm not super happy with how this turned out. It feels very shaky. Very wobbly. It works because we use XContentSubParser with it. Which, maybe, means it should be a feature of that. Or something. I don't know exactly.

This adds support for `geo_point` fields configured with
`ignore_malformed` for synthetic `_source`. `geo_point` fields are the
first that support object or array valued malformed fields. To support
*that* we copy complex fields while we read them.
throw new MapperParsingException("failed to parse field [" + fieldType().name() + "] of type [" + contentType() + "]", e);
}
});
}, ignoreMalformed() && context.isSyntheticSource());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than passing this boolean all the way down, can we not wrap the context parser here? I think that would make this feel less fragile.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The complexity here is because of arrays. For arrays, we will ignore only the ones that are malformed but keep the ones that are correctly parsed. So we need to generate one stored field value for each malformed element of the array.

Unfortunately array parsing is tricky because we accept values like [0,0] so array parsing is done on the point parser.

That means as well we cannot wrap the parser here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I tried wrapping here. It doesn't work well for the reasons @iverase mentioned. Partly because of the tricky formats we support and partly because we want to do ignore_malformed for each individual point.

I suppose the thing that I like the least is this whole "wrap the parser to support objects and it's always an object" thing. It just feels super leaky and weird. OTOH, what we're doing is tricky so maybe it has to leak to some degree.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value of saveSyntheticSourceForMalformed is always going to be the same for a given mapper, right? So I think my worries would be assuaged a bit if we added it as a parameter to the PointParser constructor rather than passing the same boolean value through half-a-dozen intervening functions for each document.

ignoreMalformed on PointParser is a bit weird anyway. The Cartesian implementation uses it to ignore validation, but the GeoPoint version uses it to attempt normalisation (but then throws an error anyway if the values can't be normalised).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.