Skip to content

Support synthetic source for geo_point when ignore_malformed is used#109651

Merged
lkts merged 9 commits intoelastic:mainfrom
lkts:feature/geo_point_synthetic_source_ignore_malformed
Jun 18, 2024
Merged

Support synthetic source for geo_point when ignore_malformed is used#109651
lkts merged 9 commits intoelastic:mainfrom
lkts:feature/geo_point_synthetic_source_ignore_malformed

Conversation

@lkts
Copy link
Copy Markdown
Contributor

@lkts lkts commented Jun 12, 2024

This PR draws inspiration from #90777 but takes a slightly different approach in order to reduce the amount of plumbing required.

We should be able to reuse MemorizingXContentParser for other field types that accept objects.

Contributes to #106483.

@lkts lkts added :StorageEngine/Logs You know, for Logs :StorageEngine/Mapping The storage related side of mappings labels Jun 12, 2024
@lkts lkts requested a review from a team as a code owner June 12, 2024 21:39
@github-actions
Copy link
Copy Markdown
Contributor

Documentation preview:

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@lkts
Copy link
Copy Markdown
Contributor Author

lkts commented Jun 13, 2024

@elasticmachine update branch

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @lkts, I've created a changelog YAML for you.

@lkts lkts removed the :StorageEngine/Logs You know, for Logs label Jun 13, 2024
return XContentDataHelper.storedField(name(name), parser);
}

public static StoredField storedField(String name, XContentBuilder builder) throws IOException {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: short method comment.

id: "1"

- match: { _source.geo_point.0.lon: -71.34000004269183 }
- match: { _source.geo_point.0.lat: 41.1199999647215 }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using close_to to avoid these long mismatching numbers.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's neat, thanks.

import java.io.IOException;

/**
* A parser that parses a document preserving fields that were parsed so far in a {@link XContentBuilder}.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/preserving/copying/ to be more explicit.

I'd also note that this has performance overhead due to copying.

if (parser.currentToken() == XContentParser.Token.START_OBJECT
|| parser.currentToken() == XContentParser.Token.START_ARRAY) {
// We have a complex structure so we'll memorize it while parsing.
var memorizingParser = new MemorizingXContentParser(parser);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: assign directly to parserWithCustomization?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parserWithCustomization and memorizingParser are different types.

Copy link
Copy Markdown
Member

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactoring!

private void fetchFromSource(Object sourceMap, Consumer<T> consumer) {
try (XContentParser parser = wrapObject(sourceMap)) {
parse(parser, v -> consumer.accept(normalizeFromSource(v)), e -> {}); /* ignore malformed */
parse(parser, v -> consumer.accept(normalizeFromSource(v)), new NoopMalformedValueHandler()); /* ignore malformed */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for noop can we avoid a new instance every time? Maybe using a static instance?

- match: { _source.geo_point.0.lat: 41.1199999647215 }
- match: { _source.geo_point.1: "POINT (-77.03653 1000)" }


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we test also the case where a geo point is empty? Or were only lat or lon are available? I would expect this to be reflected when reconstructing the document.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We test empty values in unit tests. I'll add a test for just one coordinate.

@lkts
Copy link
Copy Markdown
Contributor Author

lkts commented Jun 13, 2024

@elasticmachine update branch

Copy link
Copy Markdown
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lkts lkts merged commit 5440f17 into elastic:main Jun 18, 2024
@lkts lkts deleted the feature/geo_point_synthetic_source_ignore_malformed branch June 18, 2024 15:37
@felixbarny felixbarny mentioned this pull request Aug 6, 2024
50 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants