Skip to content

Track source for arrays of objects#108417

Merged
kkrik-es merged 30 commits intoelastic:mainfrom
kkrik-es:fix/synthetic-source/array
May 17, 2024
Merged

Track source for arrays of objects#108417
kkrik-es merged 30 commits intoelastic:mainfrom
kkrik-es:fix/synthetic-source/array

Conversation

@kkrik-es
Copy link
Copy Markdown
Member

@kkrik-es kkrik-es commented May 8, 2024

This PR uses option store_array_source for objects to track the source of (sub)object arrays in synthetic mode. This allows preserving the original source, as synthetic mode merges, deduplicates and sorts array entries by default. The downside is that the whole object array (including children objects and fields) gets stored twice, while there's additional overhead while synthesizing source at query time.

This functionality will be experimental initially, to get some experience before documenting and opening up its use.

Fixes #90708

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @kkrik-es, I've created a changelog YAML for you.

XContentParserConfiguration configuration,
XContentBuilder builder
) throws IOException {
DocumentParserContext subcontext = context.switchParser(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new usage of the switchParser() method. However it is currently deprecated. I think if we like to use then we should un-deprecate this method. I think this new usage is a valid use case. cc: @javanna

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the concerns we had with switchParser, i think your assessment is right. Also, its javadocs says

we are actively deprecating and removing the ability to pass complex objects to multifields, so try and avoid using this method

Given that this was committed 4 years ago, the "actively" part no longer holds true. I reviewed the current usages of switchParser and they seem ok to me. I have a slight concern that we are now using the method in a general utility method, that may spread its usage instead of limiting it to the only places where it's strictly needed. I am not sure what to do about that though, and I am ok with the current plan. Thanks for raising this.

@kkrik-es kkrik-es marked this pull request as ready for review May 14, 2024 12:36
@kkrik-es kkrik-es requested a review from lkts May 14, 2024 13:51
Copy link
Copy Markdown
Contributor

@lkts lkts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add documentation for new parameter?

hasValue = false;
if (ignoredValues != null) {
for (IgnoredSourceFieldMapper.NameValue ignored : ignoredValues) {
b.field(ignored.getFieldName());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think about this before - doesn't this break the alphabetic sorting of fields in the source? We say in documentation that it is sorted.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. I'll work on a follow-up PR to restore ordering.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I gave it a try here. While it's easy to order fields at the object level, we don't do so for the contents of the object array that get stored and retrieved verbatim. I added tests for this, not sure if there's an easy way to fix this..

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's exactly what we need, we only specify ordering of fields.

@kkrik-es
Copy link
Copy Markdown
Member Author

Should we add documentation for new parameter?

I was thinking of keeping it experimental initially, until Observability provides feedback on its use. @felixbarny fyi.

Copy link
Copy Markdown
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more questions and comments.

Copy link
Copy Markdown
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@kkrik-es kkrik-es merged commit 99e8c27 into elastic:main May 17, 2024
@kkrik-es kkrik-es deleted the fix/synthetic-source/array branch May 17, 2024 11:55
@felixbarny
Copy link
Copy Markdown
Member

The store_array_source flag looks good to me, I've pinged the team to test it and provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Synthetic _source: preserve source for specific fields (object arrays)

6 participants