Skip to content

Store source for nested objects#108818

Merged
kkrik-es merged 9 commits intoelastic:mainfrom
kkrik-es:fix/synthetic-source/nested-object
May 22, 2024
Merged

Store source for nested objects#108818
kkrik-es merged 9 commits intoelastic:mainfrom
kkrik-es:fix/synthetic-source/nested-object

Conversation

@kkrik-es
Copy link
Copy Markdown
Member

@kkrik-es kkrik-es commented May 20, 2024

Covers nested objects with fields or arrays of objects. Adds support for tracking the source of arrays in disabled objects, too.

Related to #106825, #108417

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @kkrik-es, I've created a changelog YAML for you.

@kkrik-es kkrik-es requested review from lkts and martijnvg May 20, 2024 12:42
@kkrik-es kkrik-es marked this pull request as ready for review May 20, 2024 12:42
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Copy Markdown
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left two comments.

}

if (context.parent().isNested()) {
if (context.mappingLookup().isSourceSynthetic() && context.getClonedSource() == false) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about the added code here. Is the reason that this is needed in case there is an object field that is mapped as nested field? The array case is then covered by parseNonDynamicArray(...)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, just in case there's an entry not using an array. Should be rare but it can happen.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case is the || mapper instanceof NestedObjectMapper addition at line 612 required?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, that section covers the usual case where the nested object contains an array.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a nested object has already been added to ignored source, then I think we don't need to do anything for that json object contains an array? Since it has already been added to ignored source as part of the object?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we either get an array or an object, not an object wrapping an array. I verified this in the unit test, isn't that the valid syntax.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see without || mapper instanceof NestedObjectMapper we only store the first element of an array. So subsequent objects in an array would not get recorded.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add this is as a comment?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +615 to +622
boolean isRoot = context.parent() instanceof RootObjectMapper;
context.addIgnoredField(
new IgnoredSourceFieldMapper.NameValue(
isRoot ? arrayFieldName : context.parent().name() + "." + arrayFieldName,
isRoot ? 0 : context.parent().fullPath().length() + 1,
XContentDataHelper.encodeXContentBuilder(tuple.v2())
)
);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe turn this logic into a helper method? I think this appears now several times in this file?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, @lkts added one. Applied here.

token = context.parser().currentToken();
parser = context.parser();
}
context = context.createNestedContext((NestedObjectMapper) context.parent());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the DocumentParserContext constructor that is used eventually from createNestedContext(...) method also copy clonedSource field from provided context?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this needs to be propagated.

}

if (context.parent().isNested()) {
if (context.mappingLookup().isSourceSynthetic() && context.getClonedSource() == false) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a nested object has already been added to ignored source, then I think we don't need to do anything for that json object contains an array? Since it has already been added to ignored source as part of the object?


boolean booleanValue = randomBoolean();
int intValue = randomInt();
var syntheticSource = syntheticSource(documentMapper, b -> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also assert the Lucene document that get created here? I think it can be useful to assert the actual IndexableField instances for _ignored_source field. Maybe the syntheticSource(...) method can optionally accept a Consumer<LuceneDocument> parameter, that allows to the test to assert the field instances?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm not sure what would be the benefit, the contents are encoded. The decoded version should show up in the printed source that we already assert on.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, even if for some reason we end up with duplicate ignored stored fields. (duplicate array elements would be printed)

Copy link
Copy Markdown
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@kkrik-es kkrik-es merged commit 7e09df7 into elastic:main May 22, 2024
@kkrik-es kkrik-es deleted the fix/synthetic-source/nested-object branch May 23, 2024 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants