Skip to content

Support synthetic source for aggregate_metric_double when ignore_malformed is used#108746

Merged
lkts merged 12 commits intoelastic:mainfrom
lkts:feature/ignore_malformed_synthetic_source
May 23, 2024
Merged

Support synthetic source for aggregate_metric_double when ignore_malformed is used#108746
lkts merged 12 commits intoelastic:mainfrom
lkts:feature/ignore_malformed_synthetic_source

Conversation

@lkts
Copy link
Copy Markdown
Contributor

@lkts lkts commented May 16, 2024

This PR adds synthetic source support for aggregate_metric_double field when ignore_malformed is used.

This PR introduces a pattern that will be reused in ignore_malformed support in synthetic source for other (complex object-like) fields. The pattern is to create a "shadow" XContentBuilder that replicates all successfully parsed fields and values. In case of malformed data, everything remaining in the parser (inside the field) is copied over to the builder. As a result we get both successfully parsed pieces, malformed piece, and skipped pieces which is a full representation of user input and can go to synthetic source.

Contributes to #90007.

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @lkts, I've created a changelog YAML for you.

* Build a {@link BytesRef} wrapping a byte array containing an encoded form
* of the passed XContentBuilder contents.
*/
public static BytesRef encodeXContentBuilder(XContentBuilder builder) throws IOException {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from #108417.

@lkts lkts requested review from kkrik-es and martijnvg May 16, 2024 23:09
b.field(IGNORE_MALFORMED, true);
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some more elaborate use cases, here or in yaml? We want to check for:

  • Nested subfields within aggregate_metric_double
  • Partial aggregate_metric_double (e.g. one metric ok, one invalid, two ok)
  • Nested malformed fields, with other fields at the same level

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do cover partials in malformedValue. Nested is a good suggestion.

@lkts lkts changed the title Support synthetic source for aggregate_metric_double when ignore_malf… Support synthetic source for aggregate_metric_double when ignore_malformed is used May 17, 2024
@lkts
Copy link
Copy Markdown
Contributor Author

lkts commented May 21, 2024

@elasticmachine update branch

metric:
min: 18.2
max: 100
value_count: 50
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: move value_count to the end, to make sure it's still tracked.

Copy link
Copy Markdown
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I left one question for verifying my understanding.

// We don't use DrainingXContentParser since we don't want to go beyond current field
malformedContentForSyntheticSource.copyCurrentStructure(context.parser());
}
;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unneeded semicolon?

Copy link
Copy Markdown
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants