Skip to content

Normalize malformed aggregate_double_metric#90815

Merged
elasticsearchmachine merged 5 commits intoelastic:mainfrom
nik9000:aggregate_metric_double_ignores
Nov 7, 2022
Merged

Normalize malformed aggregate_double_metric#90815
elasticsearchmachine merged 5 commits intoelastic:mainfrom
nik9000:aggregate_metric_double_ignores

Conversation

@nik9000
Copy link
Copy Markdown
Member

@nik9000 nik9000 commented Oct 11, 2022

This makes ignore_malformed for aggregate_double_metric save just the top level field name into the meta _ignored field, like every other field. It does so by pulling part of the parsing logic back from it's delegate numeric fields. Now it uses the delegates to parse the value and to save the values, but not to handle malformed values. Instead, it handles malformed values itself on the top level.

Doing this made it fairly simple to solve two other issues:

  • ignore_malformed had to remove non-malformed values that had previously been added to the document. Now we only save the values after all have been parsed.
  • When we validate some of the values we use to have to dig them out of the document using inefficient code that was written mostly for testing. Now that we're saving the values before adding them to the document it's easy enough to validate from that list instead which amounts to a single array access.

Closes #90572

@nik9000 nik9000 added >non-issue :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data v8.6.0 labels Oct 11, 2022
@nik9000 nik9000 requested a review from csoulios October 11, 2022 18:43
e.getCause().getMessage(),
containsString("failed to parse field [field.value_count] of type [integer] in document with id '1'.")
);
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tested as part of the exampleMalformedValues.

private static Number value(XContentParser parser, NumberType numberType, Number nullValue, boolean coerce)
throws IllegalArgumentException, IOException {

public Number value(XContentParser parser) throws IllegalArgumentException, IOException {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This'll need javadoc if we want to keep it. It's sort of leaky, but I've done worse. It's for a good cause.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is leaky indeed, but I subscribe to your good cause. Not sure if there's a better way to parse and index the value. If the es-search is ok with it, I am fine too

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @romseygeek should have a look here too.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep fine by me.

@nik9000 nik9000 requested a review from romseygeek October 11, 2022 18:44
This makes `ignore_malformed` for `aggregate_double_metric` save just
the top level field name into the meta `_ignored` field, like every
other field. It does so by pulling part of the parsing logic back from
it's delegate numeric fields. Now it uses the delegates to parse the
value and to save the values, but not to handle malformed values.
Instead, it handles malformed values itself on the top level.

Doing this made it fairly simple to solve two other issues:
* `ignore_malformed` had to remove non-malformed values that had
  previously been added to the document. Now we only save the values
  after all have been parsed.
* When we validate some of the values we use to have to dig them out
  of the document using inefficient code that was written mostly for
  testing. Now that we're saving the values before adding them to the
  document it's easy enough to validate from that list instead which
  amounts to a single array access.

Closes elastic#90572
@nik9000 nik9000 marked this pull request as ready for review October 12, 2022 17:50
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Oct 12, 2022
Copy link
Copy Markdown
Contributor

@csoulios csoulios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the big dilemma here is if we want to expose the parse/index methods of the NumberFieldMapper class. I don't see why not to.

I have left some small comments.

someMalformedValues = true;
}
}
MalformedHandler onMalformed = new MalformedHandler();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand where exactly is this MalformedHandler used.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh..... Its been a while since I wrote this and I can't find it. Maybe it's a leftover of another attempt. I'll get back to you.

}

// Check if all required metrics have been parsed.
for (Metric m : metrics) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you only checked the sizes of metrics and metricsParsed. You would not have to iterate over metrics?
You think there can be a case that this fails?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

// Check that there aren't any duplicates already parsed
for (Metric m : metricsParsed.keySet()) {
NumberFieldMapper delegateFieldMapper = metricFieldMappers.get(m);
if (context.doc().getByKey(delegateFieldMapper.fieldType().name()) != null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method Document.onlyAddKey() was supposed to check for duplicate fields in the doc. I think this method is called from NumferFieldMapper.indexValue(). In this case the duplicates check is performed twice.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I suppose I don't need to check this. It's just passing the right exception message back. I think don't need to check this because a duplicate will fail with the first key every time - because we don't index unless all keys are present. That's what I did above.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method deserves another visit because I see us constantly doing a test like this one to get a better error message. I'd like to smooth it a bit in a follow up if that's ok
.

private static Number value(XContentParser parser, NumberType numberType, Number nullValue, boolean coerce)
throws IllegalArgumentException, IOException {

public Number value(XContentParser parser) throws IllegalArgumentException, IOException {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is leaky indeed, but I subscribe to your good cause. Not sure if there's a better way to parse and index the value. If the es-search is ok with it, I am fine too

Copy link
Copy Markdown
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me, happy to have another look when @csoulios 's feedback has been addressed.

return;
}
// Rethrow exception as is. It is going to be caught and nested in a MapperParsingException
// by its FieldMapper.MappedFieldType#parse()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FieldMapper#parse I think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still applies

private static Number value(XContentParser parser, NumberType numberType, Number nullValue, boolean coerce)
throws IllegalArgumentException, IOException {

public Number value(XContentParser parser) throws IllegalArgumentException, IOException {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep fine by me.

@nik9000 nik9000 requested a review from csoulios November 3, 2022 16:49
@nik9000 nik9000 requested review from csoulios and romseygeek and removed request for csoulios November 4, 2022 14:30
Copy link
Copy Markdown
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit in a comment, LGTM otherwise

return;
}
// Rethrow exception as is. It is going to be caught and nested in a MapperParsingException
// by its FieldMapper.MappedFieldType#parse()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still applies

Copy link
Copy Markdown
Contributor

@csoulios csoulios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@nik9000 nik9000 added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Nov 7, 2022
@elasticsearchmachine elasticsearchmachine merged commit d29d2e8 into elastic:main Nov 7, 2022
@nik9000 nik9000 deleted the aggregate_metric_double_ignores branch November 7, 2022 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >non-issue :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.6.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

aggregate_metric_double stores it's subfields in _ignore

4 participants