improve aggregate_metric_double parsing time by weizijun · Pull Request #80685 · elastic/elasticsearch

weizijun · 2021-11-13T14:20:07Z

when there are many aggregate_metric_double type fields, like hundreds, writing index is slow. I perf record the flame graph. and find that, It's about 68% time in LuceneDocument.getField.

Why LuceneDocument.getField cost so many time? Because in the aggregate_metric_double parse logic. It will check if the field is single, it will call LuceneDocument.getField to check multi values.

and the LuceneDocument.getField method is a heavy method, when there are many fields. Because the method use List to find the key, it will foreach the list. So If the fields grow, the LuceneDocument.getField is slow.

I add a flag in NumberFieldMapper, that will add the field to the LuceneDocument.keyedFields, it's a map, so check multi values is O(1) cost, and become cheap.

here is a part of the flame graph:

<g>
<title>org/elasticsearch/index/mapper/LuceneDocument.getField (5,963 samples, 67.53%)</title><rect x="284.1" y="1251.0" width="796.9" height="15" fill="#53e453" rx="2" ry="2"/>
<text x="287.1" y="1262.0">org/elasticsearch/index/mapper/LuceneDocument.getField</text>
</g>

weizijun · 2021-11-13T14:21:02Z

hi, @nik9000 @csoulios, can you help to review this PR?

csoulios

@weizijun thank you for submitting this improvement.

Indeed, AggregateDoubleMetricFieldMapper naively uses LuceneDocument#getField that iterates over all fields in the document and should definitely leverage the LuceneDocument#keyedFields map to check for duplicates.

However, NumberFieldMapper#indexValue is going to change significantly so that it can support _tsid generation (#80276)

See here: https://github.com/elastic/elasticsearch/pull/80276/files#diff-690f39e25a1fddcfe00433a60cb25365138254e25315726ef7fd195230a3e774

So, I suggest you don't use the LuceneDocument#keyedFields implicitly via the NumberFieldMapper, but access it directly via LuceneDocument#addWithKey and LuceneDocument#getByKey in the AggregateDoubleMetricFieldMapper#parseCreateField method.

In this case you would not have to modify NumberFieldMapper at all.

elasticmachine · 2021-11-15T13:23:49Z

Pinging @elastic/es-search (Team:Search)

elasticmachine · 2021-11-15T13:23:49Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

weizijun · 2021-11-15T15:08:24Z

@weizijun thank you for submitting this improvement.

Indeed, AggregateDoubleMetricFieldMapper naively uses LuceneDocument#getField that iterates over all fields in the document and should definitely leverage the LuceneDocument#keyedFields map to check for duplicates.

However, NumberFieldMapper#indexValue is going to change significantly so that it can support _tsid generation (#80276)

See here: https://github.com/elastic/elasticsearch/pull/80276/files#diff-690f39e25a1fddcfe00433a60cb25365138254e25315726ef7fd195230a3e774

So, I suggest you don't use the LuceneDocument#keyedFields implicitly via the NumberFieldMapper, but access it directly via LuceneDocument#addWithKey and LuceneDocument#getByKey in the AggregateDoubleMetricFieldMapper#parseCreateField method.

In this case you would not have to modify NumberFieldMapper at all.

@csoulios I see your change, it affect the dimension field, but normal aggregate_metric_double field don't affect

csoulios · 2021-11-15T15:12:45Z

@csoulios I see your change, it affect the dimension field, but normal aggregate_metric_double field don't affect

You are right. Also NumberFieldMapper does not use the LuceneDocument#keyedFields at all any more.

* upstream/master: (150 commits) Fix ComposableIndexTemplate equals when composed_of is null (elastic#80864) Optimize DLS bitset building for matchAll query (elastic#81030) URL option for BaseRunAsSuperuserCommand (elastic#81025) Less Verbose Serialization of Snapshot Failure in SLM Metadata (elastic#80942) Fix shadowed vars pt7 (elastic#80996) Fail shards early when we can detect a type missmatch (elastic#79869) Delegate Ref Counting to ByteBuf in Netty Transport (elastic#81096) Clarify `unassigned.reason` docs (elastic#81017) Strip blocks from settings for reindex targets (elastic#80887) Split off the values supplier for ScriptDocValues (elastic#80635) [ML] Switch message and detail for model snapshot deprecations (elastic#81108) [DOCS] Update xrefs for snapshot restore docs (elastic#81023) [ML] Updates visiblity of validate API (elastic#81061) Track histogram of transport handling times (elastic#80581) [ML] Fix datafeed preview with remote indices (elastic#81099) [ML] Fix acceptable model snapshot versions in ML deprecation checker (elastic#81060) [ML] Add logging for failing PyTorch test (elastic#81044) Extending the timeout waiting for snapshot to be ready (elastic#81018) [ML] Fix incorrect logging of unexpected model size error (elastic#81089) [ML] Make inference timeout test more reliable (elastic#81094) ... # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java

csoulios

Thank you for adding this. I am happy to merge this change.

Can you please change the parameter name in the NumberFieldMapper

csoulios · 2021-12-03T11:35:18Z

server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java

            this.script.precludesParameters(ignoreMalformed, coerce, nullValue);
            addScriptValidation(script, indexed, hasDocValues);
+
+            this.singleValue = Parameter.boolParam("single_value", false, m -> toType(m).singleValue, false);


Maybe you should change the parameter name here to allow_multiple_values:true|false, as discussed in #80825, #58523 and https://github.com/elastic/ecs/blob/main/rfcs/text/0029-enforce-single-value-fields.md

I just saw there is an open PR for adding allow_multiple_values parameter (#80289)

@markharwood do you think we should wait for your PR to be merged?

yeah, this parameter is useful, I can wait for #80289 merged, then change the aggregate_metric_double logic

My PR changes the main DocumentParser parsing logic to reject documents with arrays where allowMultipleValues is false for a field. Currently only user-defined mappings set this flag.
The PR does not tackle changes to existing single-valued fields like AggregateDoubleMetricFieldMapper that have implemented their own singularity checks (which I understand here have performance issues).

This is perhaps an argument for committing #80289 and creating a follow-up one for fields like AggregateDoubleMetricFieldMapper to strip out their singularity checks and override the new FieldMapper. allowMultipleValues() to return false. This would mean for always-singular fields like this, the central DocumentParser class would quickly reject any arrays at source. One complication that may need addressing is that the existing Lucene document-level singularity-checking code probably handles scenarios where the double field is singular but is contained in an object that might be repeated in an array e.g.[ { "num": 1}, {"num":2} ] rather than "num":[1,2]. The DocumentParser may need extra logic to spot the former.
Thoughts @jpountz ?

yeah, It's maybe not equals to single value. your PR reject the array value, but if a field in a object array, like your example: [ { "num": 1}, {"num":2} ], it may be duplicate in lucene. the AggregateDoubleMetricFieldMapper need a single value in lucene, not only array case. e.g:

{ "field": [{ "min": 10.0, "max": 50.0, "value_count": 3 }, { "min": 11.0, "max": 51.0, "value_count": 3 }] }

it's not allow like this.

Right, so the DocumentParser would keep tracks of when it enters and exits arrays while parsing. Any field flagged as allow_multiple_values:false would cause an error if the current parse context was inside an array (no matter the nesting depth).

yeah, when your PR merged, I will change the AggregateDoubleMetricFieldMapper code to adapt the parameter.

Perhaps you can change the mapping parameter from single_value to allow_multiple_values with the following semantics:

Should the field permit arrays of values? Accepts true
(default) or false. When false, documents presenting arrays
of values for this field will be rejected with an error.

So that we are consistent with what the Mark's design

okay, I will change it.

csoulios · 2021-12-03T11:54:39Z

@elasticmachine, test this please

weizijun · 2021-12-03T21:15:03Z

@elasticmachine update branch

* upstream/main: Add 8.5 migration docs (elastic#88923) Script: Reindex & UpdateByQuery Metadata (elastic#88665) Remove unused plugins dir var from server CLI (elastic#88917) Use tracing API in TaskManager (elastic#88885) Add source fallback for keyword fields using operation (elastic#88735) Prune changelogs after 8.3.3 release Bump versions after 8.3.3 release Add a test for checking for misspelled "dry_run" parameters for Desired Nodes API (elastic#88898) Speedup BalanceUnbalancedClusterTests (elastic#88794) Preventing exceptions on node shutdown in integration tests (elastic#88827) Do not trigger check part3 for test mute and docs PRs (elastic#88895) Add troubleshooting docs about data corruption (elastic#88760) Mute RollupActionSingleNodeTests#testRollupDatastream (elastic#88891) [DOCS] Domain splitting impacts API keys (elastic#88677) Fix SqlSearchIT testAllTypesWithRequestToOldNodes (elastic#88866) (elastic#88883) Update synthetic-source.asciidoc (elastic#88880) Log more details in TaskAssertions (elastic#88864) Make Tuple a record (elastic#88280)

weizijun · 2022-07-29T07:27:17Z

@csoulios all tests have passed, and update to the latest code. Can you help to review it?

csoulios

Generally, the change looks ok to me. It definitely improves performance of adding a new aggregate_metric_double field.

I have added some minor comments for improvements.

Also, I would like to get an opinion from the @elastic/es-search team since they are the experts on the field mapper area.

csoulios · 2022-07-29T08:33:17Z

server/src/test/java/org/elasticsearch/index/mapper/WholeNumberFieldMapperTests.java

        );
    }

+    public void testSingleValuedField() throws IOException {


Is there any reason why this test is in WholeNumberFieldMapperTests and not in NumberFieldMapperTests?

csoulios · 2022-07-29T08:37:21Z

...in/java/org/elasticsearch/xpack/aggregatemetric/mapper/AggregateDoubleMetricFieldMapper.java

+                    if (e.getCause() instanceof IllegalArgumentException) {
+                        throw (IllegalArgumentException) e.getCause();
+                    } else {
+                        throw e;
+                    }


I guess that you extractIllegalArgumentException from this MapperParsingException because you want to bubble up the underlying error instead of throwing a generic message like this `"failed to parse field [field.value_count] of type [integer] in document with id '1'.".

If that's the case, can you please add a comment here to document the thinking behind this?

csoulios · 2022-07-29T08:51:18Z

...va/org/elasticsearch/xpack/aggregatemetric/mapper/AggregateDoubleMetricFieldMapperTests.java

            containsString(
-                "Field [field] of type [aggregate_metric_double] "
-                    + "does not support indexing multiple values for the same field in the same document"
+                "failed to parse field [field.value_count] of type [integer] in document with id '1'. Preview of field's value: '3'"


Here we receive a very generic error message. I think it's worth throwing the more informative: Field [field] of type [aggregate_metric_double] does not support indexing multiple values for the same field in the same documet.

Do you think that's possible?

javanna · 2022-08-04T11:12:48Z

I would wait to get this in until we have reached some conclusion on #80825 . I would not like to get in this flag for one of our field mappers without deciding first what we are going to do overall.

romseygeek · 2022-08-04T11:44:49Z

Would it be possible to add the functionality for aggregate_metric_double without exposing it as a public flag? We know that we never want this field to be multi-valued, so we can happily add the internal APIs to handle it but delay making it available on other field types until we've decided on the public API

csoulios · 2022-08-04T12:25:26Z

I agree with @romseygeek.

We can keep the NumberFieldMapper#allowMultipleValues as an internal parameter that sets this behavior, but not expose it as a mapping parameter. Therefore, the AggregateDoubleMetricFieldType can set it internally.

Later, we are free to change the implementation and perhaps introduce a new mapping parameter without breaking backwards compatibility.

weizijun · 2022-08-04T15:18:16Z

Yeah, I also agree with @romseygeek , I have talked with @csoulios , I will modify the code.

* upstream/main: (265 commits) Disable openid connect tests due to missing fixture (elastic#89478) Add periodic job for single processor node testing Updates to changelog processing after docs redesign (elastic#89463) Better support for multi cluster for run task (elastic#89442) Mute failing tests (elastic#89465) [ML] Performance improvements related to ECS Grok pattern usage (elastic#89424) Add source fallback support for date and date_nanos mapped types (elastic#89440) Reuse Info in lifecycle step (elastic#89419) feature: support metrics for multi value fields (elastic#88818) Upgrade OpenTelemetry API and remove workaround (elastic#89438) Remove LegacyClusterTaskResultActionListener (elastic#89459) Add YAML spec docs about matching errors (elastic#89370) Remove redundant cluster upgrade tests for auth tokens (elastic#89417) Return 400 error for GetUserPrivileges call with API keys (elastic#89333) User Profile - Detailed errors in hasPrivileges response (elastic#89224) Rollover min_* conditions docs and highlight (elastic#89434) REST tests for percentiles_bucket agg (elastic#88029) REST tests for cumulative pipeline aggs (elastic#88966) Clean-up file watcher keys. (elastic#89429) fix a typo in Security.java (elastic#89248) ... # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java

weizijun · 2022-08-30T06:35:47Z

@elasticmachine update branch

csoulios

LGTM!

csoulios · 2022-09-07T16:16:01Z

@elasticmachine update branch

csoulios · 2022-09-07T16:20:18Z

I have discussed this PR with @javanna offline and we have agreed that this implementation does not expose any public APIs (mapping parameters) for setting a field as single-valued.

So, we can move forward with merging this PR.

csoulios · 2022-09-07T18:21:05Z

@elasticmachine run elasticsearch-ci/bwc

weizijun · 2022-09-08T06:01:03Z

@elasticmachine update branch

csoulios

LGTM!

Thank you for submitting this improvement

csoulios · 2022-09-08T08:57:22Z

server/src/test/java/org/elasticsearch/index/mapper/NumberFieldMapperTests.java

+            fail("mapper [" + mapper.getClass() + "] error, not number field");
+        }
+
+        Exception e = expectThrows(


weizijun added 2 commits November 13, 2021 15:27

temp

5d309be

improve cache IllegalArgumentException exception

5dd8ed9

elasticsearchmachine added v8.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Nov 13, 2021

csoulios reviewed Nov 15, 2021

View reviewed changes

csoulios added :StorageEngine/TSDB You know, for Metrics :Search Foundations/Mapping Index mappings, including merging and defining field types labels Nov 15, 2021

elasticmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Meta label for search team labels Nov 15, 2021

weizijun added 3 commits November 30, 2021 15:06

fixup

da793eb

fix tests

171148a

csoulios reviewed Dec 3, 2021

View reviewed changes

Merge branch 'master' into number-support-single-type

7ce040b

weizijun mentioned this pull request Dec 6, 2021

New field mapping flag - allow_multiple_values #80289

Closed

markharwood mentioned this pull request Dec 6, 2021

[Meta] Better handling of single-valued fields #80825

Open

8 tasks

nik9000 assigned csoulios Jan 26, 2022

mark-vieira added v8.2.0 and removed v8.1.0 labels Feb 2, 2022

salvatore-campagna added v8.3.0 and removed v8.2.0 labels Mar 30, 2022

craigtaverner added v8.4.0 and removed v8.3.0 labels May 25, 2022

elasticmachine and others added 3 commits July 28, 2022 12:11

Merge branch 'main' into number-support-single-type

1f3782b

fixup

4d9b01a

csoulios suggested changes Jul 29, 2022

View reviewed changes

weizijun added 3 commits August 3, 2022 17:19

improve

a6575bf

improve

a6100a3

improve

68a8d4a

weizijun added 4 commits August 10, 2022 16:38

remove allow_multiple_values parameter

4eaf674

spotless

dddd3e1

checkstyle

829e5ad

Merge branch 'main' into number-support-single-type

2615f3f

csoulios approved these changes Sep 7, 2022

View reviewed changes

Merge branch 'main' into number-support-single-type

f4b0795

add test back

b93654e

elasticmachine and others added 2 commits September 8, 2022 15:31

Merge branch 'main' into number-support-single-type

256f149

add test back

fca97d4

csoulios approved these changes Sep 8, 2022

View reviewed changes

csoulios merged commit 1f1ac13 into elastic:main Sep 8, 2022

Conversation

weizijun commented Nov 13, 2021

Uh oh!

weizijun commented Nov 13, 2021

Uh oh!

csoulios left a comment

Choose a reason for hiding this comment

Uh oh!

elasticmachine commented Nov 15, 2021

Uh oh!

elasticmachine commented Nov 15, 2021

Uh oh!

weizijun commented Nov 15, 2021

Uh oh!

csoulios commented Nov 15, 2021

Uh oh!

csoulios left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

markharwood Dec 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

csoulios commented Dec 3, 2021

Uh oh!

weizijun commented Dec 3, 2021

Uh oh!

weizijun commented Jul 29, 2022

Uh oh!

csoulios left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

javanna commented Aug 4, 2022

Uh oh!

romseygeek commented Aug 4, 2022

Uh oh!

csoulios commented Aug 4, 2022

Uh oh!

weizijun commented Aug 4, 2022

Uh oh!

weizijun commented Aug 30, 2022

Uh oh!

csoulios left a comment

Choose a reason for hiding this comment

Uh oh!

csoulios commented Sep 7, 2022

Uh oh!

csoulios commented Sep 7, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

csoulios commented Sep 7, 2022

Uh oh!

weizijun commented Sep 8, 2022

Uh oh!

csoulios left a comment

markharwood Dec 6, 2021 •

edited

Loading

csoulios left a comment •

edited

Loading

csoulios commented Sep 7, 2022 •

edited

Loading