Save memory when histogram agg is not on top by nik9000 · Pull Request #57277 · elastic/elasticsearch

nik9000 · 2020-05-28T12:27:38Z

This saves some memory when the `histogram` aggregation is not a top-level aggregation by dropping `asMultiBucketAggregator` in favor of natively implementing multi-bucket storage in the aggregator. For the most part this just uses the `LongKeyedBucketOrds` that we built the first time we did this.
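A minimal sketch of what natively implemented multi-bucket storage can look like: one structure maps each (owning bucket ordinal, histogram key) pair to a dense bucket ordinal, so a single aggregator instance can serve every parent bucket. `BucketOrdsSketch` and its `HashMap` backing are hypothetical stand-ins for illustration only. This is not the real `LongKeyedBucketOrds` implementation, and the return convention of `add` here is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not Elasticsearch code.
class BucketOrdsSketch {
    // owningBucketOrd -> (histogram key -> dense bucket ordinal)
    private final Map<Long, Map<Long, Long>> ords = new HashMap<>();
    private long nextOrd = 0;

    /**
     * Registers the (owningBucketOrd, value) pair.
     * Returns the newly assigned ordinal, or -1 - ord if the pair already existed
     * (a convention assumed for this sketch).
     */
    long add(long owningBucketOrd, long value) {
        Map<Long, Long> perOwner = ords.computeIfAbsent(owningBucketOrd, k -> new HashMap<>());
        Long existing = perOwner.get(value);
        if (existing != null) {
            return -1 - existing;
        }
        long ord = nextOrd++;
        perOwner.put(value, ord);
        return ord;
    }

    /** Total number of distinct buckets across all owning buckets. */
    long size() {
        return nextOrd;
    }

    public static void main(String[] args) {
        BucketOrdsSketch ords = new BucketOrdsSketch();
        System.out.println(ords.add(0, 10)); // new bucket for parent 0
        System.out.println(ords.add(1, 10)); // same key, different parent: another new bucket
        System.out.println(ords.add(0, 10)); // already present: negative value
        System.out.println(ords.size());
    }
}
```

Because ordinals are dense and shared, per-bucket state such as doc counts can live in one growable array indexed by ordinal rather than in one aggregator object per parent bucket.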

elasticmachine · 2020-05-28T12:27:40Z

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

nik9000 · 2020-05-28T12:27:58Z

I'm going to add a test for the new debug information.

nik9000 · 2020-05-28T13:03:44Z

...java/org/elasticsearch/search/aggregations/bucket/histogram/AbstractHistogramAggregator.java

+ * Base class for functionality shared between aggregators for this
+ * {@code histogram} aggregation.
+ */
+public abstract class AbstractHistogramAggregator extends BucketsAggregator {


There was a TODO noting that the range and numeric versions of the aggregator shared a ton of code. Now they share a superclass that provides all that code.

nik9000 · 2020-05-28T13:04:11Z

.../java/org/elasticsearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java

-                                        ValuesSource valuesSource, DocValueFormat formatter, SearchContext context,
-                                        Aggregator parent, Map<String, Object> metadata) throws IOException {
-                    ValuesSource.Range rangeValueSource = (ValuesSource.Range) valuesSource;
-                    if (rangeValueSource.rangeType().isNumeric() == false) {


I moved this check into the ctor so I can use a ctor reference, like we've been doing elsewhere.
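A rough sketch of the pattern described here, assuming a supplier-style functional interface: once the validation lives in the constructor, the factory can register `SomeAggregator::new` directly instead of a lambda that performs the check first. The names `AggregatorSupplier` and `RangeHistogramSketch`, the parameter list, and the error message are all hypothetical and much simpler than the real Elasticsearch types.

```java
// Hypothetical illustration, not Elasticsearch code.
class CtorReferenceSketch {
    /** Stand-in for a supplier interface that a factory would register. */
    interface AggregatorSupplier {
        RangeHistogramSketch build(String name, boolean numericRange);
    }

    static class RangeHistogramSketch {
        final String name;

        RangeHistogramSketch(String name, boolean numericRange) {
            // The check that used to sit in the factory lambda now lives here.
            if (numericRange == false) {
                throw new IllegalArgumentException(
                    "Expected numeric range type for aggregation [" + name + "]");
            }
            this.name = name;
        }
    }

    public static void main(String[] args) {
        // With the check inside the constructor, a plain constructor reference is enough.
        AggregatorSupplier supplier = RangeHistogramSketch::new;
        System.out.println(supplier.build("prices", true).name);
        try {
            supplier.build("ips", false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```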

nik9000 · 2020-05-28T13:04:24Z

.../java/org/elasticsearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java

                                            Aggregator parent,
                                            boolean collectsFromSingleBucket,
                                            Map<String, Object> metadata) throws IOException {
-        if (collectsFromSingleBucket == false) {


this is the important line!

nik9000 · 2020-05-28T13:05:18Z

.../org/elasticsearch/search/aggregations/bucket/histogram/NumericHistogramAggregatorTests.java

-            fieldType.setName("field");
            try (IndexReader reader = w.getReader()) {
                IndexSearcher searcher = new IndexSearcher(reader);
-                InternalHistogram histogram = search(searcher, new MatchAllDocsQuery(), aggBuilder, fieldType);


We do this kind of thing all over the place so I figured I'd make a utility method for it.

talevy · 2020-05-28T21:14:10Z

test/framework/src/main/java/org/elasticsearch/search/aggregations/AggregatorTestCase.java

        Releasables.close(releasables);
        releasables.clear();
    }
+


nice touch. likely usable across many future tests

nik9000 · 2020-05-29T13:31:04Z

I'm pulling some performance numbers for this. I'll likely merge before they get done and update with them once they come in. I'm fairly confident in it though.

nik9000 · 2020-05-29T16:51:26Z

About a 38% performance gain in the test that I ran (50th percentile service time went from 3575.41 ms to 2200.06 ms):

Before:

| Metric | Task | Value | Unit |
|---:|---|---:|---|
| error rate | index | 0 | % |
| Min Throughput | date_histo_histo | 0.28 | ops/s |
| Median Throughput | date_histo_histo | 0.28 | ops/s |
| Max Throughput | date_histo_histo | 0.28 | ops/s |
| 50th percentile latency | date_histo_histo | 23599.5 | ms |
| 90th percentile latency | date_histo_histo | 35095.1 | ms |
| 100th percentile latency | date_histo_histo | 37977.2 | ms |
| 50th percentile service time | date_histo_histo | 3575.41 | ms |
| 90th percentile service time | date_histo_histo | 3605.77 | ms |
| 100th percentile service time | date_histo_histo | 3659.23 | ms |
| error rate | date_histo_histo | 0 | % |

After:

| Metric | Task | Value | Unit |
|---:|---|---:|---|
| Min Throughput | date_histo_histo | 0.33 | ops/s |
| Median Throughput | date_histo_histo | 0.34 | ops/s |
| Max Throughput | date_histo_histo | 0.34 | ops/s |
| 50th percentile latency | date_histo_histo | 2200.88 | ms |
| 90th percentile latency | date_histo_histo | 2222.86 | ms |
| 100th percentile latency | date_histo_histo | 2245.81 | ms |
| 50th percentile service time | date_histo_histo | 2200.06 | ms |
| 90th percentile service time | date_histo_histo | 2222.01 | ms |
| 100th percentile service time | date_histo_histo | 2244.96 | ms |
| error rate | date_histo_histo | 0 | % |
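For anyone checking the arithmetic, here is a small sketch that computes the relative change from the numbers in the tables above. It assumes the headline figure refers to median service time, which is the metric that lands near 38%. The much larger "Before" latency values probably include time spent waiting behind earlier requests, so service time looks like the fairer comparison. `BenchmarkDeltaSketch` is a made-up helper name.

```java
// Hypothetical helper for reading the benchmark tables.
class BenchmarkDeltaSketch {
    /** Relative change from before to after, in percent (negative means a decrease). */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Values copied from the 50th percentile service time and median throughput rows.
        System.out.printf("median service time: %.1f%%%n", percentChange(3575.41, 2200.06));
        System.out.printf("median throughput:   %.1f%%%n", percentChange(0.28, 0.34));
    }
}
```

On these inputs, service time drops by roughly 38% and throughput rises by roughly 21%.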

nik9000 added the >enhancement, :Analytics/Aggregations, v8.0.0, and v7.9.0 labels May 28, 2020

nik9000 requested a review from talevy May 28, 2020 12:27

elasticmachine added the Team:Analytics label May 28, 2020

nik9000 added 2 commits May 28, 2020 09:03

Test

c502779

explain

8898048

nik9000 mentioned this pull request May 28, 2020

Multi-bucket aggregator wrapper is slow and uses a ton of memory #56487

Closed

16 tasks

Merge branch 'master' into histo_no_multi

1efc384

talevy approved these changes May 28, 2020

nik9000 merged commit 460b204 into elastic:master May 29, 2020

nik9000 added the backport pending label May 29, 2020

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 29, 2020

Update skip after backport of elastic#57277

84a5556

nik9000 added a commit that referenced this pull request May 29, 2020

Update skip after backport of #57277 (#57379)

27bff25

nik9000 removed the backport pending label Jul 8, 2021

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

nik9000 merged 4 commits into elastic:master from nik9000:histo_no_multi
