Save memory when rare_terms is not on top #57948
Conversation
This uses the optimization that we started making in elastic#55873 for `rare_terms` to save a bit of memory when that aggregation is not on the top level.
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)
polyfractal
left a comment
Left a few comments, mostly around naming :) I only left them on the Long agg, but they apply equally to the String one.
Otherwise LGTM :)
```java
 * Can definitively say if a member does not exist (no false negatives), but may say an item exists
 * when it does not (has false positives). Similar in usage to a Bloom Filter.
 *
 * <p>
```
I hate javadocs so much :( optimizing rendered readability while sacrificing IDE readability :(
Thanks for fixing this :)
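The javadoc quoted above describes approximate set membership: a key that was added is always reported as possibly present, while a key that was never added may still be falsely reported present. A toy sketch of those semantics follows; `ToyApproxFilter` and its two-hash scheme are invented for illustration and are not the CuckooFilter that `rare_terms` actually uses:

```java
import java.util.BitSet;

// Toy approximate-membership filter: no false negatives, possible false
// positives. Hypothetical sketch, not the real rare_terms filter.
class ToyApproxFilter {
    private final BitSet bits = new BitSet(1 << 16);

    void add(long key) {
        // Set two hash-derived bits; bit collisions between different keys
        // are what cause false positives.
        bits.set(hash1(key));
        bits.set(hash2(key));
    }

    boolean mightContain(long key) {
        // If either bit is clear, the key was definitely never added.
        return bits.get(hash1(key)) && bits.get(hash2(key));
    }

    private int hash1(long key) {
        return (int) (key * 0x9E3779B97F4A7C15L >>> 48); // top 16 bits
    }

    private int hash2(long key) {
        return (int) (key * 0xC2B2AE3D27D4EB4FL >>> 48); // top 16 bits
    }
}
```

By construction, every added key sets both of its bits, so `mightContain` can never return `false` for an added key; only the reverse error is possible.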
```java
long keepCount = 0;
long[] mergeMap = new long[(int) bucketOrds.size()];
Arrays.fill(mergeMap, -1);
long size = 0;
```
Hmm, this is a bit confusingly named I think? Maybe `currentOffset` or something? Not sure, but `size` feels a bit confusing.
```java
LongRareTerms.Bucket bucket = new LongRareTerms.Bucket(ordsEnum.value(), docCount, null, format);
bucket.bucketOrd = mergeMap[(int) ordsEnum.ord()] = size + ordsToCollect.add(ordsEnum.value());
buckets.add(bucket);
keepCount++;
```
Should we just change this to a boolean flag? `hasDeletions` or whatever?
I think we need to perform the merge if we don't keep all the buckets. We can remove buckets for two reasons now:
- The key is above the threshold.
- The `owningBucketOrd` isn't selected.

This counter will catch both cases. I couldn't come up with a cleaner way to do it.
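The counting pattern under discussion might be sketched as follows. `mergeMap` and `keepCount` follow the quoted snippet; the keep/drop predicate is a hypothetical stand-in for the aggregator's real threshold and owning-ordinal checks:

```java
import java.util.Arrays;
import java.util.function.LongPredicate;

// Hedged sketch: count surviving buckets, and only signal a merge when some
// were dropped, for either of the two reasons above. Not the real aggregator.
class MergeSketch {
    /** Returns a merge map, or null when every bucket survived (no merge needed). */
    static long[] buildMergeMap(long[] bucketKeys, LongPredicate keep) {
        long[] mergeMap = new long[bucketKeys.length];
        Arrays.fill(mergeMap, -1);           // -1 marks dropped buckets
        long keepCount = 0;
        long nextOrd = 0;
        for (int ord = 0; ord < bucketKeys.length; ord++) {
            if (keep.test(bucketKeys[ord])) {
                mergeMap[ord] = nextOrd++;   // survivors get dense new ordinals
                keepCount++;
            }
        }
        // A single counter catches both drop reasons without tracking which applied.
        return keepCount < bucketKeys.length ? mergeMap : null;
    }
}
```

The point of the counter over a boolean is that it doubles as the number of surviving buckets, which the dense re-numbering needs anyway.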
```java
// need to take care of dups
for (int i = 0; i < valuesCount; ++i) {
    BytesRef bytes = values.nextValue();
    if (filter != null && !filter.accept(bytes)) {
```
Nit: `!filter.accept()` :)
(Also I realize the irony since the original code had that and it was my fault :) )
```java
// Make a note when one of the ords has been deleted
deletionCount += 1;
filter.add(oldKey);
```

```java
public InternalAggregation[] buildAggregations(long[] owningBucketOrds) throws IOException {
```
General comment about this method: we have a lot of "ords" being referenced and it's hard to keep track of which ord is which. E.g. we have the bucket ordinals that our parent is requesting we build, and then we have the bucket ordinals from each of those instances that we are collecting into buckets.

Not sure how, but if we could find a way to rename the variables to help identify or disambiguate them, I think it would help a bunch.
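One way to picture the disambiguation being asked for is below. The names `owningOrd` and `collectedOrd`, and the `Map` standing in for the aggregator's `bucketOrds` structure, are all invented for illustration, not the actual Elasticsearch variables:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: separate the ordinals the parent asks us to build ("owning" ords)
// from the ordinals of the buckets we collected under each of them.
class OrdNamingSketch {
    static List<String> buildAll(long[] owningOrds, Map<Long, List<Long>> collectedByOwner) {
        List<String> out = new ArrayList<>();
        for (long owningOrd : owningOrds) {
            // collectedOrd: ordinal this aggregator assigned to a bucket
            // collected under owningOrd.
            for (long collectedOrd : collectedByOwner.getOrDefault(owningOrd, List.of())) {
                out.add(owningOrd + ":" + collectedOrd);
            }
        }
        return out;
    }
}
```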
run elasticsearch-ci/default-distro

run elasticsearch-ci/1
nik9000
left a comment
I'll see about cleaning up the "ords ords ords ords" stuff too.
Thanks @polyfractal!