Prevent histogram from allocating tons of buckets by nik9000 · Pull Request #71758 · elastic/elasticsearch

nik9000 · 2021-04-15T17:25:47Z

This prevents the histogram aggregation from allocating tons of empty
buckets when you set the interval to something tiny. Instead, we
reject the request. We're not in a place where we can aggregate over
huge ranges with tiny intervals, but we should fail gracefully when you
ask us to do so rather than OOM.

Closes #71744
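To see why a tiny interval is dangerous, here is a minimal sketch of the arithmetic behind the problem. It is not the code from this PR: the class name, the method names, and the `MAX_BUCKETS` value are assumptions made for illustration only.

```java
// Hypothetical sketch: names and the limit are illustrative, not Elasticsearch's API.
final class BucketEstimate {
    /** Illustrative cap on buckets; the value is an assumption for this sketch. */
    static final long MAX_BUCKETS = 65_536;

    /** Estimates how many buckets of width {@code interval} are needed to cover [min, max]. */
    static double estimateBuckets(double min, double max, double interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return Math.floor((max - min) / interval) + 1;
    }

    /** Rejects the request up front instead of allocating the buckets. */
    static void checkBuckets(double min, double max, double interval) {
        double estimate = estimateBuckets(min, max, interval);
        if (estimate > MAX_BUCKETS) {
            throw new IllegalArgumentException(
                "too many buckets: about " + estimate + " but the limit is " + MAX_BUCKETS);
        }
    }

    public static void main(String[] args) {
        checkBuckets(0, 100, 5); // 21 buckets, accepted
        try {
            checkBuckets(0, 1e9, 1e-3); // on the order of 1e12 buckets, rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A range of 1e9 with an interval of 1e-3 needs on the order of 1e12 buckets, which is why failing the request is preferable to trying to allocate them.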

elasticmachine · 2021-04-15T17:25:51Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

nik9000 · 2021-04-15T17:29:25Z

I suspect a similar bug hits date_histogram. But I can't be sure. If we like this solution here I can check it out for date_histogram as well.

nik9000 · 2021-04-15T19:33:23Z

run elasticsearch-ci/2

nik9000 · 2021-04-15T19:54:31Z

run elasticsearch-ci/2

not-napoleon

LGTM.

not-napoleon · 2021-04-19T14:19:30Z

.../src/main/java/org/elasticsearch/search/aggregations/bucket/histogram/InternalHistogram.java

+            }
+        }
+        Counter counter = new Counter();
+        iterateEmptyBuckets(list, list.listIterator(), counter);


If I followed this correctly, this pass doesn't actually allocate the buckets, it just runs the maybe break check based on how many buckets we would allocate, and the call on line 371 does the actual allocation. Presuming I followed that correctly, I think it's worth leaving a comment to that effect. That seems to be the core of this fix, and it'd be good to remember why we're doing it this way.

I'm happy to do that.

@imotov I'd love to hear from you on this PR too, so we can get two green check marks on it.
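The two-pass idea discussed in this thread can be sketched roughly as follows. This is a simplified illustration, not the code in `InternalHistogram.java`: `TwoPassFill`, `iterateKeys`, `fill`, and `LIMIT` are hypothetical names, and a plain exception stands in for the real limit check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleConsumer;

// Hypothetical sketch of "count first, allocate second".
final class TwoPassFill {
    /** Illustrative limit; an assumption for this sketch. */
    static final int LIMIT = 1_000;

    /** Visits the key of every bucket in [min, max) without allocating anything. */
    static void iterateKeys(double min, double max, double interval, DoubleConsumer onKey) {
        for (long i = 0; min + i * interval < max; i++) {
            onKey.accept(min + i * interval);
        }
    }

    static List<Double> fill(double min, double max, double interval) {
        // Pass 1: only count. This throws before a single bucket is allocated.
        int[] count = {0};
        iterateKeys(min, max, interval, key -> {
            if (++count[0] > LIMIT) {
                throw new IllegalStateException("too many buckets");
            }
        });
        // Pass 2: the count is known to be within the limit, so allocating is safe.
        List<Double> keys = new ArrayList<>(count[0]);
        iterateKeys(min, max, interval, key -> keys.add(key));
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(fill(0, 10, 1).size());
        try {
            fill(0, 1e9, 1e-3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Sharing one iteration routine between both passes keeps the counting logic and the allocating logic from drifting apart.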

imotov

LGTM. Left a minor suggestion

imotov · 2021-04-19T16:45:26Z

.../src/main/java/org/elasticsearch/search/aggregations/bucket/histogram/InternalHistogram.java

+            @Override
+            public void accept(double key) {
+                size++;
+                if (size >= 10000) {


I think this constant really needs an origin story; otherwise it makes the reader wonder where it came from and whether it is really meaningful. I wonder if it would make more sense to do something like DEFAULT_MAX_BUCKETS/10, with a brief explanation of why this should work?

That should work, yeah. I'll leave a nice comment. We could just keep adding and never call the method but that'd make failures slow. An origin story it is!
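A rough sketch of the batching trade-off discussed here: count keys locally and report them to the limit check in chunks of `DEFAULT_MAX_BUCKETS / 10`, so a runaway request fails after a few cheap checks rather than after counting everything. The class below is an assumption-laden illustration, not the `Counter` from the PR; the `IntConsumer` callback, the `flush` method, and the `DEFAULT_MAX_BUCKETS` value are all hypothetical stand-ins.

```java
import java.util.function.DoubleConsumer;
import java.util.function.IntConsumer;

// Hypothetical sketch of a counter that reports in batches.
final class BatchedCounter implements DoubleConsumer {
    /** Illustrative value; an assumption for this sketch. */
    static final int DEFAULT_MAX_BUCKETS = 65_536;
    /** Report often enough to fail fast, rarely enough to stay cheap. */
    static final int REPORT_EVERY = DEFAULT_MAX_BUCKETS / 10;

    private final IntConsumer limitCheck; // stand-in for the real limit check
    private int pending;

    BatchedCounter(IntConsumer limitCheck) {
        this.limitCheck = limitCheck;
    }

    @Override
    public void accept(double key) {
        pending++;
        if (pending >= REPORT_EVERY) {
            flush();
        }
    }

    /** Reports whatever is left over after the iteration finishes. */
    void flush() {
        if (pending > 0) {
            limitCheck.accept(pending);
            pending = 0;
        }
    }

    public static void main(String[] args) {
        int[] total = {0};
        BatchedCounter counter = new BatchedCounter(n -> {
            total[0] += n;
            if (total[0] > DEFAULT_MAX_BUCKETS) {
                throw new IllegalStateException("too many buckets: " + total[0]);
            }
        });
        try {
            // A billion keys are requested, but the loop stops shortly after the limit is crossed.
            for (long i = 0; i < 1_000_000_000L; i++) {
                counter.accept(i);
            }
            counter.flush();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Reporting only once at the end would also be correct, but the failure would arrive only after iterating over every key, which is the slowness mentioned above.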

#71986) This prevents the `histogram` aggregation from allocating tons of empty buckets when you set the `interval` to something tiny. Instead, we reject the request. We're not in a place where we can aggregate over huge ranges with tiny intervals, but we should fail gracefully when you ask us to do so rather than OOM. Closes #71744

Now that elastic#71758 has landed in 7.x we don't have to skip its tests when running backwards compatibility tests.

This prevents the `date_histogram` from running out of memory allocating empty buckets when you set the interval to something tiny like `seconds` and aggregate over a very wide date range. Without this change we'd allocate memory very quickly and throw an out of memory error, taking down the node. With it we instead throw the standard "too many buckets" error. Relates to elastic#71758
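For a sense of scale, the following sketch counts how many one-second buckets a wide date range would need. The dates and the helper name `secondBuckets` are made up for illustration and do not come from the PR.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch: counts one-second buckets across a date range.
final class DateBucketEstimate {
    /** Number of one-second buckets needed to cover [from, to], both ends included. */
    static long secondBuckets(Instant from, Instant to) {
        return ChronoUnit.SECONDS.between(from, to) + 1;
    }

    public static void main(String[] args) {
        Instant from = Instant.parse("2000-01-01T00:00:00Z");
        Instant to = Instant.parse("2021-01-01T00:00:00Z");
        // 21 years of seconds: hundreds of millions of buckets, almost all of them empty.
        System.out.println(secondBuckets(from, to));
    }
}
```

Even at a few dozen bytes per empty bucket, a count like this adds up to many gigabytes, which is consistent with the node running out of memory.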


nik9000 added the >bug, :Analytics/Aggregations, v8.0.0, and v7.13.0 labels Apr 15, 2021

nik9000 requested a review from imotov April 15, 2021 17:25

elasticmachine added the Team:Analytics label Apr 15, 2021

nik9000 added 5 commits April 15, 2021 13:29

Good job test! I did a backwards

53957e5

Skip failure

cd0fdea

Merge branch 'master' into histogram_no_crash

09eb86d

Come on

2cca996

not-napoleon approved these changes Apr 19, 2021

imotov approved these changes Apr 19, 2021

nik9000 added 6 commits April 20, 2021 08:41

Merge branch 'master' into histogram_no_crash

f2c64d9

Docs

09d74b0

Backwards

15773b8

Merge branch 'master' into histogram_no_crash

43b2762

Merge branch 'master' into histogram_no_crash

e127d3c

Merge branch 'master' into histogram_no_crash

923f428

nik9000 merged commit cf8d56a into elastic:master Apr 20, 2021

nik9000 added the backport pending label Apr 20, 2021

nik9000 removed the backport pending label Apr 21, 2021

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Apr 21, 2021

Update skip after backport of elastic#71758

cac8f03

Now that elastic#71758 has landed in 7.x we don't have to skip its tests when running backwards compatibility tests.

nik9000 mentioned this pull request Apr 21, 2021

Update skip after backport of #71758 #72047

Merged

nik9000 added a commit that referenced this pull request Apr 21, 2021

Update skip after backport of #71758 (#72047)

f88fd37

Now that #71758 has landed in 7.x we don't have to skip its tests when running backwards compatibility tests.

nik9000 mentioned this pull request Apr 22, 2021

Prevent date_histogram from OOMing #72081

Merged

nik9000 mentioned this pull request Apr 27, 2021

Prevent date_histogram from OOMing (backport of #72081) #72328

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

lchqlchq pushed a commit to lchqlchq/elasticsearch that referenced this pull request Aug 23, 2023

Prevent histogram from allocating tons of buckets elastic#71758

f721628

nik9000 merged 11 commits into elastic:master from nik9000:histogram_no_crash

5 participants
