Skip to content

Prevent histogram from allocating tons of buckets#71758

Merged
nik9000 merged 11 commits intoelastic:masterfrom
nik9000:histogram_no_crash
Apr 20, 2021
Merged

Prevent histogram from allocating tons of buckets#71758
nik9000 merged 11 commits intoelastic:masterfrom
nik9000:histogram_no_crash

Conversation

@nik9000
Copy link
Copy Markdown
Member

@nik9000 nik9000 commented Apr 15, 2021

This prevents the histogram aggregation from allocating tons of empty
buckets when you set the interval to something tiny. Instead, we
reject the request. We're not in a place where we can aggregate over
huge ranges with tiny intervals, but we should fail gracefully when you
ask us to do so rather than OOM.

Closes #71744

@nik9000 nik9000 requested a review from imotov April 15, 2021 17:25
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 15, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Apr 15, 2021

I suspect a similar bug hits date_histogram. But I can't be sure. If we like this solution here I can check it out for date_histogram as well.

nik9000 added 5 commits April 15, 2021 13:29
This prevents the `histogram` aggregation from allocating tons of empty
buckets when you set the `interval` to something tiny. Instead, we
reject the request. We're not in a place where we can aggregate over
huge ranges with tiny intervals, but we should fail gracefully when you
ask us to do so rather than OOM.

Closes elastic#71744
@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Apr 15, 2021

run elasticsearch-ci/2

1 similar comment
@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Apr 15, 2021

run elasticsearch-ci/2

Copy link
Copy Markdown
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

}
}
Counter counter = new Counter();
iterateEmptyBuckets(list, list.listIterator(), counter);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I followed this correctly, this pass doesn't actually allocate the buckets, it just runs the maybe break check based on how many buckets we would allocate, and the call on line 371 does the actual allocation. Presuming I followed that correctly, I think it's worth leaving a comment to that effect. That seems to be the core of this fix, and it'd be good to remember why we're doing it this way.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to do that.

@imotov I'd love to hear from you too on this PR. To get two green check marks on this one.

Copy link
Copy Markdown
Contributor

@imotov imotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left a minor suggestion

@Override
public void accept(double key) {
size++;
if (size >= 10000) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this constant really needs an origin story, otherwise it makes the reader to wonder where it came from and if it is really meaningful. I wonder if it would make more sense to do something like DEFAULT_MAX_BUCKETS/10 with a brief explanation why this should work?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should work, yeah. I'll leave a nice comment. We could just keep adding and never call the method but that'd make failures slow. An origin story it is!

@nik9000 nik9000 merged commit cf8d56a into elastic:master Apr 20, 2021
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Apr 20, 2021
…c#71758)

This prevents the `histogram` aggregation from allocating tons of empty
buckets when you set the `interval` to something tiny. Instead, we
reject the request. We're not in a place where we can aggregate over
huge ranges with tiny intervals, but we should fail gracefully when you
ask us to do so rather than OOM.

Closes elastic#71744
nik9000 added a commit that referenced this pull request Apr 21, 2021
#71986)

This prevents the `histogram` aggregation from allocating tons of empty
buckets when you set the `interval` to something tiny. Instead, we
reject the request. We're not in a place where we can aggregate over
huge ranges with tiny intervals, but we should fail gracefully when you
ask us to do so rather than OOM.

Closes #71744
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Apr 21, 2021
Now that elastic#71758 has landed in 7.x we don't have to skip its tests when
running backwards compatibility tests.
nik9000 added a commit that referenced this pull request Apr 21, 2021
Now that #71758 has landed in 7.x we don't have to skip its tests when
running backwards compatibility tests.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Apr 22, 2021
This prevents the `date_histogram` from running out of memory allocating
empty buckets when you set the interval to something tiny like `seconds`
and aggregate over a very wide date range. Without this change we'd
allocate memory very quickly and throw and out of memory error, taking
down the node. With it we instead throw the standard "too many buckets"
error.

Relates to elastic#71758
nik9000 added a commit that referenced this pull request Apr 27, 2021
This prevents the `date_histogram` from running out of memory allocating
empty buckets when you set the interval to something tiny like `seconds`
and aggregate over a very wide date range. Without this change we'd
allocate memory very quickly and throw and out of memory error, taking
down the node. With it we instead throw the standard "too many buckets"
error.

Relates to #71758
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Apr 27, 2021
This prevents the `date_histogram` from running out of memory allocating
empty buckets when you set the interval to something tiny like `seconds`
and aggregate over a very wide date range. Without this change we'd
allocate memory very quickly and throw and out of memory error, taking
down the node. With it we instead throw the standard "too many buckets"
error.

Relates to elastic#71758
nik9000 added a commit that referenced this pull request Apr 28, 2021
This prevents the `date_histogram` from running out of memory allocating
empty buckets when you set the interval to something tiny like `seconds`
and aggregate over a very wide date range. Without this change we'd
allocate memory very quickly and throw and out of memory error, taking
down the node. With it we instead throw the standard "too many buckets"
error.

Relates to #71758
lchqlchq pushed a commit to lchqlchq/elasticsearch that referenced this pull request Aug 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.13.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Histogram aggregation run with very small interval crashes Elasticsearch

5 participants