Prevent histogram from allocating tons of buckets #71758
nik9000 merged 11 commits into elastic:master
Conversation
Pinging @elastic/es-analytics-geo (Team:Analytics)

I suspect a similar bug hits
This prevents the `histogram` aggregation from allocating tons of empty buckets when you set the `interval` to something tiny. Instead, we reject the request. We're not in a place where we can aggregate over huge ranges with tiny intervals, but we should fail gracefully when you ask us to do so rather than OOM. Closes elastic#71744
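To get a feel for the scale involved, here's a hypothetical back-of-the-envelope calculation; the range, interval, and class name below are invented for illustration and are not from the PR:

```java
// Hypothetical sketch: with a tiny interval over a wide range, the number of
// empty buckets the histogram would have to materialize explodes.
class BucketExplosion {
    // Prospective bucket count for a fixed-interval histogram over [min, max).
    static long prospectiveBuckets(double min, double max, double interval) {
        return (long) ((max - min) / interval);
    }

    public static void main(String[] args) {
        double min = 0, max = 1_000_000_000; // a wide range of values
        double interval = 0.001;             // "something tiny"
        // Roughly 10^12 buckets -- far past any sane limit, so failing the
        // request up front beats allocating them all and OOMing the node.
        System.out.println(prospectiveBuckets(min, max, interval));
    }
}
```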
run elasticsearch-ci/2
        }
    }
    Counter counter = new Counter();
    iterateEmptyBuckets(list, list.listIterator(), counter);
If I followed this correctly, this pass doesn't actually allocate the buckets; it just runs the maybe-break check based on how many buckets we would allocate, and the call on line 371 does the actual allocation. Presuming I followed that correctly, I think it's worth leaving a comment to that effect. That seems to be the core of this fix, and it'd be good to remember why we're doing it this way.
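If that reading is right, the two-pass shape could be sketched roughly like this; every name and the limit below are made-up stand-ins for illustration, not the actual implementation:

```java
// Hypothetical sketch of the two-pass approach: first count the empty buckets
// we *would* create and trip the "too many buckets" check, and only allocate
// real bucket objects afterwards if the counting pass survived.
class EmptyBucketSketch {
    static final int MAX_BUCKETS = 65536; // stand-in for the real limit

    // Counting pass: no bucket objects are created here, so a doomed request
    // fails with a clear error instead of exhausting the heap.
    static int countEmptyBuckets(double min, double max, double interval) {
        int count = 0;
        for (double key = min; key < max; key += interval) {
            count++;
            if (count > MAX_BUCKETS) {
                throw new IllegalArgumentException("too many buckets: " + count);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countEmptyBuckets(0, 100, 1)); // prints 100
        try {
            countEmptyBuckets(0, 1_000_000, 0.001); // would need ~10^9 buckets
        } catch (IllegalArgumentException e) {
            System.out.println("rejected"); // tripped before any allocation
        }
    }
}
```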
I'm happy to do that.
@imotov I'd love to hear from you too on this PR. To get two green check marks on this one.
imotov left a comment
LGTM. Left a minor suggestion
    @Override
    public void accept(double key) {
        size++;
        if (size >= 10000) {
I think this constant really needs an origin story; otherwise it makes the reader wonder where it came from and whether it is really meaningful. I wonder if it would make more sense to do something like DEFAULT_MAX_BUCKETS / 10, with a brief explanation of why that should work?
That should work, yeah. I'll leave a nice comment. We could just keep adding and never call the method but that'd make failures slow. An origin story it is!
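A periodic check along those lines might look something like this sketch; the class name, the `CHECK_EVERY` constant, and the limit here are hypothetical stand-ins, not the real code:

```java
import java.util.function.DoubleConsumer;
import java.util.function.IntConsumer;

// Hypothetical sketch: count prospective buckets and run the limit check only
// every CHECK_EVERY additions -- frequent enough that a doomed request fails
// quickly, rare enough that the check stays cheap.
class PeriodicCheckCounter implements DoubleConsumer {
    static final int CHECK_EVERY = 6553; // stand-in for DEFAULT_MAX_BUCKETS / 10

    private int size = 0;
    private final IntConsumer maybeBreak; // e.g. the aggregation's bucket-limit check

    PeriodicCheckCounter(IntConsumer maybeBreak) {
        this.maybeBreak = maybeBreak;
    }

    @Override
    public void accept(double key) {
        size++;
        if (size % CHECK_EVERY == 0) {
            maybeBreak.accept(size);
        }
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        PeriodicCheckCounter counter = new PeriodicCheckCounter(size -> {
            if (size > 10000) {
                throw new IllegalArgumentException("too many buckets: " + size);
            }
        });
        try {
            for (int i = 0; i < 20000; i++) {
                counter.accept(i);
            }
        } catch (IllegalArgumentException e) {
            // The limit is exceeded between the first and second periodic
            // check, so the failure fires well before all 20000 additions.
            System.out.println("rejected at " + counter.size());
        }
    }
}
```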
#71986) This prevents the `histogram` aggregation from allocating tons of empty buckets when you set the `interval` to something tiny. Instead, we reject the request. We're not in a place where we can aggregate over huge ranges with tiny intervals, but we should fail gracefully when you ask us to do so rather than OOM. Closes #71744
Now that elastic#71758 has landed in 7.x we don't have to skip its tests when running backwards compatibility tests.
This prevents the `date_histogram` from running out of memory allocating empty buckets when you set the interval to something tiny like `seconds` and aggregate over a very wide date range. Without this change we'd allocate memory very quickly and throw an out of memory error, taking down the node. With it we instead throw the standard "too many buckets" error. Relates to #71758
This prevents the `histogram` aggregation from allocating tons of empty buckets when you set the `interval` to something tiny. Instead, we reject the request. We're not in a place where we can aggregate over huge ranges with tiny intervals, but we should fail gracefully when you ask us to do so rather than OOM.

Closes #71744