Save memory when auto_date_histogram is not on top (backport of #57304) #58190
Merged
nik9000 merged 2 commits into elastic:7.x on Jun 17, 2020
Conversation
This builds an `auto_date_histogram` aggregator that natively aggregates from many buckets and uses it where `auto_date_histogram` used to use `asMultiBucketAggregator`, which should save a significant amount of memory in those cases. In particular, this happens when `auto_date_histogram` is a sub-aggregator of a multi-bucketing aggregator like `terms`, `histogram`, or `filters`. For the most part we preserve the original implementation when `auto_date_histogram` only collects from a single bucket.
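To make the scenario concrete, this is the shape of request where the new code path kicks in: `auto_date_histogram` nested under `terms`. A minimal sketch using the 7.x high-level REST client builders; the index and field names are made up.

```java
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.histogram.AutoDateHistogramAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class AutoDateHistogramUnderTerms {
    public static void main(String[] args) {
        // auto_date_histogram as a sub-aggregation: previously each terms
        // bucket got its own wrapped aggregator via asMultiBucketAggregator;
        // now one aggregator collects from all owning buckets natively.
        AutoDateHistogramAggregationBuilder overTime =
                new AutoDateHistogramAggregationBuilder("over_time").setNumBuckets(10);
        overTime.field("timestamp"); // hypothetical date field

        SearchSourceBuilder source = new SearchSourceBuilder()
                .size(0)
                .aggregation(
                        AggregationBuilders.terms("by_host")
                                .field("host") // hypothetical keyword field
                                .subAggregation(overTime));

        SearchRequest request = new SearchRequest("logs-*").source(source);
        System.out.println(request.source()); // prints the JSON search body
    }
}
```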
It isn't possible to "just port the aggregator" without taking a pretty significant performance hit because we used to rewrite all of the buckets every time we switched to a coarser rounding configuration. Without some major surgery to how we delay sub-aggs we'd end up rewriting the delay list zillions of times if there are many buckets.
The multi-bucket version of the aggregator has a "budget" of "wasted" buckets and only rewrites all of the buckets when we exceed that budget.

Now that we don't rebucket every time we increase the rounding we can no longer get an accurate count of the number of buckets! So instead the aggregator uses an estimate of the number of buckets to trigger switching to a coarser rounding. This estimate is likely to be *terrible* when buckets are far apart compared to the rounding, so the aggregator also uses the difference between the first and last bucket to trigger switching to a coarser rounding, which covers for the shortcomings of the bucket estimation technique pretty well. It also causes the aggregator to emit fewer buckets in cases where they'd be reduced together on the coordinating node. This is wonderful, but probably fairly rare.
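Roughly, the two triggers and the budget interact like the sketch below. This is not the actual aggregator code; all names (`wastedBucketBudget`, `estimatedBucketCount`, and so on) are illustrative, and real roundings step through calendar units rather than doubling.

```java
/** Simplified sketch of the rounding-switch logic described above. */
class RoundingSwitchSketch {
    private final int targetBuckets;        // requested bucket count
    private final long wastedBucketBudget;  // stale buckets tolerated before rebucketing

    private long roundingIntervalMillis;    // width of the current rounding
    private long estimatedBucketCount;      // rough count; an exact count would need rebucketing
    private long wastedBuckets;
    private long minKey = Long.MAX_VALUE;
    private long maxKey = Long.MIN_VALUE;

    RoundingSwitchSketch(int targetBuckets, long wastedBucketBudget, long initialIntervalMillis) {
        this.targetBuckets = targetBuckets;
        this.wastedBucketBudget = wastedBucketBudget;
        this.roundingIntervalMillis = initialIntervalMillis;
    }

    /** Called per rounded key; {@code addedNewBucket} is the caller's best guess. */
    void collect(long roundedKeyMillis, boolean addedNewBucket) {
        if (addedNewBucket) {
            estimatedBucketCount++;
        }
        minKey = Math.min(minKey, roundedKeyMillis);
        maxKey = Math.max(maxKey, roundedKeyMillis);

        // Trigger 1: the (possibly terrible) bucket-count estimate.
        // Trigger 2: the spread between the first and last bucket, which
        // covers for the estimate when buckets are far apart relative to
        // the rounding.
        while (estimatedBucketCount > targetBuckets
                || (maxKey - minKey) / roundingIntervalMillis > targetBuckets) {
            increaseRounding();
        }
    }

    private void increaseRounding() {
        roundingIntervalMillis *= 2; // sketch only; see note above
        // Buckets made under the finer rounding are now "wasted" until we
        // rebucket, but rebucketing is expensive, so only pay for it once
        // the waste exceeds the budget.
        wastedBuckets += estimatedBucketCount;
        estimatedBucketCount /= 2; // rough guess under the doubled rounding
        if (wastedBuckets > wastedBucketBudget) {
            rebucketAll();
            wastedBuckets = 0;
        }
    }

    private void rebucketAll() { /* rewrite every bucket into the current rounding */ }
}
```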
All of that does buy us some speed improvements when the aggregator is a child of a multi-bucket aggregator:

- Without metrics or time zone: 25% faster
- With metrics: 15% faster
- With time zone: 22% faster
Relates to #56487