Speed up terms agg when alone (backport #69377) by nik9000 · Pull Request #69632 · elastic/elasticsearch

nik9000 · 2021-02-25T19:58:48Z

This speeds up the terms agg in a very specific case:

It has no child aggregations
It has no parent aggregations
There are no deleted documents
You are not using document level security
There is no top level query
The field has global ordinals
There are less than one thousand distinct terms

That is a lot of restirctions! But the speed up pretty substantial because
in those cases we can serve the entire aggregation using metadata that
lucene precomputes while it builds the index. In a real rally track we
have we get a 92% speed improvement, but the index isn't that big:

| 90th percentile service time | keyword-terms-low-cardinality |     446.031 |     36.7677 | -409.263 |     ms |

In a rally track with a larger index I ran some tests by hand and the
aggregation went from 2200ms to 8ms.

Even though there are 7 restrictions on this, I expect it to come into
play enough to matter. Restriction 6 just means you are aggregating on
a keyword field. Or an ip. And its fairly common for keywords to
have less than a thousand distinct values. Certainly not everywhere, but
some places.

I expect "cold tier" indices are very very likely not to have deleted
documents at all. And the optimization works segment by segment - so
it'll save some time on each segment without deleted documents. But more
time if the entire index doesn't have any.

The optimization builds on #68871 which translates terms aggregations
against low cardinality fields with global ordinals into a filters
aggregation. This teaches the filters aggregation to recognize when
it can get its results from the index metadata. Rather, it creates the
infrastructure to make that fairly simple and applies it in the case of
the queries generated by the terms aggregation.

This speeds up the `terms` agg in a very specific case: 1. It has no child aggregations 2. It has no parent aggregations 3. There are no deleted documents 4. You are not using document level security 5. There is no top level query 6. The field has global ordinals 7. There are less than one thousand distinct terms That is a lot of restirctions! But the speed up pretty substantial because in those cases we can serve the entire aggregation using metadata that lucene precomputes while it builds the index. In a real rally track we have we get a 92% speed improvement, but the index isn't *that* big: ``` | 90th percentile service time | keyword-terms-low-cardinality | 446.031 | 36.7677 | -409.263 | ms | ``` In a rally track with a larger index I ran some tests by hand and the aggregation went from 2200ms to 8ms. Even though there are 7 restrictions on this, I expect it to come into play enough to matter. Restriction 6 just means you are aggregating on a `keyword` field. Or an `ip`. And its fairly common for `keyword`s to have less than a thousand distinct values. Certainly not everywhere, but some places. I expect "cold tier" indices are very very likely not to have deleted documents at all. And the optimization works segment by segment - so it'll save some time on each segment without deleted documents. But more time if the entire index doesn't have any. The optimization builds on elastic#68871 which translates `terms` aggregations against low cardinality fields with global ordinals into a `filters` aggregation. This teaches the `filters` aggregation to recognize when it can get its results from the index metadata. Rather, it creates the infrastructure to make that fairly simple and applies it in the case of the queries generated by the terms aggregation.

nik9000 added backport v7.13.0 labels Feb 25, 2021

nik9000 merged commit c153abb into elastic:7.x Mar 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up terms agg when alone (backport #69377)#69632

Speed up terms agg when alone (backport #69377)#69632
nik9000 merged 1 commit intoelastic:7.xfrom
nik9000:filters_from_meta_7_x

nik9000 commented Feb 25, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nik9000 commented Feb 25, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant