Speed up terms agg when not force merged (backport of #71241) by nik9000 · Pull Request #71743 · elastic/elasticsearch

nik9000 · 2021-04-15T13:03:19Z

This speeds up the `terms` aggregation when it can't take the fancy
`filters` path, there is more than one segment, and any of those
segments holds at most one value per document for the field. These
three things are super common.

Here are the performance change numbers:

| Metric | Task | Baseline | Contender | Diff | Unit |
|---|---|---:|---:|---:|---|
| 50th percentile latency | date-histo-string-terms-via-global-ords | 3414.02 | 2632.01 | -782.015 | ms |
| 90th percentile latency | date-histo-string-terms-via-global-ords | 3470.91 | 2756.88 | -714.031 | ms |
| 100th percentile latency | date-histo-string-terms-via-global-ords | 3620.89 | 2875.79 | -745.102 | ms |
| 50th percentile service time | date-histo-string-terms-via-global-ords | 3410.15 | 2628.87 | -781.275 | ms |
| 90th percentile service time | date-histo-string-terms-via-global-ords | 3467.36 | 2752.43 | -714.933 | ms |
| 100th percentile service time | date-histo-string-terms-via-global-ords | 3617.71 | 2871.63 | -746.083 | ms |

That's about a 20% improvement!

This works by hooking global ordinals into `DocValues.unwrapSingleton`.
Without this, you could unwrap singletons only *if* the segment's
ordinals aligned exactly with the global ordinals. If they didn't, we'd
return a doc values iterator that you can't unwrap, even if the segment
ordinals were singletons.
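
To make the idea concrete, here is a minimal, self-contained sketch of keeping the singleton shape while mapping segment ordinals to global ordinals. It is not the code from this PR and does not use Lucene: `SingleOrds`, `MultiOrds`, `Singleton`, and `toGlobal` are hypothetical stand-ins, and `unwrapSingleton` here only imitates the role of `DocValues.unwrapSingleton`.

```java
import java.util.function.LongUnaryOperator;

public class GlobalOrdsUnwrapSketch {
    /** Hypothetical single-valued view: one ordinal per document, -1 if the document has no value. */
    interface SingleOrds {
        long ordFor(int doc);
    }

    /** Hypothetical multi-valued view: any number of ordinals per document. */
    interface MultiOrds {
        long[] ordsFor(int doc);
    }

    /** Presents a single-valued source through the multi-valued interface. */
    static final class Singleton implements MultiOrds {
        final SingleOrds in;

        Singleton(SingleOrds in) {
            this.in = in;
        }

        @Override
        public long[] ordsFor(int doc) {
            long ord = in.ordFor(doc);
            return ord < 0 ? new long[0] : new long[] { ord };
        }
    }

    /** Returns the single-valued view if the source wraps one, otherwise null. */
    static SingleOrds unwrapSingleton(MultiOrds ords) {
        return ords instanceof Singleton ? ((Singleton) ords).in : null;
    }

    /** Maps segment ordinals to global ordinals without losing the singleton shape. */
    static MultiOrds toGlobal(MultiOrds segment, LongUnaryOperator segmentToGlobal) {
        SingleOrds single = unwrapSingleton(segment);
        if (single != null) {
            // The segment is single-valued, so return something that can be unwrapped again.
            return new Singleton(doc -> {
                long ord = single.ordFor(doc);
                return ord < 0 ? -1 : segmentToGlobal.applyAsLong(ord);
            });
        }
        // Multi-valued segment: map every ordinal; the result cannot be unwrapped.
        return doc -> {
            long[] ords = segment.ordsFor(doc);
            long[] global = new long[ords.length];
            for (int i = 0; i < ords.length; i++) {
                global[i] = segmentToGlobal.applyAsLong(ords[i]);
            }
            return global;
        };
    }

    public static void main(String[] args) {
        long[] perDoc = { 0, 1, -1, 0 }; // segment ordinals; doc 2 has no value
        MultiOrds segment = new Singleton(doc -> perDoc[doc]);
        // Assumed mapping: this segment's ordinals start at global ordinal 5.
        MultiOrds global = toGlobal(segment, ord -> ord + 5);
        SingleOrds unwrapped = unwrapSingleton(global);
        System.out.println("unwrappable: " + (unwrapped != null));
        System.out.println("global ord of doc 1: " + unwrapped.ordFor(1));
    }
}
```

The design point is the first branch of `toGlobal`: a mapping wrapper that checks for a single-valued source can hand back a wrapper of the same shape, so a later unwrap still succeeds even when the ordinals have been remapped.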

That speeds up the `terms` aggregator because it has a fast path it can
take when it has singletons. Previously that path only worked if there
was a single segment or if the segment's ordinals lined up exactly with
the global ordinals. The latter is fairly common for low cardinality
fields, so they might not benefit from this quite as much as high
cardinality fields.
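
As a rough illustration of why the singleton shape matters to a collector, the sketch below counts documents per ordinal with and without a single-valued view. It is a toy under stated assumptions, not the aggregator's actual code: `countPerOrd` and its parameters are hypothetical, and `-1` is assumed to mean "no value".

```java
import java.util.Arrays;
import java.util.function.IntFunction;
import java.util.function.IntToLongFunction;

public class TermsFastPathSketch {
    /**
     * Counts documents per ordinal. {@code single} is non-null only when every
     * document has at most one value; otherwise {@code multi} is used.
     */
    static long[] countPerOrd(int maxDoc, int ordCount, IntToLongFunction single, IntFunction<long[]> multi) {
        long[] counts = new long[ordCount];
        if (single != null) {
            // Fast path: one lookup per document, no inner loop and no array per document.
            for (int doc = 0; doc < maxDoc; doc++) {
                long ord = single.applyAsLong(doc);
                if (ord >= 0) {
                    counts[(int) ord]++;
                }
            }
            return counts;
        }
        // General path: iterate over every value of every document.
        for (int doc = 0; doc < maxDoc; doc++) {
            for (long ord : multi.apply(doc)) {
                counts[(int) ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Four single-valued documents alternating between ordinals 0 and 1.
        long[] fast = countPerOrd(4, 2, doc -> doc % 2, null);
        // Three documents that each carry both ordinals.
        long[] general = countPerOrd(3, 2, null, doc -> new long[] { 0, 1 });
        System.out.println(Arrays.toString(fast));
        System.out.println(Arrays.toString(general));
    }
}
```

A collector can only pick the first branch when the unwrap succeeds, which is why an un-unwrappable global ordinals wrapper forced the slower general path on multi-segment indices.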

Closes #71086


nik9000 added backport v7.13.0 labels Apr 15, 2021

List.of

87a16a7

nik9000 merged commit 475e2e9 into elastic:7.x Apr 15, 2021

nik9000 merged 2 commits into elastic:7.x from nik9000:investigate_71086_7_x
