Speed up terms agg when not force merged (backport of #71241) #71743
Merged
nik9000 merged 2 commits into elastic:7.x on Apr 15, 2021
This speeds up the `terms` aggregation when it can't take the fancy `filters` path, there is more than one segment, and any of those segments have only a single value for the field. These three things are super common.

Here are the performance change numbers:

```
| 50th percentile latency | date-histo-string-terms-via-global-ords | 3414.02 | 2632.01 | -782.015 | ms |
| 90th percentile latency | date-histo-string-terms-via-global-ords | 3470.91 | 2756.88 | -714.031 | ms |
| 100th percentile latency | date-histo-string-terms-via-global-ords | 3620.89 | 2875.79 | -745.102 | ms |
| 50th percentile service time | date-histo-string-terms-via-global-ords | 3410.15 | 2628.87 | -781.275 | ms |
| 90th percentile service time | date-histo-string-terms-via-global-ords | 3467.36 | 2752.43 | -714.933 | ms | 20%!!!!
| 100th percentile service time | date-histo-string-terms-via-global-ords | 3617.71 | 2871.63 | -746.083 | ms |
```

This works by hooking global ordinals into `DocValues.unwrapSingleton`. Without this you could unwrap singletons *only if* the segment's ordinals aligned exactly with the global ordinals. If they didn't, we'd return a doc values iterator that you can't unwrap, even if the segment ordinals were singletons.

That speeds up the terms aggregator because it has a fast path it can take when it has singletons. Previously that fast path only worked if there was a single segment, or if the segment's ordinals lined up exactly with the global ordinals. That alignment is fairly common for low cardinality fields, so they might not benefit from this quite as much as high cardinality fields.

Closes elastic#71086
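To make the unwrapping idea concrete, here is a minimal, self-contained sketch of it. It is not the code from this PR and does not use Lucene: `SingleOrds`, `MultiOrds`, `SingletonOrds`, `GlobalMappedOrds`, `GlobalMappedSingleOrds`, and `GlobalOrdsSketch` are hypothetical stand-ins invented for illustration, and the `unwrapSingleton` helper here only imitates the role that `DocValues.unwrapSingleton` plays. The assumption is that a segment-to-global mapping applied to a singleton can itself be exposed as a singleton, so the aggregator's single-valued fast path stays available.

```java
import java.util.Arrays;

/** Hypothetical single-valued ordinals: at most one ordinal per doc, -1 if missing. */
interface SingleOrds {
    int ordOf(int doc);
}

/** Hypothetical multi-valued ordinals: zero or more ordinals per doc. */
interface MultiOrds {
    int[] ordsOf(int doc);
}

/** Presents single-valued ordinals as multi-valued while remembering the wrapped instance. */
final class SingletonOrds implements MultiOrds {
    final SingleOrds in;
    SingletonOrds(SingleOrds in) { this.in = in; }
    public int[] ordsOf(int doc) {
        int ord = in.ordOf(doc);
        return ord < 0 ? new int[0] : new int[] { ord };
    }
}

/** Maps multi-valued segment ordinals to global ordinals. Cannot be unwrapped. */
final class GlobalMappedOrds implements MultiOrds {
    final MultiOrds segment;
    final int[] segToGlobal;
    GlobalMappedOrds(MultiOrds segment, int[] segToGlobal) {
        this.segment = segment;
        this.segToGlobal = segToGlobal;
    }
    public int[] ordsOf(int doc) {
        int[] seg = segment.ordsOf(doc);
        int[] global = new int[seg.length];
        for (int i = 0; i < seg.length; i++) {
            global[i] = segToGlobal[seg[i]];
        }
        return global;
    }
}

/** Maps single-valued segment ordinals to global ordinals. */
final class GlobalMappedSingleOrds implements SingleOrds {
    final SingleOrds segment;
    final int[] segToGlobal;
    GlobalMappedSingleOrds(SingleOrds segment, int[] segToGlobal) {
        this.segment = segment;
        this.segToGlobal = segToGlobal;
    }
    public int ordOf(int doc) {
        int ord = segment.ordOf(doc);
        return ord < 0 ? -1 : segToGlobal[ord];
    }
}

final class GlobalOrdsSketch {
    /** Returns the wrapped single-valued ordinals, or null if these are not a singleton. */
    static SingleOrds unwrapSingleton(MultiOrds ords) {
        return ords instanceof SingletonOrds ? ((SingletonOrds) ords).in : null;
    }

    /** Behaviour described as "before": the mapping hides the singleton. */
    static MultiOrds globalOrdsBefore(MultiOrds segment, int[] segToGlobal) {
        return new GlobalMappedOrds(segment, segToGlobal);
    }

    /** Behaviour described as "after": a singleton segment stays a singleton once mapped. */
    static MultiOrds globalOrdsAfter(MultiOrds segment, int[] segToGlobal) {
        SingleOrds single = unwrapSingleton(segment);
        if (single != null) {
            return new SingletonOrds(new GlobalMappedSingleOrds(single, segToGlobal));
        }
        return new GlobalMappedOrds(segment, segToGlobal);
    }

    /** Counts docs per global ordinal, taking the single-valued fast path when possible. */
    static long[] countTerms(MultiOrds globalOrds, int maxDoc, int globalTermCount) {
        long[] counts = new long[globalTermCount];
        SingleOrds single = unwrapSingleton(globalOrds);
        if (single != null) {
            // Fast path: one lookup per doc, no per-doc array.
            for (int doc = 0; doc < maxDoc; doc++) {
                int ord = single.ordOf(doc);
                if (ord >= 0) counts[ord]++;
            }
        } else {
            // Slow path: iterate every ordinal of every doc.
            for (int doc = 0; doc < maxDoc; doc++) {
                for (int ord : globalOrds.ordsOf(doc)) counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] segOrds = { 0, 1, -1, 0 };   // one ordinal per doc, doc 2 has no value
        int[] segToGlobal = { 1, 3 };      // segment ordinals do not line up with global ones
        MultiOrds segment = new SingletonOrds(doc -> segOrds[doc]);
        MultiOrds before = globalOrdsBefore(segment, segToGlobal);
        MultiOrds after = globalOrdsAfter(segment, segToGlobal);
        System.out.println("before unwraps: " + (unwrapSingleton(before) != null));
        System.out.println("after unwraps:  " + (unwrapSingleton(after) != null));
        System.out.println(Arrays.toString(countTerms(after, segOrds.length, 4)));
    }
}
```

Both paths produce the same counts. The only difference in this sketch is that the "after" mapping lets `countTerms` skip the per-document array and inner loop, which is the kind of saving the fast path is meant to provide.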