Stop terms agg from losing buckets #70493
Merged
nik9000 merged 4 commits into elastic:master on Mar 22, 2021
Pinging @elastic/es-analytics-geo (Team:Analytics)
When the `terms` agg is at the top level it can run as a `filters` agg instead because that is typically faster. This was added in elastic#68871, and we mistakenly made it so that a bucket without any hits could take up a slot on the way back to the coordinating node. You could trigger this with a fairly precise `size` on the terms agg and a top-level filter. This fixes the issue by properly mimicking the regular terms aggregator in the "as filters" version: only send back buckets without any matching documents if `min_doc_count` is 0. Closes elastic#70449
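To make the failure mode concrete, here is a toy model of a shard choosing which buckets to send back. It is only a sketch and not Elasticsearch code: the class `EmptyBucketSketch`, the helper `select`, and the rule of taking the first `size` buckets in iteration order are all assumptions made for illustration. The real selection logic is more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EmptyBucketSketch {
    // Hypothetical helper: keep up to `size` buckets, in iteration order,
    // whose doc count is at least `threshold`.
    static List<String> select(Map<String, Long> docCounts, int size, long threshold) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Long> e : docCounts.entrySet()) {
            if (kept.size() == size) {
                break; // all slots are taken
            }
            if (e.getValue() >= threshold) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> docCounts = new LinkedHashMap<>();
        docCounts.put("a", 0L); // term exists, but the top-level filter excludes all of its docs
        docCounts.put("b", 3L);
        docCounts.put("c", 2L);

        // Threshold 0: the empty bucket "a" takes a slot and "c" is lost.
        System.out.println(select(docCounts, 2, 0)); // [a, b]
        // Threshold 1: empty buckets are skipped, so both real buckets survive.
        System.out.println(select(docCounts, 2, 1)); // [b, c]
    }
}
```

With a threshold of 0 the empty bucket crowds out a bucket that has hits. That matches the symptom described above, even though the model ignores ordering and shard sizing.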
nik9000 commented on Mar 17, 2021
long minDocCount = bucketCountThresholds.getShardMinDocCount();
if (minDocCount == 0 && bucketCountThresholds.getMinDocCount() > 0) {
    minDocCount = 1;
}
Member
Author
Another, maybe better way to fix this would be to update the default shardMinDocCount to 1 unless minDocCount is 0. I believe that is effectively what is going on in the other aggregators sort of by accident. But I'm worried that making that change would be breaky.
Member
I think making it explicit is the right thing to do long term, but not urgent. Would you open an issue for it so we don't forget, please?
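For comparison, the two options discussed in this thread can be written as pure functions. This is a sketch under stated assumptions: the class and method names are hypothetical, and it assumes an unset `shard_min_doc_count` is represented as 0. Neither method is the actual Elasticsearch implementation.

```java
public class MinDocCountSketch {
    // Mirrors the check in the diff above: a shard-level threshold of 0 is
    // bumped to 1 unless the request really asked for min_doc_count == 0.
    static long patchedThreshold(long shardMinDocCount, long minDocCount) {
        if (shardMinDocCount == 0 && minDocCount > 0) {
            return 1;
        }
        return shardMinDocCount;
    }

    // Hypothetical alternative: choose the default for shard_min_doc_count
    // up front instead of patching it inside the aggregator.
    static long explicitDefault(long minDocCount) {
        return minDocCount == 0 ? 0 : 1;
    }

    public static void main(String[] args) {
        // With shard_min_doc_count left at 0, both approaches agree.
        System.out.println(patchedThreshold(0, 1) == explicitDefault(1)); // true
        System.out.println(patchedThreshold(0, 0) == explicitDefault(0)); // true
    }
}
```

In this simplified model the two approaches give the same result whenever `shard_min_doc_count` is left at 0. The second only moves the decision to where the default is defined.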
not-napoleon approved these changes on Mar 22, 2021
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request on Mar 22, 2021
nik9000 added a commit that referenced this pull request on Mar 22, 2021
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request on Mar 22, 2021
nik9000 added a commit that referenced this pull request on Mar 22, 2021