Skip to content

Stop terms agg from losing buckets#70493

Merged
nik9000 merged 4 commits intoelastic:masterfrom
nik9000:terms_as_filters_precise_size
Mar 22, 2021
Merged

Stop terms agg from losing buckets#70493
nik9000 merged 4 commits intoelastic:masterfrom
nik9000:terms_as_filters_precise_size

Conversation

@nik9000
Copy link
Copy Markdown
Member

@nik9000 nik9000 commented Mar 17, 2021

When the terms agg is at the top level it can run as a filters agg
instead because that is typically faster. This was added in #68871 and
we mistakenly made it so that a bucket without any hits could take up a
slot on the way back to the coordinating node. You could trigger this by
having a fairly precise size on the terms agg and a top level filter.

This fixes the issue by properly mimicing the regular terms aggregator
in the "as filters" version: only send back buckets without any matching
documents if the min_doc_count is 0.

Closes #70449

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 17, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

When the `terms` agg is at the top level it can run as a `filters` agg
instead because that is typically faster. This was added in elastic#68871 and
we mistakely made it so that a bucket without any hits could take up a
slot on the way back to the coordinating node. You could trigger this by
having a fairly precise `size` on the terms agg and a top level filter.

This fixes the issue by properly mimicing the regular terms aggregator
in the "as filters" version: only send back buckets without any matching
documents if the min_doc_count is 0.

Closes elastic#70449
long minDocCount = bucketCountThresholds.getShardMinDocCount();
if (minDocCount == 0 && bucketCountThresholds.getMinDocCount() > 0) {
minDocCount = 1;
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another, maybe better way to fix this would be to update the default shardMinDocCount to 1 unless minDocCount is 0. I believe that is effectively what is going on in the other aggregators sort of by accident. But I'm worried that making that change would be breaky.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making it explicit is the right thing to do long term, but not urgent. Would you open an issue for it so we don't might not forget, please?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #70664

@nik9000 nik9000 requested a review from not-napoleon March 17, 2021 19:41
Copy link
Copy Markdown
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

long minDocCount = bucketCountThresholds.getShardMinDocCount();
if (minDocCount == 0 && bucketCountThresholds.getMinDocCount() > 0) {
minDocCount = 1;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making it explicit is the right thing to do long term, but not urgent. Would you open an issue for it so we don't might not forget, please?

@nik9000 nik9000 merged commit 8e6d478 into elastic:master Mar 22, 2021
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 22, 2021
When the `terms` agg is at the top level it can run as a `filters` agg
instead because that is typically faster. This was added in elastic#68871 and
we mistakely made it so that a bucket without any hits could take up a
slot on the way back to the coordinating node. You could trigger this by
having a fairly precise `size` on the terms agg and a top level filter.

This fixes the issue by properly mimicing the regular terms aggregator
in the "as filters" version: only send back buckets without any matching
documents if the min_doc_count is 0.

Closes elastic#70449
nik9000 added a commit that referenced this pull request Mar 22, 2021
When the `terms` agg is at the top level it can run as a `filters` agg
instead because that is typically faster. This was added in #68871 and
we mistakely made it so that a bucket without any hits could take up a
slot on the way back to the coordinating node. You could trigger this by
having a fairly precise `size` on the terms agg and a top level filter.

This fixes the issue by properly mimicing the regular terms aggregator
in the "as filters" version: only send back buckets without any matching
documents if the min_doc_count is 0.

Closes #70449
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 22, 2021
nik9000 added a commit that referenced this pull request Mar 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.13.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

terms aggs order messing up the results

4 participants