Skip to content

Fix doc_count on HistoBackedHistogramAggregator#74650

Merged
csoulios merged 1 commit intoelastic:masterfrom
csoulios:fix-histo-doc_count
Jun 28, 2021
Merged

Fix doc_count on HistoBackedHistogramAggregator#74650
csoulios merged 1 commit intoelastic:masterfrom
csoulios:fix-histo-doc_count

Conversation

@csoulios
Copy link
Copy Markdown
Contributor

@csoulios csoulios commented Jun 28, 2021

histogram aggregation on histogram field computes wrong doc_count values when _doc_count field is present.

The root cause of the problem is correctly described here

Closes #74617

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 28, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@csoulios csoulios requested a review from benwtrent June 28, 2021 19:10
// We have added the document already and we have incremented bucket doc_count
// by _doc_count times. To compensate for this, we should increment doc_count by
// (count - _doc_count) so that we have added it count times.
incrementBucketDocCount(bucketOrd, count - docCountProvider.getDocCount(doc));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No matter what _doc_count is, it was previously added to the bucket_count via the collectBucket methods. It could be ANY number.

Consequently, this incrementBucketDocCount may actually be decrementing the bucket_count to adjust for the difference.

I think this is fine.

The other potential solution is to override collectBucket... so the bucket count is not incremented. But that may prove too complicated.

I think this is good solution for now 👍

Somebody else from the aggs team should take a look as well.

@csoulios csoulios merged commit 49baa06 into elastic:master Jun 28, 2021
@csoulios csoulios deleted the fix-histo-doc_count branch June 28, 2021 19:55
csoulios added a commit that referenced this pull request Jun 29, 2021
Backports #74650 to v7.x

    histogram aggregation on histogram field computes wrong doc_count values when _doc_count field is present.

    The root cause of the problem is correctly described here
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.14.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Discrepancy between doc_count agg and histogram agg on histogram fields

5 participants