Fix doc_count on HistoBackedHistogramAggregator #74650
Merged
csoulios merged 1 commit into elastic:master on Jun 28, 2021
Collaborator
Pinging @elastic/es-analytics-geo (Team:Analytics)
nik9000 approved these changes on Jun 28, 2021
benwtrent approved these changes on Jun 28, 2021
// We have added the document already and we have incremented bucket doc_count
// by _doc_count times. To compensate for this, we should increment doc_count by
// (count - _doc_count) so that we have added it count times.
incrementBucketDocCount(bucketOrd, count - docCountProvider.getDocCount(doc));
Member
Whatever value _doc_count holds, it has already been added to the bucket's doc_count via the collectBucket methods. It could be ANY number.
Consequently, this incrementBucketDocCount call may actually decrement the bucket's doc_count (whenever count is smaller than _doc_count) to adjust for the difference.
I think this is fine.
The other potential solution is to override collectBucket so that the bucket count is not incremented in the first place. But that may prove too complicated.
I think this is a good solution for now 👍
Somebody else from the aggs team should take a look as well.
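To make the arithmetic in the comment above concrete, here is a minimal, self-contained sketch of the compensation. It is a toy model, not the Elasticsearch implementation: the class `DocCountCompensationSketch`, the map-backed storage, and the helper `collectHistogramValue` are hypothetical names introduced for illustration, and `collectBucket` / `incrementBucketDocCount` are simplified stand-ins that only mimic the behaviour described in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the doc_count compensation; NOT the Elasticsearch classes. */
public class DocCountCompensationSketch {
    // Hypothetical storage: bucket ordinal -> accumulated doc_count.
    private final Map<Long, Long> docCounts = new HashMap<>();

    /** Stand-in for collectBucket: adds the document's _doc_count to the bucket. */
    public void collectBucket(long bucketOrd, long docCountField) {
        docCounts.merge(bucketOrd, docCountField, Long::sum);
    }

    /** Stand-in for incrementBucketDocCount: the delta may be negative. */
    public void incrementBucketDocCount(long bucketOrd, long delta) {
        docCounts.merge(bucketOrd, delta, Long::sum);
    }

    /**
     * Hypothetical helper: collect one histogram entry whose count is `count`
     * from a document whose _doc_count is `docCountField`.
     */
    public void collectHistogramValue(long bucketOrd, long count, long docCountField) {
        // The generic path has already added _doc_count to the bucket...
        collectBucket(bucketOrd, docCountField);
        // ...so add (count - _doc_count); the net contribution is exactly `count`.
        incrementBucketDocCount(bucketOrd, count - docCountField);
    }

    public long docCount(long bucketOrd) {
        return docCounts.getOrDefault(bucketOrd, 0L);
    }

    public static void main(String[] args) {
        DocCountCompensationSketch sketch = new DocCountCompensationSketch();
        // count = 3, _doc_count = 10: the adjustment is -7, a net decrement.
        sketch.collectHistogramValue(0L, 3L, 10L);
        System.out.println(sketch.docCount(0L)); // prints 3
        // count = 8, _doc_count = 2: the adjustment is +6.
        sketch.collectHistogramValue(0L, 8L, 2L);
        System.out.println(sketch.docCount(0L)); // prints 11
    }
}
```

The first call shows the case raised in the review: when `_doc_count` exceeds the histogram entry's count, the adjustment is negative, yet the bucket still ends up holding the sum of the counts.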
This was referenced Jun 28, 2021
csoulios added a commit that referenced this pull request on Jun 29, 2021
Backports #74650 to v7.x. The histogram aggregation on a histogram field computes wrong doc_count values when the _doc_count field is present. The root cause of the problem is correctly described here
histogram aggregation on histogram field computes wrong doc_count values when _doc_count field is present. The root cause of the problem is correctly described here
Closes #74617