Do not generate empty buckets for the date histogram #89070
Merged
salvatore-campagna merged 5 commits into elastic:main on Aug 8, 2022
If the date range covered by the date histogram is large and the 'fixed_interval' parameter is very small, the resulting histogram can contain a very large number of buckets when empty buckets are also generated. We might then generate too many buckets, roughly when (max date - min date) / fixed_interval > 65536. Here we set minDocCount to 1 to avoid generating empty buckets. In the test the maximum value for 'docCount' is 9000, which means that in the worst case we generate 9000 documents, each belonging to a different bucket. In that case we would have at most 9000 buckets, which is well below the default maximum number of buckets.
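The arithmetic behind this can be sketched in plain Java. This is only an illustration, not code from the PR: the class and method names are hypothetical, and it assumes that fixed-interval buckets are aligned to the epoch and that the limit is 65536 as stated above.

```java
public class BucketCountSketch {
    // Bucket limit quoted in the description above (assumed here, not read from a cluster).
    static final long MAX_BUCKETS = 65_536L;

    // Buckets produced when empty buckets are kept: every interval between
    // the bucket holding the min date and the bucket holding the max date.
    // Assumes buckets are aligned to the epoch (hypothetical simplification).
    static long bucketsWithEmpty(long minMillis, long maxMillis, long intervalMillis) {
        return Math.floorDiv(maxMillis, intervalMillis)
            - Math.floorDiv(minMillis, intervalMillis) + 1;
    }

    // Upper bound when empty buckets are dropped (minDocCount = 1):
    // each bucket needs at least one document, so there can be no more
    // buckets than documents.
    static long maxBucketsWithoutEmpty(long docCount, long bucketsWithEmpty) {
        return Math.min(docCount, bucketsWithEmpty);
    }

    public static void main(String[] args) {
        long oneDay = 86_400_000L; // date range of one day, in milliseconds
        long oneSecond = 1_000L;   // fixed_interval of one second
        long withEmpty = bucketsWithEmpty(0L, oneDay, oneSecond);
        long withoutEmpty = maxBucketsWithoutEmpty(9_000L, withEmpty);
        System.out.println("with empty buckets: " + withEmpty
            + ", over limit: " + (withEmpty > MAX_BUCKETS));
        System.out.println("minDocCount = 1, 9000 docs: at most " + withoutEmpty
            + ", over limit: " + (withoutEmpty > MAX_BUCKETS));
    }
}
```

With a one-day range and a one-second interval the empty-bucket count already exceeds the limit, while the bound with minDocCount set to 1 depends only on the number of documents.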
Collaborator
Pinging @elastic/es-analytics-geo (Team:Analytics)
Collaborator
Hi @salvatore-campagna, I've created a changelog YAML for you.
Contributor
Author
I tested this by running the test locally "until failure". After more than 1000 executions I did not see any failure. Before the patch I could see the failure fairly quickly, after a few dozen executions (depending on random values).
Contributor
Author
@elasticsearchmachine update branch
Contributor
Author
@elasticsearchmachine update branch
csoulios
approved these changes
Aug 3, 2022
Contributor
csoulios
left a comment
LGTM!
I left a comment about labelling this PR. You do not need another review after fixing this.
Contributor
Author
@elasticsearchmachine test this please
Contributor
Author
@elasticsearchmachine update branch
Contributor
Author
@elasticsearchmachine run elasticsearch-ci/part-2
If the date range covered by the date histogram is large and the
'fixed_interval' parameter is very small, we might end up with a
large number of buckets in the resulting histogram when empty
buckets are also generated: roughly
(max date - min date) / fixed_interval > 65536.
Here we set minDocCount to 1 to avoid generating empty buckets.
In the test the maximum value for 'docCount' is 9000, which means
that in the worst case we generate 9000 documents, each belonging
to a different bucket. We would then have at most 9000 buckets,
which is well below the maximum number of buckets allowed by
default (65536).
Resolves #88800.