Skip to content

Do not generate empty buckets for the date histogram#89070

Merged
salvatore-campagna merged 5 commits intoelastic:mainfrom
salvatore-campagna:fix/88800-date-histogram-too-many-buckets
Aug 8, 2022
Merged

Do not generate empty buckets for the date histogram#89070
salvatore-campagna merged 5 commits intoelastic:mainfrom
salvatore-campagna:fix/88800-date-histogram-too-many-buckets

Conversation

@salvatore-campagna
Copy link
Copy Markdown
Contributor

@salvatore-campagna salvatore-campagna commented Aug 3, 2022

If the date histogram interval is large and the 'fixed_interval'
parameter is very small we might end up with a large number of
buckets in the resulting histogram, in case we also generate empty
buckets. Roughly (max date - min date) / fixed_interval > 65536.

Here we set minDocCount to 1 so to avoid generation of empty buckets.
In the test the maximum value for 'docCount' is 9000 which means,
in the worst case, we generate 9000 documents, each belonging to a
different bucket. In the worst case we would have 9000 buckets
maximum which is well below the maximum number of buckets
allowed by default (65536).

Resolves #88800.

If the date histogram interval is large and the 'fixed_interval'
parameter is very small we might end up with a large number of
buckets in the resulting histogram, in case we also generate empty
buckets. As a result of this we might generate too many buckets
(max date - min date) / fixed_interval > 65536 (roughly)..

Here we set minDocCount to 1 so to avoid generation of empty buckets.
In the test the maximum value for 'docCount' is 9000 which means,
in the worsta case we generate 9000 documents, each belonging to a
different bucket. In this case we would have 9000 buckets maximum
which is well below the default maximum number of buckets allowed by
default.
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v8.5.0 labels Aug 3, 2022
@salvatore-campagna salvatore-campagna added v8.4.0 :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data :StorageEngine/TSDB You know, for Metrics >bug and removed needs:triage Requires assignment of a team area label v8.5.0 labels Aug 3, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 3, 2022
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @salvatore-campagna, I've created a changelog YAML for you.

@salvatore-campagna
Copy link
Copy Markdown
Contributor Author

I tested this running the test locally "until failure". After more than 1000 executions I don't see any failure. Before the patch I could see the failure fairly quickly after a few tens executions (depending on random values).

@salvatore-campagna
Copy link
Copy Markdown
Contributor Author

@elasticsearchmachine update branch

@salvatore-campagna
Copy link
Copy Markdown
Contributor Author

@elasticsearchmachine update branch

Copy link
Copy Markdown
Contributor

@csoulios csoulios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I left a comment about labelling this PR. You do not need another review after fixing this

@salvatore-campagna salvatore-campagna added >test-failure Triaged test failures from CI and removed >bug labels Aug 3, 2022
@salvatore-campagna
Copy link
Copy Markdown
Contributor Author

@elasticsearchmachine test this please

@salvatore-campagna salvatore-campagna changed the title fix: do not generate empty buckets for the date histogram Do not generate empty buckets for the date histogram Aug 4, 2022
@salvatore-campagna
Copy link
Copy Markdown
Contributor Author

@elasticsearchmachine update branch

@salvatore-campagna
Copy link
Copy Markdown
Contributor Author

@elasticsearchmachine run elasticsearch-ci/part-2

@salvatore-campagna salvatore-campagna merged commit 36c4a17 into elastic:main Aug 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >test-failure Triaged test failures from CI v8.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] RollupActionSingleNodeTests testRollupDatastream failing

3 participants