Skip to content

Optimize lone single bucket date_histogram#71180

Merged
nik9000 merged 16 commits intoelastic:masterfrom
nik9000:field_exists_speedy
May 12, 2021
Merged

Optimize lone single bucket date_histogram#71180
nik9000 merged 16 commits intoelastic:masterfrom
nik9000:field_exists_speedy

Conversation

@nik9000
Copy link
Copy Markdown
Member

@nik9000 nik9000 commented Apr 1, 2021

This optimizes the date_histogram agg when there is a single bucket
and no sub-aggregations. We expect this to happen from time to time when
the buckets are larger than a day because folks often use "daily"
indices.

This was already fairly fast, but using the metadata makes it 10x
faster. Something like 98ms becomes 7.5ms. Nice if you can get it!

Like #69377 this optimization will disable itself if you have document
level security enabled or are querying a rollup index. Also like #69377
it won't do anything if there is a top level query.

nik9000 added 3 commits March 25, 2021 17:00
This restores SQL's test for fetching `half_floats` after we backported
the precision change in that fetch (elastic#70653)
This optimizes the `date_histogram` agg when there is a single bucket
and no sub-aggregations. We expect this to happen from time to time when
the buckets are larger than a day because folks often use "daily"
indices.

This was already fairly fast, but using the metadata makes it 10x
faster. Something like 98ms becomes 7.5ms. Nice if you can get it!

Like elastic#69377 this optimization will disable itself if you have document
level security enabled or are querying a rollup index. Also like elastic#69377
it won't do anything if there is a top level query.
@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Apr 1, 2021

|                Min Throughput |     5.0 |     5.00816 |  0.00372 |  ops/s |
|               Mean Throughput |     5.0 |     5.01744 |  0.00769 |  ops/s |
|             Median Throughput |     5.0 |     5.01392 |  0.00616 |  ops/s |
|                Max Throughput |     5.0 |     5.04775 |  0.02114 |  ops/s |
|       50th percentile latency |    92.8 |     7.15122 | -85.7464 |     ms |
|       90th percentile latency |    99.5 |     8.68909 | -90.8784 |     ms |
|       99th percentile latency |   111.8 |     10.6733 | -101.137 |     ms |
|      100th percentile latency |   132.6 |     20.7481 | -111.862 |     ms |
|  50th percentile service time |    91.7 |     5.96916 | -85.8158 |     ms |
|  90th percentile service time |    98.3 |     7.49015 | -90.8677 |     ms |
|  99th percentile service time |   110.8 |     9.28727 | -101.609 |     ms |
| 100th percentile service time |   131.1 |     19.7961 | -111.374 |     ms |
|                    error rate |      0  |           0 |        0 |      % |

@nik9000 nik9000 marked this pull request as ready for review April 14, 2021 20:25
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 14, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Apr 15, 2021

run elasticsearch-ci/1

Copy link
Copy Markdown
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted, I think future us will appreciate a little more documentation, but otherwise this looks fine. +1

if (query instanceof TermQuery) {
return new TermQueryToFilterAdapter(searcher, key, (TermQuery) query);
}
if (query instanceof ConstantScoreQuery) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like an obvious question "Why don't we check for a wrapped TermsQuery or MatchAllDocsQuery?" Would be good to have a comment to answer that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I've not seen it come up. But, looking at it with fresh eyes now, I think the safest thing is to always unwrap.

}
}

private static class DocValuesFieldExistsAdapter extends QueryToFilterAdapter<DocValuesFieldExistsQuery> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, keeping all the implementations as static inner classes is going to get unwieldy. Do you think we should refactor QueryToFilterAdapeter into its own package and make these static inners top level package private classes?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.... I'll see about breaking them out in a mechanical follow up PR.

@nik9000 nik9000 merged commit 0cf63fc into elastic:master May 12, 2021
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 12, 2021
This optimizes the `date_histogram` agg when there is a single bucket
and no sub-aggregations. We expect this to happen from time to time when
the buckets are larger than a day because folks often use "daily"
indices.

This was already fairly fast, but using the metadata makes it 10x
faster. Something like 98ms becomes 7.5ms. Nice if you can get it!

Like elastic#69377 this optimization will disable itself if you have document
level security enabled or are querying a rollup index. Also like elastic#69377
it won't do anything if there is a top level query.
nik9000 added a commit that referenced this pull request May 12, 2021
…2989)

This optimizes the `date_histogram` agg when there is a single bucket
and no sub-aggregations. We expect this to happen from time to time when
the buckets are larger than a day because folks often use "daily"
indices.

This was already fairly fast, but using the metadata makes it 10x
faster. Something like 98ms becomes 7.5ms. Nice if you can get it!

Like #69377 this optimization will disable itself if you have document
level security enabled or are querying a rollup index. Also like #69377
it won't do anything if there is a top level query.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.14.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants