Speed up terms agg when alone#69377
Conversation
This speeds up the `terms` agg in a very specific case: 1. It has no child aggregations 2. It has no parent aggregations 3. There are no deleted documents 4. You are not using document level security 5. There is no top level query 6. The field has global ordinals 7. There are less than one thousand distinct terms That is a lot of restirctions! But the speed up pretty substantial because in those cases we can serve the entire aggregation using metadata that lucene precomputes while it builds the index. In a real rally track we have we get a 92% speed improvement, but the index isn't *that* big: ``` | 90th percentile service time | keyword-terms-low-cardinality | 446.031 | 36.7677 | -409.263 | ms | ``` In a rally track with a larger index I ran some tests by hand and the aggregation went from 2200ms to 8ms. Even though there are 7 restrictions on this, I expect it to come into play enough to matter. Restriction 6 just means you are aggregating on a `keyword` field. Or an `ip`. And its fairly common for `keyword`s to have less than a thousand distinct values. Certainly not everywhere, but some places. I expect "cold tier" indices are very very likely not to have deleted documents at all. And the optimization works segment by segment - so it'll save some time on each segment without deleted documents. But more time if the entire index doesn't have any. The optimization builds on elastic#68871 which translates `terms` aggregations against low cardinality fields with global ordinals into a `filters` aggregation. This teaches the `filters` aggregation to recognize when it can get its results from the index metadata. Rather, it creates the infrastructure to make that fairly simple and applies it in the case of the queries generated by the terms aggregation.
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
nik9000
left a comment
There was a problem hiding this comment.
Check out the profile output:
$ curl -s -HContent-Type:application/json -uelastic:password localhost:9200/_search?pretty -d'{
"size": 0,
"profile": true,
"aggs": {
"t": {
"terms": {
"field": "rate_code_id"
}
}
}
}'
{
"took" : 17,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"t" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1",
"doc_count" : 160957580
},
{
"key" : "2",
"doc_count" : 3141764
},
{
"key" : "5",
"doc_count" : 904379
},
{
"key" : "3",
"doc_count" : 269558
},
{
"key" : "4",
"doc_count" : 65655
},
{
"key" : "99",
"doc_count" : 5661
},
{
"key" : "6",
"doc_count" : 2095
}
]
}
},
"profile" : {
"shards" : [
{
"id" : "[tG5FPmcfQ6aI9KXq9V9cTg][nyc_taxis][0]",
"searches" : [
{
"query" : [
{
"type" : "MatchAllDocsQuery",
"description" : "*:*",
"time_in_nanos" : 4265,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 0,
"create_weight" : 4265,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 0
}
}
],
"rewrite_time" : 14563,
"collector" : [
{
"name" : "MultiCollector",
"reason" : "search_multi",
"time_in_nanos" : 9553931,
"children" : [
{
"name" : "EarlyTerminatingCollector",
"reason" : "search_count",
"time_in_nanos" : 170074
},
{
"name" : "ProfilingAggregator: [org.elasticsearch.search.aggregations.bucket.terms.StringTermsAggregatorFromFilters@3bdd7fd1]",
"reason" : "aggregation",
"time_in_nanos" : 9155907
}
]
}
]
}
],
"aggregations" : [
{
"type" : "StringTermsAggregatorFromFilters",
"description" : "t",
"time_in_nanos" : 9181873,
"breakdown" : {
"reduce" : 0,
"post_collection_count" : 1,
"build_leaf_collector" : 9060609,
"build_aggregation" : 114667,
"build_aggregation_count" : 1,
"build_leaf_collector_count" : 29,
"post_collection" : 2672,
"initialize" : 3925,
"initialize_count" : 1,
"reduce_count" : 0,
"collect" : 0,
"collect_count" : 0
},
"debug" : {
"delegate" : "FiltersAggregator.FilterByFilter",
"delegate_debug" : {
"segments_with_doc_count_field" : 0,
"segments_with_deleted_docs" : 0,
"filters" : [
{
"results_from_metadata" : 29,
"query" : "rate_code_id:1",
"scorers_prepared_while_estimating_cost" : 0,
"type" : "term"
},
{
"results_from_metadata" : 29,
"query" : "rate_code_id:2",
"scorers_prepared_while_estimating_cost" : 0,
"type" : "term"
},
{
"results_from_metadata" : 29,
"query" : "rate_code_id:3",
"scorers_prepared_while_estimating_cost" : 0,
"type" : "term"
},
{
"results_from_metadata" : 29,
"query" : "rate_code_id:4",
"scorers_prepared_while_estimating_cost" : 0,
"type" : "term"
},
{
"results_from_metadata" : 29,
"query" : "rate_code_id:5",
"scorers_prepared_while_estimating_cost" : 0,
"type" : "term"
},
{
"results_from_metadata" : 29,
"query" : "rate_code_id:6",
"scorers_prepared_while_estimating_cost" : 0,
"type" : "term"
},
{
"results_from_metadata" : 29,
"query" : "rate_code_id:99",
"scorers_prepared_while_estimating_cost" : 0,
"type" : "term"
}
]
}
}
}
]
}
]
}
}
| * Estimate the number of documents that this aggregation must visit. We'll | ||
| * stop counting once we've passed {@code maxEstimatedCost} if we aren't profiling. | ||
| */ | ||
| @SuppressWarnings("resource") // We're not in change of anything Closeable |
There was a problem hiding this comment.
While I was porting this to use the new QueryToFilterAdatapter I noticed a problem: previously this would abort estimating the cost if the cost went over the max which is probably good. And when you profile it wouldn't abort the estimating so that you could get back what the "real total" estimate would be. Also probably good. But we timed how long the estimate took. And previously the timer included how long it took to estimate without the early abort. That felt wrong. Now we stop the timer as soon as we pass the limit. Or when we return.
| /** | ||
| * Special case when the filter can't match anything. | ||
| */ | ||
| private static class MatchNoneQueryToFilterAdapter extends QueryToFilterAdapter { |
There was a problem hiding this comment.
We don't strictly need this for the PR but it feels like a nice example of how you'd implement this for a filter that can come purely from metadata.
| /** | ||
| * Filter that matches every document. | ||
| */ | ||
| private static class MatchAllQueryToFilterAdapter extends QueryToFilterAdapter { |
There was a problem hiding this comment.
And you don't strictly need this one either because folks won't usually write match_all in their filters agg. But, again, its a fairly compact way of looking at how you'd read from metadata.
| long count(LeafReaderContext ctx, FiltersAggregator.Counter counter, Bits live) throws IOException { | ||
| if (countCanUseMetadata(counter, live)) { | ||
| resultsFromMetadata++; | ||
| return ctx.reader().docFreq(query().getTerm()); |
There was a problem hiding this comment.
This right here is the magic.
We certainly could have done with without adding this whole abstraction. But the abstraction really helps me reason about what we want from queries and having a subclass per query that has a fancy optimization helps me to identify which fancy optimizations we have. And it helps us track how they are being applied when you profile it.
| */ | ||
| public void testMatchAllOnFilteredIndex() throws IOException { | ||
| AggregationBuilder builder = new FiltersAggregationBuilder("test", new KeyedFilter("q1", new MatchAllQueryBuilder())); | ||
| try (Directory directory = newDirectory()) { |
There was a problem hiding this comment.
This one is kind of a mess because it doesn't fit into how the agg tests usually work. But this is a fairly rare thing.
|
That failure looks real to me! |
All better now. |
not-napoleon
left a comment
There was a problem hiding this comment.
I like the idea behind QueryToFilterAdapter as an abstraction in general, but I think it's too abstract right now. I don't think adding the second layer CommonQueryToFilterAdapter just to avoid having an unused query parameter on the match all & match none cases is worth it, and I'd rather see one or zero abstract classes. Willing to discuss more.
| } | ||
|
|
||
| List<QueryToFilterAdapter> filters() { | ||
| return filters; |
There was a problem hiding this comment.
Is this mutable on purpose? Does it have to be?
There was a problem hiding this comment.
Nevermind, I see now that we use List.copyOf in the ctor to make this immutable before we get to this point.
| /** | ||
| * Adapts a Lucene {@link Query} to the behaviors used be the | ||
| * {@link FiltersAggregator}. In general we try to delegate to {@linkplain Query} | ||
| * when we don't have |
There was a problem hiding this comment.
I think you forgot to finish this comment, and I'm not sure what you meant to say
| * {@link FiltersAggregator}. In general we try to delegate to {@linkplain Query} | ||
| * when we don't have | ||
| */ | ||
| public abstract class QueryToFilterAdapter { |
There was a problem hiding this comment.
I thought we'd talked about making the base class concrete and using it for the default implementation. What changed your mind?
There was a problem hiding this comment.
Two things bothered me about it:
- The base class would need a
<>operator which is kind of a pain because it's really just an implementation detail. - MatchNone doesn't really want any of the "default" stuff. It'd fully override it. MatchAll is pretty similar. But those are the only two like that.
I'd certainly be willing to try and do all the merging and see what it looks like.
|
|
||
| /** | ||
| * Is it safe to use index metadata like | ||
| * {@link IndexReader#docFreq} or {@link IndexReader#maxDoc} to count. the |
There was a problem hiding this comment.
| * {@link IndexReader#docFreq} or {@link IndexReader#maxDoc} to count. the | |
| * {@link IndexReader#docFreq} or {@link IndexReader#maxDoc} to count the |
| /** | ||
| * Is it safe to use index metadata like | ||
| * {@link IndexReader#docFreq} or {@link IndexReader#maxDoc} to count. the | ||
| * number of matching documents. |
There was a problem hiding this comment.
I think this would benefit from explaining what "safe" means in this context. In particular, I don't think it's obvious what Bits live is or why it should be null here.
| */ | ||
| abstract QueryToFilterAdapter union(Query extraQuery) throws IOException; | ||
|
|
||
| abstract IntPredicate matchingDocIds(LeafReaderContext ctx) throws IOException; |
There was a problem hiding this comment.
This should have some javadoc to indicate how to implement it, as should count
| /** | ||
| * Abstract superclass of filters that delegates everything to the query. | ||
| */ | ||
| private abstract static class CommonQueryToFilterAdapter<Q extends Query> extends QueryToFilterAdapter { |
There was a problem hiding this comment.
Two layers of abstraction seems excessive here. Is there ever a case where we'd want something to accept a CommonQueryToFilterAdapter but not a QueryToFilterAdapter? It looks to me like this just exists for code reuse, in which case I'd rather it wasn't abstract, just make this the default implementation.
There was a problem hiding this comment.
Yeah - its just for code reuse. I'll see what making it the default implementation looks like.
There was a problem hiding this comment.
Thanks! If it looks worse, we can go with this, but I find the two abstract base classes confusing, personally.
| * Note: This method rewrites the query against the {@link IndexSearcher}. | ||
| */ | ||
| QueryToFilterAdapter<?> union(Query extraQuery) throws IOException { | ||
| extraQuery = searcher().rewrite(extraQuery); |
There was a problem hiding this comment.
This method seems like a limited duplication of logic of boolean query rewrite. What's the main reason for not just wrapping this into boolean with 2 must closes and allow it to figure out optimization?
There was a problem hiding this comment.
BooleanQuery doesn't know how to merge point range queries and that speeds up when we rewrite date_histogram into filters by a factor of about 2. Its something @iverase and I've talked about some. And I've talked about it with @romseygeek. I think at some point Lucene'll be able to do more of this sort of thing, but until then, here we are.
There was a problem hiding this comment.
Makes sense. I think a TODO here with a link to the issue would help future readers.
| */ | ||
| public static QueryToFilterAdapter<?> build(IndexSearcher searcher, String key, Query query) throws IOException { | ||
| query = searcher.rewrite(query); | ||
| if (query instanceof TermQuery) { |
There was a problem hiding this comment.
IDK, I cannot quite put my finger on it, but that smells like we are compensating for some functionality or optimization that is missing in lucene. I would love somebody closer to lucene to take a look at it.
|
We are indeed compensating. Maybe prototyping? I think it'd be super
reasonable for lucene to have some of these methods in some form. It has a
kind of limited count optimization already. I was hoping to build what we
need and cherry pick it into lucene when they want it.
I'll ping some lucene folk to review too.
…On Tue, Feb 23, 2021, 15:56 Igor Motov ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In
server/src/main/java/org/elasticsearch/search/aggregations/bucket/filter/QueryToFilterAdapter.java
<#69377 (comment)>
:
> + }
+ /*
+ * We can only use metadata if we're not using the special docCount
+ * field. Otherwise we wouldn't know how many documents each lucene
+ * document represents.
+ */
+ return counter.docCount.alwaysOne();
+ }
+
+ /**
+ * Make a filter that matches this filter and the provided query.
+ * <p>
+ * Note: This method rewrites the query against the ***@***.*** IndexSearcher}.
+ */
+ QueryToFilterAdapter<?> union(Query extraQuery) throws IOException {
+ extraQuery = searcher().rewrite(extraQuery);
This method seems like a limited duplication of logic of boolean query
rewrite. What's the main reason for not just wrapping this into boolean
with 2 must closes and allow it to figure out optimization?
------------------------------
In
server/src/main/java/org/elasticsearch/search/aggregations/bucket/filter/QueryToFilterAdapter.java
<#69377 (comment)>
:
> +import java.util.function.IntPredicate;
+
+/**
+ * Adapts a Lucene ***@***.*** Query} to the behaviors used be the
+ * ***@***.*** FiltersAggregator}. In general we try to delegate to ***@***.*** Query}
+ * when we don't have a special optimization.
+ */
+public class QueryToFilterAdapter<Q extends Query> {
+ /**
+ * Build a filter for the query against the provided searcher.
+ * <p>
+ * Note: This method rewrites the query against the ***@***.*** IndexSearcher}
+ */
+ public static QueryToFilterAdapter<?> build(IndexSearcher searcher, String key, Query query) throws IOException {
+ query = searcher.rewrite(query);
+ if (query instanceof TermQuery) {
IDK, I cannot quite put my finger on it, but that smells like we are
compensating for some functionality or optimization that is missing in
lucene. I would love somebody closer to lucene to take a look at it.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#69377 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABUXIT5G37KMWAEF2CRMVDTAQJALANCNFSM4YBAZ5UA>
.
|
| * Build a predicate that the "compatible" implementation of the | ||
| * {@link FiltersAggregator} will use to figure out if the filter matches. | ||
| * <p> | ||
| * Consumers of this method will always call it with non-negative, |
There was a problem hiding this comment.
| * Consumers of this method will always call it with non-negative, | |
| * Consumers of the returned predicate will always call it with non-negative, |
I found the pronoun ambiguous in this context.
not-napoleon
left a comment
There was a problem hiding this comment.
I like the less abstract QueryToFilterAdapter considerably more, thank you for changing it. +1 to merge.
I talked privately with a Lucene folk who thought my strategy of "make it here and hope we can cherry pick some of it to Lucene" made sense. He didn't believe we'd be able to cherry pick all of it. He didn't believe cherry picking any of it would be a fast process either. |
|
There is a whole class of optimizations that we can make in ES because we have |
This speeds up the `terms` agg in a very specific case: 1. It has no child aggregations 2. It has no parent aggregations 3. There are no deleted documents 4. You are not using document level security 5. There is no top level query 6. The field has global ordinals 7. There are less than one thousand distinct terms That is a lot of restirctions! But the speed up pretty substantial because in those cases we can serve the entire aggregation using metadata that lucene precomputes while it builds the index. In a real rally track we have we get a 92% speed improvement, but the index isn't *that* big: ``` | 90th percentile service time | keyword-terms-low-cardinality | 446.031 | 36.7677 | -409.263 | ms | ``` In a rally track with a larger index I ran some tests by hand and the aggregation went from 2200ms to 8ms. Even though there are 7 restrictions on this, I expect it to come into play enough to matter. Restriction 6 just means you are aggregating on a `keyword` field. Or an `ip`. And its fairly common for `keyword`s to have less than a thousand distinct values. Certainly not everywhere, but some places. I expect "cold tier" indices are very very likely not to have deleted documents at all. And the optimization works segment by segment - so it'll save some time on each segment without deleted documents. But more time if the entire index doesn't have any. The optimization builds on elastic#68871 which translates `terms` aggregations against low cardinality fields with global ordinals into a `filters` aggregation. This teaches the `filters` aggregation to recognize when it can get its results from the index metadata. Rather, it creates the infrastructure to make that fairly simple and applies it in the case of the queries generated by the terms aggregation.
This speeds up the `terms` agg in a very specific case: 1. It has no child aggregations 2. It has no parent aggregations 3. There are no deleted documents 4. You are not using document level security 5. There is no top level query 6. The field has global ordinals 7. There are less than one thousand distinct terms That is a lot of restirctions! But the speed up pretty substantial because in those cases we can serve the entire aggregation using metadata that lucene precomputes while it builds the index. In a real rally track we have we get a 92% speed improvement, but the index isn't *that* big: ``` | 90th percentile service time | keyword-terms-low-cardinality | 446.031 | 36.7677 | -409.263 | ms | ``` In a rally track with a larger index I ran some tests by hand and the aggregation went from 2200ms to 8ms. Even though there are 7 restrictions on this, I expect it to come into play enough to matter. Restriction 6 just means you are aggregating on a `keyword` field. Or an `ip`. And its fairly common for `keyword`s to have less than a thousand distinct values. Certainly not everywhere, but some places. I expect "cold tier" indices are very very likely not to have deleted documents at all. And the optimization works segment by segment - so it'll save some time on each segment without deleted documents. But more time if the entire index doesn't have any. The optimization builds on #68871 which translates `terms` aggregations against low cardinality fields with global ordinals into a `filters` aggregation. This teaches the `filters` aggregation to recognize when it can get its results from the index metadata. Rather, it creates the infrastructure to make that fairly simple and applies it in the case of the queries generated by the terms aggregation.
Now that we've backported elastic#69377 to 7.x we can run backwards compatibility tests against it.
Now that we've backported #69377 to 7.x we can run backwards compatibility tests against it.
This optimizes the `date_histogram` agg when there is a single bucket and no sub-aggregations. We expect this to happen from time to time when the buckets are larger than a day because folks often use "daily" indices. This was already fairly fast, but using the metadata makes it 10x faster. Something like 98ms becomes 7.5ms. Nice if you can get it! Like elastic#69377 this optimization will disable itself if you have document level security enabled or are querying a rollup index. Also like elastic#69377 it won't do anything if there is a top level query.
This optimizes the `date_histogram` agg when there is a single bucket and no sub-aggregations. We expect this to happen from time to time when the buckets are larger than a day because folks often use "daily" indices. This was already fairly fast, but using the metadata makes it 10x faster. Something like 98ms becomes 7.5ms. Nice if you can get it! Like #69377 this optimization will disable itself if you have document level security enabled or are querying a rollup index. Also like #69377 it won't do anything if there is a top level query.
This optimizes the `date_histogram` agg when there is a single bucket and no sub-aggregations. We expect this to happen from time to time when the buckets are larger than a day because folks often use "daily" indices. This was already fairly fast, but using the metadata makes it 10x faster. Something like 98ms becomes 7.5ms. Nice if you can get it! Like elastic#69377 this optimization will disable itself if you have document level security enabled or are querying a rollup index. Also like elastic#69377 it won't do anything if there is a top level query.
…2989) This optimizes the `date_histogram` agg when there is a single bucket and no sub-aggregations. We expect this to happen from time to time when the buckets are larger than a day because folks often use "daily" indices. This was already fairly fast, but using the metadata makes it 10x faster. Something like 98ms becomes 7.5ms. Nice if you can get it! Like #69377 this optimization will disable itself if you have document level security enabled or are querying a rollup index. Also like #69377 it won't do anything if there is a top level query.
This speeds up the
termsagg in a very specific case:That is a lot of restirctions! But the speed up pretty substantial because
in those cases we can serve the entire aggregation using metadata that
lucene precomputes while it builds the index. In a real rally track we
have we get a 92% speed improvement, but the index isn't that big:
In a rally track with a larger index I ran some tests by hand and the
aggregation went from 2200ms to 8ms.
Even though there are 7 restrictions on this, I expect it to come into
play enough to matter. Restriction 6 just means you are aggregating on
a
keywordfield. Or anip. And its fairly common forkeywords tohave less than a thousand distinct values. Certainly not everywhere, but
some places.
I expect "cold tier" indices are very very likely not to have deleted
documents at all. And the optimization works segment by segment - so
it'll save some time on each segment without deleted documents. But more
time if the entire index doesn't have any.
The optimization builds on #68871 which translates
termsaggregationsagainst low cardinality fields with global ordinals into a
filtersaggregation. This teaches the
filtersaggregation to recognize whenit can get its results from the index metadata. Rather, it creates the
infrastructure to make that fairly simple and applies it in the case of
the queries generated by the terms aggregation.