Add setting to disable aggs optimization#73620
Merged
nik9000 merged 8 commits intoelastic:masterfrom Jun 2, 2021
Merged
Conversation
Sometimes our fancy "run this agg as a Query" optimizations end up slower than running the aggregation in the old way. We know that and use heuristics to dissable the optimization in that case. But it turns out that the process of running the heuristics itself can be slow, depending on the query. Worse, changing the heuristics requires an upgrade, which means waiting. If the heurisics make a terrible choice folks need a quick way out. This adds such a way: a cluster level setting that contains a list of queries that are considered "too expensive" to try and optimize. If the top level query contains any of those queries we'll disable the "run as Query" optimization. The default for this settings is wildcard and term-in-set queries, which is fairly conservative. There are certainly wildcard and term-in-set queries that the optimization works well with, but there are other queries of that type that it works very badly with. So we're being careful. Better, you can modify this setting in a running cluster to disable the optimization if we find a new type of query that doesn't work well. Closes elastic#73426
Collaborator
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
Member
Author
|
Thanks @iverase ! |
nik9000
added a commit
to nik9000/elasticsearch
that referenced
this pull request
Jun 2, 2021
Sometimes our fancy "run this agg as a Query" optimizations end up slower than running the aggregation in the old way. We know that and use heuristics to dissable the optimization in that case. But it turns out that the process of running the heuristics itself can be slow, depending on the query. Worse, changing the heuristics requires an upgrade, which means waiting. If the heurisics make a terrible choice folks need a quick way out. This adds such a way: a cluster level setting that contains a list of queries that are considered "too expensive" to try and optimize. If the top level query contains any of those queries we'll disable the "run as Query" optimization. The default for this settings is wildcard and term-in-set queries, which is fairly conservative. There are certainly wildcard and term-in-set queries that the optimization works well with, but there are other queries of that type that it works very badly with. So we're being careful. Better, you can modify this setting in a running cluster to disable the optimization if we find a new type of query that doesn't work well. Closes elastic#73426
nik9000
added a commit
to nik9000/elasticsearch
that referenced
this pull request
Jun 2, 2021
Sometimes our fancy "run this agg as a Query" optimizations end up slower than running the aggregation in the old way. We know that and use heuristics to dissable the optimization in that case. But it turns out that the process of running the heuristics itself can be slow, depending on the query. Worse, changing the heuristics requires an upgrade, which means waiting. If the heurisics make a terrible choice folks need a quick way out. This adds such a way: a cluster level setting that contains a list of queries that are considered "too expensive" to try and optimize. If the top level query contains any of those queries we'll disable the "run as Query" optimization. The default for this settings is wildcard and term-in-set queries, which is fairly conservative. There are certainly wildcard and term-in-set queries that the optimization works well with, but there are other queries of that type that it works very badly with. So we're being careful. Better, you can modify this setting in a running cluster to disable the optimization if we find a new type of query that doesn't work well. Closes elastic#73426
nik9000
added a commit
that referenced
this pull request
Jun 2, 2021
Sometimes our fancy "run this agg as a Query" optimizations end up slower than running the aggregation in the old way. We know that and use heuristics to dissable the optimization in that case. But it turns out that the process of running the heuristics itself can be slow, depending on the query. Worse, changing the heuristics requires an upgrade, which means waiting. If the heurisics make a terrible choice folks need a quick way out. This adds such a way: a cluster level setting that contains a list of queries that are considered "too expensive" to try and optimize. If the top level query contains any of those queries we'll disable the "run as Query" optimization. The default for this settings is wildcard and term-in-set queries, which is fairly conservative. There are certainly wildcard and term-in-set queries that the optimization works well with, but there are other queries of that type that it works very badly with. So we're being careful. Better, you can modify this setting in a running cluster to disable the optimization if we find a new type of query that doesn't work well. Closes #73426
nik9000
added a commit
that referenced
this pull request
Jun 2, 2021
* Add setting to disable aggs optimization (backport of #73620) Sometimes our fancy "run this agg as a Query" optimizations end up slower than running the aggregation in the old way. We know that and use heuristics to dissable the optimization in that case. But it turns out that the process of running the heuristics itself can be slow, depending on the query. Worse, changing the heuristics requires an upgrade, which means waiting. If the heurisics make a terrible choice folks need a quick way out. This adds such a way: a cluster level setting that contains a list of queries that are considered "too expensive" to try and optimize. If the top level query contains any of those queries we'll disable the "run as Query" optimization. The default for this settings is wildcard and term-in-set queries, which is fairly conservative. There are certainly wildcard and term-in-set queries that the optimization works well with, but there are other queries of that type that it works very badly with. So we're being careful. Better, you can modify this setting in a running cluster to disable the optimization if we find a new type of query that doesn't work well. Closes #73426
nik9000
added a commit
to nik9000/elasticsearch
that referenced
this pull request
Jun 2, 2021
Now that elastic#73620 has backported to 7.13.2 we can use run its bwc test against 7.13.2.
nik9000
added a commit
to nik9000/elasticsearch
that referenced
this pull request
Jun 2, 2021
Now that we've backported elastic#73620 we can run the bwc tests against it.
nik9000
added a commit
that referenced
this pull request
Jun 3, 2021
Now that #73620 has backported to 7.13.2 we can use run its bwc test against 7.13.2.
nik9000
added a commit
that referenced
this pull request
Jun 3, 2021
Now that we've backported #73620 we can run the bwc tests against it.
|
Can confirm
On our production system the queries took since |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Sometimes our fancy "run this agg as a Query" optimizations end up
slower than running the aggregation in the old way. We know that and use
heuristics to dissable the optimization in that case. But it turns out
that the process of running the heuristics itself can be slow, depending
on the query. Worse, changing the heuristics requires an upgrade, which
means waiting. If the heurisics make a terrible choice folks need a
quick way out. This adds such a way: a cluster level setting that
will simply disable the optimization. The default is to leave the optimization
"on".
Closes #73426