Configure IndexSearcher.maxClauseCount() based on Node characteristics #81525

romseygeek merged 15 commits into elastic:master
Conversation
Pinging @elastic/es-search (Team:Search)

I'm not totally happy with the testing on this yet, and it will need some documentation as well, but it's good enough for a preliminary review, I think.
public static void configureMaxClauses(ThreadPool threadPool) {
    int searchThreadPoolSize = threadPool.info(ThreadPool.Names.SEARCH).getMax();
    long heapSize = JvmStats.jvmStats().getMem().getHeapMax().getGb();
    configureMaxClauses(searchThreadPoolSize, heapSize);
What happens if you start Elasticsearch with 512MB of heap, does it mean you get zero max clauses?
If heapSize is zero then we stick with the default value from Lucene, currently 1024. But maybe we should stick with the current ES default of 4096 instead?
Have updated this so that we will always return a minimum of 4096, or larger if the heap/CPU size permits.
Would this work if we used the heap size in GB as a fractional number instead?
    return; // If we don't know how to size things, keep the lucene default
}

int maxClauseCount = Math.toIntExact(heapInGb * 65_536 / threadPoolSize);
Let's add some explanation as to where this 65,536 number comes from?
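For discussion, here is a minimal self-contained sketch of the heuristic as described in this thread. The class and method names are illustrative, and reading `65_536` as a per-gigabyte clause budget divided among concurrent search threads is my assumption, not something the code states:

```java
// Hypothetical sketch of the clause-count heuristic discussed above.
// Assumption: 65_536 is treated as a per-GB clause budget that is split
// across the search thread pool. Names are illustrative, not the final API.
public final class SearchUtilsSketch {

    // Previous Elasticsearch default, used as a floor so upgrades
    // never end up with a smaller limit than before.
    static final int MIN_MAX_CLAUSE_COUNT = 4096;

    public static int calculateMaxClauseValue(int threadPoolSize, long heapInGb) {
        if (threadPoolSize <= 0 || heapInGb <= 0) {
            // If we can't size things (e.g. sub-1GB heap rounds down to 0),
            // fall back to the floor value.
            return MIN_MAX_CLAUSE_COUNT;
        }
        long maxClauseCount = heapInGb * 65_536 / threadPoolSize;
        return Math.toIntExact(Math.max(MIN_MAX_CLAUSE_COUNT, maxClauseCount));
    }
}
```

With this shape, a 1GB heap and 4 search threads yields 16,384 clauses, while unknown or tiny sizings stay at the 4096 floor.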
+ "This limit can be set by changing the ["
+ SearchModule.INDICES_MAX_CLAUSE_COUNT_SETTING.getKey()
+ "] setting."

"Number of filters is too large, must be less than or equal to: [" + maxFilters + "] but was [" + maxFiltersPlusOne + "]."
Let's add guidance as to how to address this error? E.g. can we check if the user overrode the size of the search thread pool and suggest undoing the change? Or otherwise recommend scaling up or avoiding running queries with such large numbers of clauses?
Integer.MAX_VALUE,
Setting.Property.NodeScope
Setting.Property.NodeScope,
Setting.Property.DeprecatedWarning
When would we raise a deprecation warning for node settings, only on startup?
Have confirmed with @pgomulka that this will only be emitted on startup, yes.
I've reworked things a bit: the utils class now only calculates the max clauses value, it is set directly in the node startup code, and we use the current default value (4096) as a minimum so that we won't inadvertently limit users' query sizes when they upgrade.
Interesting, this causes
@elasticmachine run elasticsearch-ci/part-2
of clauses of a single `bool` query. It now applies to the total number of
clauses of the rewritten query. To reduce chances of breaks, its
default value has been bumped from 1024 to 4096.

Elasticsearch will now dynamically set the maximum number of allowed clauses
Maybe we need a word about the fact that Elasticsearch now checks the number of clauses across the entire query rather than on a per-BooleanQuery basis? So that anyone who has used the hack that consists of splitting boolean queries knows that they can undo it.
Elasticsearch will now dynamically set the maximum number of allowed clauses
in a query, using a heuristic based on the size of the search thread pool and
the size of the heap allocated to the JVM. This limit has a minimum value of
4096 (the previous default) and will in most cases be larger (for example,
1024 was the previous value, not 4096?
you might need to increase it further.

Queries with many clauses should be avoided whenever possible.
If you previously bumped this setting to accommodate heavy queries,
you might need to increase the amount of memory available to elasticsearch,
- you might need to increase the amount of memory available to elasticsearch,
+ you might need to increase the amount of memory available to Elasticsearch,
Would this work if we used the heap size in GB as a fractional number instead?
default value has been bumped from 1024 to 4096.
Elasticsearch will now dynamically set the maximum number of allowed clauses
in a query, using a heuristic based on the size of the search thread pool and
the size of the heap allocated to the JVM. This limit has a minimum value of

I'm tempted to clarify that this number increases with heap size and decreases with the size of the thread pool, wdyt?
assertEquals(4096, SearchUtils.calculateMaxClauseValue(4, 0));

// Number of processors not available
assertEquals(4096, SearchUtils.calculateMaxClauseValue(-1, 1));
Is it actually something that can happen? While the number of cores can be unknown, we are actually using the thread pool size, which should always be defined?
I'm fairly sure that this would always be set, but put this in as a safety net. But maybe it should be replaced with an assertion that the thread pool size is > 1.
I'm not sure what you mean here?
We currently get the Java heap in GB as a long. A side-effect of this is that the formula always returns 0 as the maximum number of clauses if the heap is less than 1GB, since the amount of memory allocated to the JVM is rounded down. So I wonder if we should use a fractional representation of the heap size instead, e.g. 0.5 if the JVM is given 512MB of heap? This way, we might also not need to resort to taking the max of the return value and 4096, and it should scale better to small heaps.
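A rough sketch of that fractional variant, assuming the heap is passed as a `double` in GB (names are illustrative, not the final API):

```java
// Sketch of the fractional-heap variant suggested above: a 512MB heap is
// passed as 0.5 rather than truncating to 0, so small heaps scale smoothly
// instead of needing a hard floor. Hypothetical names.
public final class FractionalClauseSketch {

    public static int calculateMaxClauseValue(int threadPoolSize, double heapInGb) {
        if (threadPoolSize <= 0 || heapInGb <= 0) {
            return 1024; // unknown sizing: keep the Lucene default
        }
        // Example: 512MB heap with 8 search threads -> 0.5 * 65_536 / 8 = 4096
        return (int) (heapInGb * 65_536 / threadPoolSize);
    }
}
```

Under this scheme the limit still grows with heap size and shrinks with thread pool size, but sub-1GB heaps get a proportional value instead of falling back to a fixed minimum.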
or to reduce the size of your search thread pool so that more memory is
available to each concurrent search.

In previous versions of lucene you could get around this limit by nesting
- In previous versions of lucene you could get around this limit by nesting
+ In previous versions of Lucene you could get around this limit by nesting
// Create more filters than is permitted by Lucene Bool clause settings.
MapBuilder filtersMap = new MapBuilder();
int maxFilters = SearchModule.INDICES_MAX_CLAUSE_COUNT_SETTING.get(Settings.EMPTY);
int maxFilters = IndexSearcher.getMaxClauseCount();
This number might be super high now, how slow is the test? Maybe we need to set an artificially low limit for this test to keep it reasonable?
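One way to keep such a test fast is to temporarily lower the limit around the test body and restore it afterwards. The `ClauseLimit` holder below is a hypothetical stand-in for Lucene's static `IndexSearcher.setMaxClauseCount`/`getMaxClauseCount`, used only so the sketch stays self-contained:

```java
// Sketch: run a test body against an artificially low clause limit, then
// restore the previous value. ClauseLimit mimics the static setter/getter
// pair on Lucene's IndexSearcher; all names here are illustrative.
public final class ClauseLimit {
    private static int maxClauseCount = 4096;

    public static int getMaxClauseCount() { return maxClauseCount; }
    public static void setMaxClauseCount(int n) { maxClauseCount = n; }

    // Runs the given test body with a temporarily lowered clause limit.
    public static void withLowLimit(int limit, Runnable testBody) {
        int previous = getMaxClauseCount();
        setMaxClauseCount(limit); // keep the test small and fast
        try {
            testBody.run();
        } finally {
            setMaxClauseCount(previous); // static state: always restore
        }
    }
}
```

Because the real setter is static (shared across the whole JVM), restoring it in a `finally` block matters so other tests in the same suite see the original limit.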
+ SearchModule.INDICES_MAX_CLAUSE_COUNT_SETTING.getKey()
+ "] setting."
+ "]. "
+ "You can increase this limit by scaling up your java heap or number of CPUs"
Scaling up the number of CPUs would increase the size of the search thread pool and reduce the number of allowed clauses, so maybe just mention memory?
💚 Backport successful
This commit deprecates the indices.query.bool.max_clause_count node setting, and instead configures the maximum clause count for Lucene based on the available heap and the size of the thread pool. Closes elastic#46433 (elastic#81525)