queue in search thread pool does not seem to limit the number of pending tasks #70792

@kkewwei

ES Version: 7.9.1

Description of the problem including expected versus actual behavior:
When ES receives an excessive number of queries, it throws the following exception:

[2021-03-24T12:40:34,490][WARN ][r.suppressed             ] [node4] path: /index1/_search, params: {rest_total_hits_as_int=true, index=index1}
org.elasticsearch.action.search.SearchPhaseExecutionException:
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:551) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.FetchSearchPhase$1.onFailure(FetchSearchPhase.java:100) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.onRejection(AbstractRunnable.java:63) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.onRejection(TimedRunnable.java:54) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onRejection(ThreadContext.java:703) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:90) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.execute(AbstractSearchAsyncAction.java:597) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.FetchSearchPhase.run(FetchSearchPhase.java:89) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:350) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:344) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:582) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:393) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.access$100(AbstractSearchAsyncAction.java:68) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:245) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:73) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:403) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:661) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.transport.TransportService.sendChildRequest(TransportService.java:703) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.transport.TransportService.sendChildRequest(TransportService.java:695) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.SearchTransportService.sendExecuteQuery(SearchTransportService.java:138) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.executePhaseOnShard(SearchQueryThenFetchAsyncAction.java:79) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$3(AbstractSearchAsyncAction.java:231) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction$PendingExecutions.tryRun(AbstractSearchAsyncAction.java:668) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction$PendingExecutions.finishAndRunNext(AbstractSearchAsyncAction.java:662) ~[elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction$2.doRun(AbstractSearchAsyncAction.java:288) [elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:710) [elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.1.jar:7.9.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_112]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_112]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@634ee5cd on QueueResizingEsThreadPoolExecutor[name = zf-data-hdp-dn-rtyarn0375/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 10.8micros, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@caa72e7[Running, pool size = 19, active threads = 19, queued tasks = 82621, completed tasks = 2534561136]]
        at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48) ~[elasticsearch-7.9.1.jar:7.9.1]
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) ~[?:1.8.0_112]
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) ~[?:1.8.0_112]
        at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:84) ~[elasticsearch-7.9.1.jar:7.9.1]
        ... 27 more

We set the queue capacity to 1000, yet org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor reports queued tasks = 82621. How can the queue hold more tasks than its capacity?
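
One way a bounded executor queue can report more queued tasks than its capacity is when the rejection handler force-inserts tasks that are marked as must-run instead of rejecting them. Elasticsearch's `AbstractRunnable` exposes `isForceExecution()` for this kind of task, but whether that is what happened on this node is an assumption; the log above does not confirm it. The sketch below reproduces the effect with plain `java.util.concurrent` only. `ForcedQueueSketch`, `Forced`, `CappedQueue`, and `forcePut` are hypothetical names for illustration, not Elasticsearch classes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ForcedQueueSketch {

    // Hypothetical marker for tasks that must never be rejected.
    interface Forced extends Runnable {}

    // offer() enforces the capacity; forcePut() deliberately bypasses it.
    static class CappedQueue extends LinkedBlockingQueue<Runnable> {
        private final int capacity;

        CappedQueue(int capacity) {
            this.capacity = capacity;
        }

        @Override
        public boolean offer(Runnable r) {
            return size() < capacity && super.offer(r);
        }

        void forcePut(Runnable r) {
            super.offer(r); // the underlying queue is unbounded
        }
    }

    // Submits capacity + extraForced forced tasks while the only worker is
    // blocked, then returns the queue size observed before releasing it.
    static int queuedAfterFlood(int capacity, int extraForced) {
        CappedQueue queue = new CappedQueue(capacity);
        RejectedExecutionHandler handler = (r, executor) -> {
            if (r instanceof Forced) {
                queue.forcePut(r); // queued even though the queue is "full"
                return;
            }
            throw new RejectedExecutionException("queue capacity " + capacity + " reached");
        };
        ThreadPoolExecutor pool =
            new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, queue, handler);

        CountDownLatch release = new CountDownLatch(1);
        // Keep the single worker busy so that nothing is drained from the queue.
        pool.execute(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        for (int i = 0; i < capacity + extraForced; i++) {
            pool.execute((Forced) () -> {});
        }
        int queued = pool.getQueue().size();

        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return queued;
    }

    public static void main(String[] args) {
        System.out.println("queued tasks = " + queuedAfterFlood(1000, 50));
    }
}
```

In this sketch the executor still rejects ordinary tasks once the capacity is reached, so the rejected counter and a queue size above the capacity can both grow at the same time, which matches the shape of what is reported here.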

I also called `GET _cat/thread_pool/search?v`, and the response is as follows:

node_name name   active queue rejected
node1     search      0     0   402044
node2     search      0     0    14239
node3     search      0     0  1092955
node4     search     19 86223  1735114

The queue size should never exceed 1000, but on node4 it is 86223, far above the configured limit.
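
To check this across nodes without reading the table by eye, the `_cat/thread_pool` output can be scanned for rows whose queue column exceeds the configured capacity. This is a minimal sketch that assumes the column order shown above (node_name, name, active, queue, rejected) and whitespace-separated values; `ThreadPoolQueueCheck` and `nodesOverCapacity` are hypothetical names, not part of any Elasticsearch client.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadPoolQueueCheck {

    // Returns the node names whose "queue" column is greater than capacity.
    // Assumes a header line followed by rows of:
    // node_name name active queue rejected
    static List<String> nodesOverCapacity(String catOutput, int capacity) {
        List<String> result = new ArrayList<>();
        String[] lines = catOutput.trim().split("\\R");
        for (int i = 1; i < lines.length; i++) { // index 0 is the header
            String[] cols = lines[i].trim().split("\\s+");
            if (cols.length < 5) {
                continue; // skip blank or malformed rows
            }
            long queue = Long.parseLong(cols[3]);
            if (queue > capacity) {
                result.add(cols[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String output = String.join("\n",
            "node_name name   active queue rejected",
            "node1     search      0     0   402044",
            "node2     search      0     0    14239",
            "node3     search      0     0  1092955",
            "node4     search     19 86223  1735114");
        System.out.println(nodesOverCapacity(output, 1000));
    }
}
```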

Labels: :Search/Search, >bug, Team:Search
