Make TransportNodesAction finishHim Execute on Configured Executor #62753

original-brownbear merged 5 commits into elastic:master from original-brownbear:fix-mapping-monitoring-serialization
Conversation
Currently, `finishHim` can execute either on the specified executor (in the less likely case that the local node's request is the last to arrive) or on a transport thread. In the case of e.g. `org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction` this leads to an expensive execution that deserializes all mapping metadata in the cluster on a transport thread, destabilizing the cluster. This transport action was specifically moved to the `MANAGEMENT` thread to avoid the high cost of processing the stats requests on the nodes during fan-out, but that did not cover the final execution on the node that received the initial request.
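The core issue can be reduced to a small, self-contained sketch. Plain `ExecutorService`s below stand in for Elasticsearch's transport and `MANAGEMENT` pools; all names are illustrative, not the real API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FanOutDemo {

    // Fan out three "node requests" on transport threads; whichever thread
    // delivers the last response would normally run the final aggregation.
    // Forking that last step onto the configured executor keeps it off the
    // transport threads, which is the idea behind this PR.
    static String runFanOut() {
        int nodes = 3;
        AtomicInteger pending = new AtomicInteger(nodes);
        ExecutorService transport = Executors.newFixedThreadPool(2, r -> new Thread(r, "transport"));
        ExecutorService management = Executors.newSingleThreadExecutor(r -> new Thread(r, "management"));
        CompletableFuture<String> finishedOn = new CompletableFuture<>();
        Runnable onNodeResponse = () -> {
            if (pending.decrementAndGet() == 0) {
                // Without this fork, the expensive aggregation would run right
                // here, on whichever transport thread happened to arrive last.
                management.execute(() -> finishedOn.complete(Thread.currentThread().getName()));
            }
        };
        for (int i = 0; i < nodes; i++) {
            transport.execute(onNodeResponse);
        }
        try {
            return finishedOn.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            transport.shutdown();
            management.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("final aggregation ran on: " + runFanOut());
    }
}
```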
Pinging @elastic/es-distributed (:Distributed/Network)
As discussed during the distrib sync: I updated the PR to only fork off to another thread for the specific transport action that was causing trouble here; should be good for review now :)
henningandersen
left a comment
LGTM.
Can we also add an assert to TransportClusterStatsAction.newResponse that it does not run on a transport thread (or always runs on a management thread)?
```java
        return;
    }
    listener.onResponse(finalResponse);
    threadPool.executor(finalExecutor).execute(ActionRunnable.supply(listener, () -> newResponse(request, responses)));
```
nit: this reintroduces double-firing the listener if `listener.onResponse` fails (it will now also invoke `onFailure`). I am OK with it, since coming up with a good solution for the pattern you use here is a separate effort - unless you can see an easy way around it.
I knew you were going to point that out (really :)). The problem here is that I think we really need this safeguard. Otherwise an exception in `listener.onResponse` (highly likely a bug in the transport code, but still) would lead to the request never getting a response back, leaking memory on the caller. We actually have the same issue when we run this on the same thread (the transport handler upstream brings the exception back around to the listener in that case).
-> no easy fix here for now, I'm afraid. But I do think we should look into adding more assertions for this kind of thing to `ActionRunnable`, like you did for `ActionListener.map`, and clean these things up more for sure.
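The double-firing concern can be illustrated with a minimal stand-in for the listener pattern. The `Listener` interface and `supply` helper below are illustrative sketches, not the real `ActionListener`/`ActionRunnable` API:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class DoubleFireDemo {

    // Illustrative stand-in for the listener pattern under discussion.
    interface Listener<T> {
        void onResponse(T result);
        void onFailure(Exception e);
    }

    static <T> Runnable supply(Listener<T> listener, Callable<T> supplier) {
        return () -> {
            try {
                listener.onResponse(supplier.call());
            } catch (Exception e) {
                // If onResponse itself threw, the listener has already fired
                // once; this second notification is the "double firing" above.
                // The safeguard is still needed so a failed computation does
                // not leave the caller waiting forever.
                listener.onFailure(e);
            }
        };
    }

    // Count how often each callback fires when onResponse itself has a bug.
    static int[] countNotifications() {
        AtomicInteger responses = new AtomicInteger();
        AtomicInteger failures = new AtomicInteger();
        Listener<String> buggy = new Listener<String>() {
            @Override
            public void onResponse(String r) {
                responses.incrementAndGet();
                throw new RuntimeException("bug in response handling");
            }

            @Override
            public void onFailure(Exception e) {
                failures.incrementAndGet();
            }
        };
        supply(buggy, () -> "response").run();
        return new int[] { responses.get(), failures.get() };
    }

    public static void main(String[] args) {
        int[] counts = countNotifications();
        System.out.println("onResponse calls: " + counts[0] + ", onFailure calls: " + counts[1]);
    }
}
```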
Done in 474d4cb, went with the not-transport-thread assertion since we use that in multiple other places. Also, it's a little more flexible if we want to start using the transport action in
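A self-contained sketch of that kind of guard, using thread-name matching as an illustrative stand-in for the real transport-thread check (all names below are assumptions, not the actual helper):

```java
public class ThreadAssertDemo {

    // Sketch of a "must not run on a transport thread" guard: refuse to build
    // the expensive response on a thread whose name marks it as a transport
    // worker. Name matching here is illustrative only.
    static void assertNotTransportThread(String reason) {
        String name = Thread.currentThread().getName();
        if (name.contains("transport")) {
            throw new AssertionError(reason + " must not run on transport thread [" + name + "]");
        }
    }

    static String buildExpensiveResponse() {
        assertNotTransportThread("response aggregation");
        return "cluster-stats-response";
    }

    public static void main(String[] args) throws Exception {
        // Fine on a management-style thread; would trip on one named "transport".
        Thread management = new Thread(() -> System.out.println(buildExpensiveResponse()), "management");
        management.start();
        management.join();
    }
}
```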
Thanks Henning!
…62753) (#62955) Currently, `finishHim` can execute either on the specified executor (in the less likely case that the local node's request is the last to arrive) or on a transport thread. In the case of e.g. `org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction` this leads to an expensive execution that deserializes all mapping metadata in the cluster on a transport thread, destabilizing the cluster. This transport action was specifically moved to the `MANAGEMENT` thread to avoid the high cost of processing the stats requests on the nodes during fan-out, but that did not cover the final execution on the node that received the initial request. This PR adds the ability to optionally specify the executor for the final step of the nodes request execution and uses that to work around the issue for the slow `TransportClusterStatsAction`. Note: the specific problem that motivated this PR is essentially the same as #57937, where we likewise fixed it by moving the execution off the transport thread and onto the management thread.
…64195) Follow up to #62753, which moved this expensive method off of the transport threads. Oftentimes monitoring will hit this endpoint every few seconds while the metadata will likely change at a much slower interval. Given that in practice the computation of the stats might take up to a minute for large cluster states, caching them on a best-effort basis seems like a reasonable improvement.
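The best-effort caching idea can be sketched as a small version-keyed cache. Everything below is illustrative (class, field, and method names are assumptions, not the actual implementation): repeated calls at the same metadata version reuse the last computed value, and lost races simply recompute:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongFunction;

public class CachedStatsDemo {

    // Best-effort cache keyed by a version number (e.g. cluster state /
    // metadata version). Races may recompute redundantly, which is an
    // acceptable trade-off for stats that are merely monitored.
    static class CachedStats<T> {
        private record Entry<T>(long version, T value) {}

        private final AtomicReference<Entry<T>> cache = new AtomicReference<>();
        private final LongFunction<T> compute;

        CachedStats(LongFunction<T> compute) {
            this.compute = compute;
        }

        T get(long version) {
            Entry<T> cached = cache.get();
            if (cached != null && cached.version == version) {
                return cached.value; // hit: skip the expensive computation
            }
            T fresh = compute.apply(version);
            cache.compareAndSet(cached, new Entry<>(version, fresh)); // lost races are fine
            return fresh;
        }
    }

    // Three calls at version 1 compute once; a version bump recomputes.
    static int demo() {
        AtomicInteger computations = new AtomicInteger();
        CachedStats<String> stats = new CachedStats<>(v -> "stats@v" + v + "#" + computations.incrementAndGet());
        stats.get(1);
        stats.get(1);
        stats.get(1);
        stats.get(2);
        return computations.get();
    }

    public static void main(String[] args) {
        System.out.println("computations: " + demo());
    }
}
```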
In our product, the `MANAGEMENT` threadpool is occasionally very busy: the five active threads are all used to write to disk. The `MANAGEMENT` threadpool will be blocked as long as one disk is busy, which is common in production, and this API is relatively important.
Should we create a new threadpool to handle those tasks related to metadata?
@kkewwei I agree that this looks problematic. A few initial questions on your setup:
I am hesitant to have retention lease sync warrant a new pool, but it may be more suitable on some other thread pool, or we can look into optimizing the amount of fsync'ing needed here.