the transport/http worker thread will not be released for a long time #57284
hanbj wants to merge 1 commit into elastic:master from hanbj:same
Conversation
💚 CLA has been signed
@hanbj Thank you for the PR. Can you please sign the Contributor Agreement?
Pinging @elastic/es-distributed (:Distributed/Network)
Can you provide the context behind this PR? My understanding is that you are concerned that the XContent response is serialized on the transport thread for _cat/shards, _cat/indices, or _all/_mapping calls. How large are the responses we are discussing, and what kind of latency does that introduce?
@tbrooks8 Thank you. The background is as follows: I added timing logs before and after the messageReceived method of the InboundHandler class, and found in the logs that messageReceived sometimes takes a few seconds or even tens of seconds. I observed that when the cluster metadata is very large (many indices and shards), there is almost no response when I call the _cat/shards interface. Similarly, when I fetch all mappings with `curl -XGET http://127.0.0.1:9200/_mappings > mappings`, the messageReceived method also takes a lot of time. So I read the relevant code and found that some interfaces are executed on the same thread pool. As a result, the transport_worker / http_server_worker thread may not be released for a long time and cannot process new requests, which reduces the overall throughput of the system.
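To illustrate the concern, here is a minimal, self-contained sketch (not Elasticsearch code; the class and method names are hypothetical) of the fix being discussed: handing expensive response construction off to a worker pool so a single network thread stays free to answer the next request immediately, instead of building the response inline:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OffloadDemo {

    // Stand-in for an expensive toXContent() serialization of a huge mapping.
    static String buildLargeResponse() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) sb.append(i);
        return sb.substring(0, 16);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService networkThread = Executors.newSingleThreadExecutor();
        ExecutorService workerPool = Executors.newFixedThreadPool(2);

        // Request 1: the heavy response is built on the worker pool,
        // not inline on the network thread.
        CompletableFuture<String> heavy =
                CompletableFuture.supplyAsync(OffloadDemo::buildLargeResponse, workerPool);

        // Request 2: answered immediately, because the network thread
        // was never blocked by request 1's serialization.
        String pong = networkThread.submit(() -> "pong").get();
        System.out.println("network thread answered: " + pong);
        System.out.println("heavy response ready: " + !heavy.get().isEmpty());

        networkThread.shutdown();
        workerPool.shutdown();
    }
}
```

If `buildLargeResponse()` ran directly on `networkThread`, the "pong" request would have to wait behind it, which is the stall described above.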
@hanbj the problem is that your first commit used a different email. One easy option is to re-sign the agreement with the email you committed with.
@tbrooks8 @dliappis Do you have any other ideas about this PR? I see the _cat/indices interface is submitted to the MANAGEMENT thread pool for execution; alternatively, we could add a thread pool in the ThreadPool class to handle large responses.
@hanbj how did you measure those 6s and interpret them as time spent serializing (mainly wondering whether your measurement includes non-blocking IO time as well)? Even though there are some expensive operations when serializing the mapping response, I find it somewhat unlikely that it would take 6s, even with almost 100MB of mappings. If we decide to fix this, it might be fine to simply move the serialization to the generic pool and be done with it (since this shouldn't be called at a high frequency, I don't think there's much point in optimizing the logic itself), but I'm having a hard time reproducing a multi-second serialization time.
@original-brownbear I used Arthas's trace command, which searches for the method call path matching a class pattern / method pattern, then reports the performance overhead along the whole call link: `[arthas@1669]$ trace org.elasticsearch.transport.TcpTransport handleResponse '#cost > 5000'`. The handleResponse() method took 6939ms.
I hope you don't mind that I'll close this one for now, since the change in this PR isn't something we want to go with, as explained above. Thanks so much for bringing this to our attention @hanbj
Use thread-local buffers and deflater and inflater instances to speed up compressing and decompressing from in-memory bytes. Not manually invoking `end()` on these should be safe since their off-heap memory will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals that are not instantiated at a high frequency. This significantly reduces the amount of byte copying and object creation relative to the previous approach which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc. Relates #57284 which should be helped by this change to some degree. Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these code paths.
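The thread-local reuse idea from the commit message can be sketched as follows. This is an illustrative example, not Elasticsearch's actual implementation: one `Deflater` and one `Inflater` per thread, `reset()` between uses, and no manual `end()` call.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CompressionReuse {
    // One instance per thread, reused across calls instead of allocating
    // a fresh Deflater/Inflater (and its off-heap state) every time.
    private static final ThreadLocal<Deflater> DEFLATER =
            ThreadLocal.withInitial(() -> new Deflater(Deflater.DEFAULT_COMPRESSION, true));
    private static final ThreadLocal<Inflater> INFLATER =
            ThreadLocal.withInitial(() -> new Inflater(true));

    static byte[] compress(byte[] input) {
        Deflater deflater = DEFLATER.get();
        deflater.reset();                       // reuse instead of re-allocating
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[input.length + 64]; // ample for this small input
        int written = deflater.deflate(buffer);
        return Arrays.copyOf(buffer, written);
    }

    static byte[] decompress(byte[] compressed, int originalLength) throws Exception {
        Inflater inflater = INFLATER.get();
        inflater.reset();
        // The javadoc notes raw ("nowrap") inflaters may need an extra dummy byte.
        inflater.setInput(Arrays.copyOf(compressed, compressed.length + 1));
        byte[] out = new byte[originalLength];
        int n = inflater.inflate(out);
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "mapping mapping mapping mapping".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < 3; i++) {           // same thread-local instances each round
            byte[] restored = decompress(compress(data), data.length);
            System.out.println(Arrays.equals(data, restored));
        }
    }
}
```

As the commit message notes, skipping `end()` is acceptable here because the instances are long-lived thread-locals whose native resources are eventually reclaimed, rather than short-lived objects created per request.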
When the cluster is very large and you call interfaces such as _cat/shards, _cat/indices, or _all/_mapping, there is no response for a long time, sometimes even a gateway timeout, because the response is too large.
For example, when a REST request hits _all/_mapping, Elasticsearch ultimately calls the toXContent() method to construct the JSON response. This is a very time-consuming operation, and it is executed on the http_server_worker thread.
The SAME thread pool does not help here: tasks submitted to the SAME thread pool still run on the calling transport_worker / http_server_worker thread. If such a task is large, the worker thread is not released for a long time, which reduces overall system throughput.
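The behavior described above can be demonstrated with a direct, caller-thread executor, which is conceptually what a "same" thread pool does (a sketch, not Elasticsearch source):

```java
import java.util.concurrent.Executor;

public class SameExecutorDemo {
    // A direct executor: "submitting" to it just runs the task inline,
    // so heavy work still occupies the calling (worker) thread.
    static final Executor SAME = Runnable::run;

    public static void main(String[] args) {
        Thread caller = Thread.currentThread();
        SAME.execute(() -> {
            // The task runs on the submitting thread itself.
            System.out.println(Thread.currentThread() == caller);
        });
    }
}
```

This prints `true`: the task never leaves the submitting thread, so dispatching a large serialization task this way does not free the transport_worker / http_server_worker thread.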