
Serialize Get Mappings Response on Generic ThreadPool#57937

Merged
original-brownbear merged 2 commits into elastic:master from original-brownbear:run-get-mapping-rest-on-generic
Aug 21, 2020
Conversation

@original-brownbear
Contributor

@original-brownbear commented Jun 10, 2020

Solves/mitigates the issue observed in #57284

For large responses to the get mappings request, serialization to XContent can be
extremely slow, since serializing mappings requires decompressing and deserializing
the mapping source. To avoid destabilizing the IO thread that handles the get mappings
response, we should move the serialization to the generic pool.
The trade-off seems worth it: responses small enough to be harmless on the transport
thread pay one or two extra context switches, and in exchange we avoid instability
when the cluster holds a large number of mappings.

Marking this >team-discuss for now, since we previously had the larger discussion of where REST actions should execute and didn't reach a conclusion. While this is a special case, we have a number of other potentially slow REST actions like this one, where a very large response gets serialized.
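The dispatch the PR describes can be sketched with plain `java.util.concurrent` primitives. This is a hedged, simplified illustration, not Elasticsearch's actual `ThreadPool` or REST classes; `serializeMappings` stands in for the expensive XContent work:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OffloadSerialization {
    // Stand-in for the shared "generic" thread pool.
    static final ExecutorService GENERIC = Executors.newCachedThreadPool();

    // Stand-in for the expensive response-to-XContent serialization
    // (decompress + deserialize + re-serialize the mapping source).
    static String serializeMappings(String rawMappings) {
        return "{\"mappings\":" + rawMappings + "}";
    }

    // Called on the IO/transport thread: dispatch the heavy work instead
    // of running it inline, so the event loop stays responsive.
    static Future<String> handleResponse(String rawMappings) {
        return GENERIC.submit(() -> serializeMappings(rawMappings));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(handleResponse("{\"field\":{\"type\":\"text\"}}").get());
        GENERIC.shutdown();
    }
}
```

Small responses pay the cost of the extra hand-off (the "one or two new context switches" above), while large responses no longer block the IO loop.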

@original-brownbear added the :Distributed/Network (Http and internode communication implementations) and team-discuss labels Jun 10, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Network)

@fcofdez
Contributor

fcofdez commented Jun 10, 2020

I wonder if it would make sense to add a new method to TransportMasterNodeAction so we can define which executor the response is processed on, and inject that into the ActionListenerResponseHandler that's passed to the transportService. WDYT @original-brownbear?

@original-brownbear
Contributor Author

I wonder if it would make sense to add a new method to TransportMasterNodeAction

Not sure that's necessarily where we'd want to add this, given that we already have infrastructure like ThreadedActionListener that could be used for this kind of thing. But yeah, that's why I added >team-discuss; maybe we'll see enough spots that we want to generalize something (for the REST layer, I guess) here.
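The `ThreadedActionListener` idea mentioned above boils down to wrapping a callback so it fires on a chosen executor rather than on the calling (transport) thread. A minimal sketch with hypothetical, simplified names, using only `java.util.concurrent` (the real class lives in the Elasticsearch codebase and does more, e.g. forcing execution even on pool shutdown):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class ThreadedListenerSketch {
    // Returns a listener that re-dispatches the response onto `executor`
    // instead of handling it on whatever thread invoked it.
    static <T> Consumer<T> onExecutor(Executor executor, Consumer<T> delegate) {
        return response -> executor.execute(() -> delegate.accept(response));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService management = Executors.newFixedThreadPool(1);
        CompletableFuture<String> handled = new CompletableFuture<>();
        Consumer<String> listener = onExecutor(management, handled::complete);
        listener.accept("mappings response"); // invoked from the "transport" thread
        System.out.println(handled.get());    // ...but handled on the management pool
        management.shutdown();
    }
}
```

This is the shape that would let either approach (a TransportMasterNodeAction hook or a generic REST-layer wrapper) move response processing off the transport thread.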

@original-brownbear
Contributor Author

We discussed this during team discuss and decided to check if slowness in this API call happens on Cloud at any non-trivial rate.

I investigated this on Cloud and we have a lot of slow get mappings responses (up to minutes in response time) so we should take action here I think.

Member

@DaveCTurner left a comment


I'd rather not use the generic pool for this, but MANAGEMENT seems like a reasonable option. See inline comment about a timeout.

```java
// on an IO thread
threadPool.generic().execute(ActionRunnable.wrap(this, l -> new RestBuilderListener<GetMappingsResponse>(channel) {
    @Override
    public RestResponse buildResponse(final GetMappingsResponse response, final XContentBuilder builder) throws Exception {
```
Member


WDYT about checking for a timeout again here before we do all the serialisation work? If we use a bounded threadpool then these things might pile up, so aborting early might be a helpful way to push back.
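The early-abort idea suggested here could look roughly like this (a hedged sketch with hypothetical names; the real change would consult the request's actual timeout): re-check the deadline when the queued task finally starts, because time spent waiting in a bounded pool's queue may already have consumed the whole budget.

```java
import java.util.concurrent.TimeoutException;

public class DeadlineCheck {
    // Re-check the deadline at the moment the dequeued task starts: on a
    // bounded pool, queueing delay may have eaten the request's budget,
    // so fail fast instead of doing expensive work nobody will wait for.
    static String serializeIfNotExpired(long deadlineNanos, String rawMappings) throws TimeoutException {
        if (System.nanoTime() - deadlineNanos >= 0) {
            throw new TimeoutException("timed out before serialization started");
        }
        return "{\"mappings\":" + rawMappings + "}"; // the expensive part
    }

    public static void main(String[] args) throws Exception {
        long deadline = System.nanoTime() + 1_000_000_000L; // one-second budget
        System.out.println(serializeIfNotExpired(deadline, "{}"));
    }
}
```

Aborting before serialization is what provides the back-pressure the comment describes: piled-up requests expire cheaply in the queue rather than each doing full serialization work.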

Contributor Author


👍 how about 0bd68d4 ? :)
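On the pool-choice point in this review thread (generic vs. MANAGEMENT): the practical difference is that a bounded pool caps how many expensive serializations run concurrently, while excess tasks queue where they can be timed out. A general-purpose illustration, not Elasticsearch specifics:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolChoice {
    // Measures the peak number of tasks running at once on `pool`.
    static int maxConcurrent(ExecutorService pool, int tasks) throws Exception {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try { Thread.sleep(50); } catch (InterruptedException ignored) { }
                running.decrementAndGet();
                done.countDown();
            });
        }
        done.await();
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        // Bounded (MANAGEMENT-like): concurrency never exceeds the cap,
        // so memory/CPU used by in-flight serializations stays bounded.
        ExecutorService managementLike = Executors.newFixedThreadPool(2);
        System.out.println(maxConcurrent(managementLike, 8)); // prints at most 2
        managementLike.shutdown();
    }
}
```

An unbounded pool (like a cached thread pool, or the generic pool under heavy load) would instead run all eight at once, which is exactly the pile-up concern raised above.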

@original-brownbear
Contributor Author

Jenkins run elasticsearch-ci/packaging-sample-windows

Member

@DaveCTurner left a comment


LGTM

@original-brownbear
Contributor Author

Thanks David!

@original-brownbear merged commit e5eca6a into elastic:master Aug 21, 2020
@original-brownbear deleted the run-get-mapping-rest-on-generic branch August 21, 2020 05:12
original-brownbear added a commit that referenced this pull request Aug 21, 2020
For large responses to the get mappings request, serialization to XContent can be
extremely slow, since serializing mappings requires decompressing and deserializing
the mapping source. To avoid destabilizing the IO thread that handles the get mappings
response, we should move the serialization to the management pool.
The trade-off seems worth it: responses small enough to be harmless on the transport
thread pay one or two extra context switches, and in exchange we avoid instability
when the cluster holds a large number of mappings.
original-brownbear added a commit that referenced this pull request Sep 28, 2020
…62753)

Currently, `finishHim` can either execute on the specified executor
(in the less likely case that the local node request is the last to arrive)
or on a transport thread.
In the case of e.g. `org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction`,
this leads to an expensive execution, deserializing all mapping metadata in the cluster,
running on the transport thread and destabilizing the cluster. This transport action was
specifically moved to the `MANAGEMENT` thread to avoid the high cost of processing
the stats requests on the nodes during fan-out, but that did not cover the final execution
on the node that received the initial request. This PR adds the ability to optionally specify the executor for the final step of the
nodes-request execution and uses that to work around the issue for the slow `TransportClusterStatsAction`.

Note: the specific problem that motivated this PR is essentially the same as #57937 where we moved the execution off the transport and on the management thread as a fix as well.
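The fan-out-then-finish pattern this commit fixes can be sketched with `CompletableFuture` (a simplified illustration with hypothetical names, not the actual `TransportNodesAction` code): per-node responses complete wherever they complete, but the final combine step is explicitly dispatched to a caller-chosen executor instead of running on whichever thread delivered the last response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class FinishOnExecutor {
    // Combine per-node responses on `finalExecutor`, never on the
    // (possibly transport) thread that delivered the last response.
    static <T, R> CompletableFuture<R> fanOut(List<CompletableFuture<T>> perNode,
                                              Function<List<T>, R> combine,
                                              Executor finalExecutor) {
        return CompletableFuture.allOf(perNode.toArray(new CompletableFuture<?>[0]))
            // thenApplyAsync is what moves the expensive combine step over
            .thenApplyAsync(ignored -> {
                List<T> responses = new ArrayList<>();
                for (CompletableFuture<T> f : perNode) {
                    responses.add(f.join()); // all already complete here
                }
                return combine.apply(responses);
            }, finalExecutor);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService management = Executors.newFixedThreadPool(1);
        List<CompletableFuture<Integer>> nodeStats =
            List.of(CompletableFuture.completedFuture(1), CompletableFuture.completedFuture(2));
        int total = fanOut(nodeStats, rs -> rs.stream().mapToInt(Integer::intValue).sum(),
                           management).get();
        System.out.println(total); // 3
        management.shutdown();
    }
}
```

Without the explicit executor, the equivalent of `thenApply` would run the combine inline on the last-arriving response's thread, which is exactly the transport-thread hazard the commit message describes.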
original-brownbear added a commit that referenced this pull request Sep 28, 2020
…62753) (#62955)

original-brownbear added a commit that referenced this pull request Dec 4, 2020
…#65843)

Moving the cluster state response serialization to the management thread, just like we did for the mappings response in #57937, since it is a potentially very large and slow-to-serialize response.
original-brownbear added a commit that referenced this pull request Dec 4, 2020
…#65843) (#65881)

@original-brownbear restored the run-get-mapping-rest-on-generic branch December 6, 2020 19:04

Labels

:Distributed/Network (Http and internode communication implementations), >non-issue, Team:Distributed (Meta label for distributed team), v7.10.0, v8.0.0-alpha1
