Move RestClusterStateAction Response Serialization to Management Pool by original-brownbear · Pull Request #65843 · elastic/elasticsearch

original-brownbear · 2020-12-03T18:16:57Z

Moving the cluster state response serialization to the management thread, just like we did for the mappings response in #57937, since it's a potentially very large and slow-to-serialize response.

Responding with the full cluster state implies serializing the full cluster state on the IO thread. In case of very large cluster states this serialization is not a trivial action and can take multiple seconds so it shouldn't be happening on a transport thread.
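To illustrate the idea in the description, here is a minimal sketch of handing an expensive serialization step to a separate executor so the calling (IO) thread only schedules the work. It uses plain `java.util.concurrent` types; `OffloadSerializationSketch`, `serializeOn`, and `newSingleThreadPool` are hypothetical names for illustration and are not Elasticsearch APIs, and the real change goes through Elasticsearch's own thread pool and REST listener classes.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
final class OffloadSerializationSketch {

    // Stand-in for an expensive serialization step (e.g. rendering a large state as JSON).
    static String serialize(Map<String, String> state) {
        StringBuilder sb = new StringBuilder("{");
        // TreeMap gives a deterministic key order for the illustration.
        for (Map.Entry<String, String> e : new TreeMap<>(state).entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":\"").append(e.getValue()).append('"');
        }
        return sb.append('}').toString();
    }

    // Called from the "IO" thread: schedules serialization on the given pool and returns at once.
    static CompletableFuture<String> serializeOn(ExecutorService pool, Map<String, String> state) {
        return CompletableFuture.supplyAsync(() -> serialize(state), pool);
    }

    // Small single-threaded pool standing in for a restricted management-style pool.
    static ExecutorService newSingleThreadPool(String threadName) {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, threadName);
            t.setDaemon(true);
            return t;
        });
    }
}
```

The caller would attach a callback to the returned future to write the bytes to the channel, so a multi-second serialization never blocks the thread that received the request.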

elasticmachine · 2020-12-03T18:17:00Z

Pinging @elastic/es-distributed (Team:Distributed)


DaveCTurner

I'm fairly convinced we should move the work somewhere that's not a transport thread, but not convinced that the MANAGEMENT threadpool is the right place, for similar reasons to those we discussed in #51992.

I also worry that this might have an impact on CCR which uses (well-behaved) cluster state requests for updating the metadata on the follower. This change introduces an interaction between CCR and potentially-slow monitoring activity.

I left a couple of other questions too.

.../src/main/java/org/elasticsearch/action/admin/cluster/state/TransportClusterStateAction.java

DaveCTurner · 2020-12-04T09:16:53Z

server/src/main/java/org/elasticsearch/rest/action/admin/cluster/RestClusterStateAction.java

+                                        if (clusterStateRequest.local() == false &&
+                                                threadPool.relativeTimeInMillis() - startTimeMs >
+                                                        clusterStateRequest.masterNodeTimeout().millis()) {
+                                            throw new ElasticsearchTimeoutException("Timed out getting cluster state");
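The hunk above compares elapsed relative time against the master node timeout for non-local requests. A standalone sketch of that check, assuming plain `long` millisecond values and using `IllegalStateException` as a stand-in for `ElasticsearchTimeoutException` (the class and method names here are hypothetical):

```java
// Hypothetical illustration of the elapsed-time check; not the Elasticsearch implementation.
final class QueuedTimeoutSketch {

    // Throws if a non-local request has already spent longer than the timeout
    // between startTimeMs and nowMs (both read from the same relative clock).
    static void ensureNotTimedOut(boolean local, long startTimeMs, long nowMs, long masterNodeTimeoutMs) {
        if (local == false && nowMs - startTimeMs > masterNodeTimeoutMs) {
            throw new IllegalStateException("Timed out getting cluster state");
        }
    }
}
```

Using a relative (monotonic) clock for both readings keeps the comparison unaffected by wall-clock adjustments.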


DaveCTurner · 2020-12-04T09:31:53Z

Linking #51992 in this comment too since apparently a link in a review comment doesn't create a reverse link on the target issue.

original-brownbear · 2020-12-04T09:35:16Z

> I'm fairly convinced we should move the work somewhere that's not a transport thread, but not convinced that the MANAGEMENT threadpool is the right place, for similar reasons to those we discussed in #51992.

I agree with those reasons. Then again I don't see this being much different from #57937 ...

I have no scientific way of determining whether it is yet again just the REST handler that is slow or the transport action as a whole. But ... I can't see how an O(100MB) cluster state would serialize as quickly as we would want for a transport layer action, so I went with the change for the transport layer as well.
We could just make this change for the REST layer like we did for mappings (that would be less controversial since it won't affect CCR) and check the cloud logs in 7.11? :)

DaveCTurner · 2020-12-04T10:01:59Z

> I agree with those reasons. Then again I don't see this being much different from #57937 ...

Indeed.

Could we give CCR its own action for getting the index metadata of a single index? It currently does some follower-side retries that could reasonably move onto the leader too. That way we're not mixing up random monitoring/diagnostics/abuse with well-behaved internal stuff and I'd be more comfortable with pushing this onto a very restricted threadpool.

original-brownbear · 2020-12-04T10:34:37Z

> Could we give CCR its own action for getting the index metadata of a single index? It currently does some follower-side retries that could reasonably move onto the leader too. That way we're not mixing up random monitoring/diagnostics/abuse with well-behaved internal stuff and I'd be more comfortable with pushing this onto a very restricted threadpool.

++ sounds good. Maybe do that in a separate PR (it's one of these nasty BwC things, so it's going to take a little more time :)) and reduce this PR to just the REST layer change for now (which, as with the mappings, will probably be the real-world cause of the warning logs anyway).

DaveCTurner · 2020-12-04T10:43:33Z

Sure, sounds good to me.


original-brownbear · 2020-12-04T11:06:08Z

server/src/test/java/org/elasticsearch/action/admin/cluster/state/ClusterStateApiTests.java

-            assertThat(future2.isDone(), is(true));
-        });
-        ClusterStateResponse response = future2.actionGet();
+        response = future2.get(10L, TimeUnit.SECONDS);


I kept this test cleanup even though it's not necessary now with the transport action changes reverted because busy asserting on futures was just weird ...
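As a rough sketch of the cleanup described above: rather than repeatedly asserting `isDone()`, a test can block on the future with a bounded timeout. The helper below is hypothetical (it is not part of the Elasticsearch test framework) and wraps the checked exceptions so it can be called from any test body.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not an Elasticsearch test utility.
final class FutureWaitSketch {

    // Blocks until the future completes or the timeout elapses, whichever comes first.
    static <T> T await(Future<T> future, long timeout, TimeUnit unit) {
        try {
            return future.get(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A single bounded `get` fails with a clear timeout if the response never arrives, and returns as soon as it does, without polling.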

original-brownbear · 2020-12-04T11:44:25Z

> Sure, sounds good to me.

Alright, all done, all green now :)

DaveCTurner

LGTM, I left one small optional request.

DaveCTurner · 2020-12-04T11:49:35Z

server/src/main/java/org/elasticsearch/rest/action/admin/cluster/RestClusterStateAction.java

+                                        builder.endObject();
+                                        return new BytesRestResponse(RestStatus.OK, builder);
+                                    }
+                                }.onResponse(response)));


Seems a bit weird to create the listener just to immediately complete it. Why not inline this?

I did the same for the mappings so that I didn't have to duplicate the logic in RestBuilderListener. Now that we have two spots that follow this pattern, I think we can refactor this in a follow-up and extract the logic for writing out the response somewhere, so we don't have to do this.
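The two shapes being discussed can be sketched as follows. The `Listener` interface, the `"200 "` prefix, and both method names are hypothetical stand-ins chosen for illustration; they are not `RestBuilderListener` or any other Elasticsearch type.

```java
import java.util.function.Function;

// Hypothetical stand-ins; these are not the Elasticsearch listener or channel types.
final class InlineListenerSketch {

    interface Listener<T> {
        void onResponse(T response);
    }

    // Pattern questioned in the review: a listener is built only to be completed at once.
    static <T> String viaThrowawayListener(T response, Function<T, String> render) {
        StringBuilder channel = new StringBuilder();
        new Listener<T>() {
            @Override
            public void onResponse(T r) {
                channel.append("200 ").append(render.apply(r));
            }
        }.onResponse(response);
        return channel.toString();
    }

    // Possible follow-up: extract the response-writing logic and call it directly.
    static <T> String writeResponse(T response, Function<T, String> render) {
        return "200 " + render.apply(response);
    }
}
```

Both methods produce the same output; the second avoids the intermediate object once the shared writing logic has a home of its own.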

DaveCTurner · 2020-12-04T11:51:38Z

Oh yes and the PR title and description aren't accurate any more either.

original-brownbear · 2020-12-04T11:56:27Z

Thanks David, fixed the description/title :)


original-brownbear added the >non-issue, :Distributed/Network (Http and internode communication implementations), v8.0.0, and v7.11.0 labels Dec 3, 2020

elasticmachine added the Team:Distributed label (Meta label for distributed team) Dec 3, 2020

original-brownbear added 2 commits December 3, 2020 23:24

e6898eb Merge remote-tracking branch 'elastic/master' into cs-response-on-management-thread

3f0ea97 fix test

original-brownbear requested a review from DaveCTurner December 3, 2020 23:21

DaveCTurner reviewed Dec 4, 2020


original-brownbear requested a review from DaveCTurner December 4, 2020 09:35

original-brownbear added 2 commits December 4, 2020 11:55

a326f79 Merge remote-tracking branch 'elastic/master' into cs-response-on-management-thread

02b7fb6 revert transport action changes

original-brownbear commented Dec 4, 2020


DaveCTurner approved these changes Dec 4, 2020


original-brownbear changed the title ~~Move TransportClusterStateAction to Management Pool~~ Move RestClusterStateAction Response Serialization to Management Pool Dec 4, 2020

original-brownbear merged commit 2d336d1 into elastic:master Dec 4, 2020

original-brownbear deleted the cs-response-on-management-thread branch December 4, 2020 11:56

original-brownbear mentioned this pull request Dec 4, 2020

Move RestClusterStateAction Response Serialization to Management Pool (#65843) #65881

Merged

original-brownbear restored the cs-response-on-management-thread branch December 6, 2020 19:04

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
