Desired nodes API by fcofdez · Pull Request #82975 · elastic/elasticsearch

fcofdez · 2022-01-24T18:34:12Z

This commit adds the Desired Nodes API, allowing orchestrators
that manage Elasticsearch clusters to let the system know about the
current/planned topology that the cluster will run on.
This allows the system to make better decisions based on the entire
cluster topology, including nodes that will be added/removed in the
near future.

This commit adds the basic endpoints to manage the desired nodes
state:

GET /_internal/desired_nodes
PUT /_internal/desired_nodes/<history_id>/
DELETE /_internal/desired_nodes

fcofdez · 2022-02-01T10:13:28Z

docs/reference/cluster/delete-desired-nodes.asciidoc

+            "processors" : 8,
+            "memory" : "58gb",
+            "storage" : "1700gb",
+            "node_version" : "8.1.0"


I don't think there is a way to fetch this value programmatically and populate it, so we would have to update it as new versions come out. Maybe it's better to skip the doc tests?

I think it is OK to leave that for a follow-up at least, i.e., skip the doc tests in this PR.

henningandersen

Sorry for dumping yet another incomplete review, will get to the rest of it shortly.

henningandersen · 2022-02-01T09:45:00Z

...java/org/elasticsearch/action/admin/cluster/desirednodes/TransportGetDesiredNodesAction.java

-        listener.onResponse(new GetDesiredNodesAction.Response(DesiredNodesMetadata.latestFromClusterState(state)));
+        final DesiredNodes latestDesiredNodes = DesiredNodesMetadata.latestFromClusterState(state);
+        if (latestDesiredNodes == null) {
+            throw new ResourceNotFoundException("Desired nodes not found");


I prefer to invoke listener.onFailure directly rather than throwing exceptions even though it will need an else block below.
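
To illustrate the suggestion above, here is a minimal, self-contained sketch of reporting the "not found" case through the listener instead of throwing. The `Listener` interface and the `GetDesiredNodesSketch` class are hypothetical stand-ins, not the real Elasticsearch `ActionListener` or transport action, and `NoSuchElementException` stands in for `ResourceNotFoundException`.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a listener interface; not the real Elasticsearch type.
interface Listener<T> {
    void onResponse(T value);

    void onFailure(Exception e);
}

class GetDesiredNodesSketch {
    // Reports a missing value via onFailure rather than throwing,
    // which is why the success path needs an else block.
    static void respond(String latestDesiredNodes, Listener<String> listener) {
        if (latestDesiredNodes == null) {
            listener.onFailure(new NoSuchElementException("Desired nodes not found"));
        } else {
            listener.onResponse(latestDesiredNodes);
        }
    }

    // Helper for the demo: captures whichever callback was invoked as a string.
    static String describe(String latestDesiredNodes) {
        final AtomicReference<String> outcome = new AtomicReference<>();
        respond(latestDesiredNodes, new Listener<String>() {
            @Override
            public void onResponse(String value) {
                outcome.set("response: " + value);
            }

            @Override
            public void onFailure(Exception e) {
                outcome.set("failure: " + e.getMessage());
            }
        });
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(describe(null));
        System.out.println(describe("history-1"));
    }
}
```

With this shape the caller always learns the outcome through the listener, whichever branch is taken.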

henningandersen · 2022-02-01T09:52:19Z

...a/org/elasticsearch/action/admin/cluster/desirednodes/TransportUpdateDesiredNodesAction.java

+                        final DesiredNodes latestDesiredNodes = DesiredNodesMetadata.latestFromClusterState(newState);
+                        boolean replacedExistingHistoryId = previousDesiredNodes != null
+                            && previousDesiredNodes.hasSameHistoryId(latestDesiredNodes) == false;
+                        listener.onResponse(new UpdateDesiredNodesResponse(true, replacedExistingHistoryId));


I think we should simply remove the acknowledged flag from the response, just as the response for adding voting config exclusions does not have it. The client should not care about acknowledged, only whether the data is committed or not (and the request should throw if it is not, or if that is in doubt).

henningandersen · 2022-02-01T09:53:17Z

...a/org/elasticsearch/action/admin/cluster/desirednodes/TransportUpdateDesiredNodesAction.java

+
+                    @Override
+                    public void clusterStateProcessed(ClusterState oldState, ClusterState newState) {
+                        final DesiredNodes previousDesiredNodes = DesiredNodesMetadata.latestFromClusterState(oldState);


Maybe add a comment that we rely on the unbatched executor here?

henningandersen · 2022-02-01T09:58:58Z

...a/org/elasticsearch/action/admin/cluster/desirednodes/TransportUpdateDesiredNodesAction.java

+    ) {
+        super(
+            UpdateDesiredNodesAction.NAME,
+            transportService,


I think we should allow this even if above circuit breaker limit, this is orchestration trying to help us out:

Suggested change

-            transportService,
+            false,
+            transportService,

henningandersen · 2022-02-01T10:00:06Z

...a/org/elasticsearch/action/admin/cluster/desirednodes/TransportUpdateDesiredNodesAction.java

+
+            clusterService.submitStateUpdateTask(
+                "update-desired-nodes",
+                new ClusterStateUpdateTask(Priority.HIGH, request.masterNodeTimeout()) {


I would think we should use URGENT here instead, but maybe @DaveCTurner has an opinion on that?

I'd be ok with URGENT although I'd be happier about it if these things were batched, just in case the orchestrator goes haywire.

I'll update the PR to batch them.
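
As a rough illustration of why batching helps if the orchestrator sends a burst of updates, the sketch below models queued updates being folded into a single resulting state. Everything here (`State`, `applyUnbatched`, `applyBatched`, the publication counter) is a hypothetical simplification and not the actual cluster state task executor from this PR.

```java
import java.util.Arrays;
import java.util.List;

class BatchedUpdatesSketch {
    // Hypothetical model of cluster state: the latest desired-nodes history id
    // plus a count of how many state publications have happened.
    static final class State {
        final String historyId;
        final int publications;

        State(String historyId, int publications) {
            this.historyId = historyId;
            this.publications = publications;
        }
    }

    // Unbatched: every queued update produces its own state publication.
    static State applyUnbatched(State initial, List<String> queued) {
        State state = initial;
        for (String historyId : queued) {
            state = new State(historyId, state.publications + 1);
        }
        return state;
    }

    // Batched: all queued updates are applied in order, then published once.
    static State applyBatched(State initial, List<String> queued) {
        if (queued.isEmpty()) {
            return initial;
        }
        String latest = initial.historyId;
        for (String historyId : queued) {
            latest = historyId;
        }
        return new State(latest, initial.publications + 1);
    }

    public static void main(String[] args) {
        List<String> queued = Arrays.asList("h1", "h2", "h3");
        State start = new State(null, 0);
        System.out.println("unbatched publications: " + applyUnbatched(start, queued).publications);
        System.out.println("batched publications: " + applyBatched(start, queued).publications);
    }
}
```

Both paths end with the same latest history id; the batched path simply does so with one publication, which bounds the cost of a misbehaving client.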

henningandersen · 2022-02-01T10:01:33Z

...main/java/org/elasticsearch/action/admin/cluster/desirednodes/UpdateDesiredNodesRequest.java

+        }
+
+        if (nodes.isEmpty()) {
+            validationException = ValidateActions.addValidationError("nodes must contain at least one node", validationException);


In a follow-up we should probably also verify that there is at least one master eligible node.
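
A sketch of what the combined validation could look like, assuming a node is described by a name and a list of role names. The `Node` class, the `"master"` role string, and the second error message are illustrative assumptions, not the request class or wording used in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DesiredNodesValidationSketch {
    // Hypothetical node description; only the roles matter for this check.
    static final class Node {
        final String name;
        final List<String> roles;

        Node(String name, String... roles) {
            this.name = name;
            this.roles = Arrays.asList(roles);
        }
    }

    // Collects validation errors instead of failing on the first one.
    static List<String> validate(List<Node> nodes) {
        List<String> errors = new ArrayList<>();
        if (nodes.isEmpty()) {
            errors.add("nodes must contain at least one node");
            return errors;
        }
        boolean hasMasterEligible = false;
        for (Node node : nodes) {
            if (node.roles.contains("master")) {
                hasMasterEligible = true;
                break;
            }
        }
        if (hasMasterEligible == false) {
            errors.add("nodes must contain at least one master-eligible node");
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Node> dataOnly = Arrays.asList(new Node("data-0", "data"));
        System.out.println(validate(dataOnly));
    }
}
```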

henningandersen · 2022-02-01T10:12:10Z

...java/org/elasticsearch/action/admin/cluster/desirednodes/TransportDesiredNodesActionsIT.java

    }

-    public void testSomeSettingsCanBeOverridden() {
+    public void testNodeProcessorsGetValidatedWithDesiredNodeProcessors() {


I think this verifies what the method name says by not throwing when setting the desired nodes? Perhaps add a comment if so.

henningandersen

LGTM. Thanks Francisco

DaveCTurner · 2022-02-01T13:09:03Z

...a/org/elasticsearch/action/admin/cluster/desirednodes/TransportDeleteDesiredNodesAction.java

+        ClusterState state,
+        ActionListener<AcknowledgedResponse> listener
+    ) throws Exception {
+        clusterService.submitStateUpdateTask("delete-desired-nodes", new AckedClusterStateUpdateTask(Priority.HIGH, request, listener) {


I think these don't need to be acked tasks, we don't care whether the update is applied on all nodes or not since it will only be used on the master. It's enough that the state update is committed.

I'd rather we didn't merge without addressing this comment.

DaveCTurner

Thanks @fcofdez LGTM

fcofdez · 2022-02-01T16:55:08Z

Thanks all for the reviews!

Add the dry_run query parameter to support simulating updates to desired nodes. The update request will be validated, but no cluster state updates will be performed. To indicate that the response is the result of a dry run, we add the dry_run field to the JSON representation of the response. See #82975

elasticsearchmachine added the v8.1.0 label Jan 24, 2022

sethmlarson added the Team:Clients label Jan 24, 2022

fcofdez added 24 commits January 25, 2022 07:58

Add boilerplate

451d9fc

More boilerplate

45cace8

More progress

04c817e

Bind DesiredNodesService

44fb1e4

Add tests for serialization

b25e5d1

Minor adjustments

69e40cf

More progress

714bad4

Remove unused code

7453133

Add more tests and validations

e1409b6

More tests

b53acfd

Progress

9268d60

Add more testing

db247be

Move tests around

92e22c6

Progress

3d9bd6c

Fix mistake

f2a01b5

Wire everything

e0fbac6

Add get and delete methods

deb845e

rest spec, yaml tests and some other tests

3bef612

Add more yaml tests

2c07d93

More tests

58f40f5

Cleanup

d6b4c42

Remove unused import

9b4ba54

Fix tests

eeec7a6

Add docs

42097f2

fcofdez force-pushed the desired-nodes-api branch from 562370d to 42097f2 January 25, 2022 06:58

Update docs

442947d

fcofdez added the Team:Distributed and :Distributed/Autoscaling labels Jan 25, 2022

fcofdez added 7 commits January 31, 2022 20:58

Address review comments

79d9316

Merge remote-tracking branch 'origin/master' into desired-nodes-api

b320235

Docs

6f3e86a

Merge remote-tracking branch 'origin/master' into desired-nodes-api

f4526ab

Fix serialization tests

8eb3c64

Small fix

1e9c2a7

Merge remote-tracking branch 'origin/master' into desired-nodes-api

78cf51e

fcofdez requested review from DaveCTurner and henningandersen February 1, 2022 09:14

fcofdez commented Feb 1, 2022

henningandersen reviewed Feb 1, 2022

fcofdez added 6 commits February 1, 2022 12:00

Review comments

f558888

Merge remote-tracking branch 'origin/master' into desired-nodes-api

2fab417

Check that at least one master node is present

2625676

Merge remote-tracking branch 'origin/master' into desired-nodes-api

42a1a46

Fix docs...

0cf24d6

Merge remote-tracking branch 'origin/master' into desired-nodes-api

beeccc0

henningandersen approved these changes Feb 1, 2022

DaveCTurner requested changes Feb 1, 2022

fcofdez added 2 commits February 1, 2022 16:35

Batch tasks

fa05ade

Merge remote-tracking branch 'origin/master' into desired-nodes-api

a5f79c1

fcofdez requested a review from DaveCTurner February 1, 2022 15:38

Fix docs again...

a49f316

DaveCTurner approved these changes Feb 1, 2022

fcofdez merged commit 520b843 into elastic:master Feb 1, 2022

arteam mentioned this pull request Jun 17, 2022

Make desired node operator only #87777

Closed

arteam mentioned this pull request Jul 18, 2022

Support "dry run" mode for updating Desired Nodes #88305

Merged
