Conversation
The tests affected in this PR are expecting that the tasks related to the test are the only tasks running. This assumption will not hold soon. For this reason, we refactor the tests so that the assertions are handling only the tasks that are related to the test. There is only one exception BaseMlIntegTestCase, in this case it was easier to exclude unrelated tasks than to include all the possible tasks created as a side effect of the ML jobs. The constant UNRELATED_TASKS enabled developers to add tasks that need to be excluded, for example the new task we are introducing in #86131.
server/src/internalClusterTest/java/org/elasticsearch/discovery/ClusterDisruptionIT.java
Outdated
Show resolved
Hide resolved
Co-authored-by: David Turner <david.turner@elastic.co>
|
@elasticmachine update branch |
|
@elasticmachine update branch |
|
Regarding the failing tests that were extending
After discussion with the team we discussed that tests that extend |
andreidan
left a comment
There was a problem hiding this comment.
Thanks for working on this Mary! I think this looks great
Left a few minor suggestions but this is generally great
server/src/main/java/org/elasticsearch/health/node/selection/HealthNodeSelector.java
Outdated
Show resolved
Hide resolved
| if (event.state().nodes().getLocalNode().isMasterNode() == false || HealthNodeSelector.findTask(event.state()) != null) { | ||
| clusterService.removeListener(taskStarter); | ||
| } else if (event.localNodeMaster()) { | ||
| clusterService.removeListener(taskStarter); |
There was a problem hiding this comment.
it seems we're removing the task starter on both branches of this if/else clause. Could this be simplified?
There was a problem hiding this comment.
I gave it a shot, let me know if it's clear as well as simplified.
|
|
||
| import static org.elasticsearch.health.node.selection.HealthNodeSelector.TASK_NAME; | ||
|
|
||
| class HealthNodeSelectorTaskParams implements PersistentTaskParams { |
There was a problem hiding this comment.
I think the test can override assertEqualInstances to make it match our new possible invariant? (ie. same instance?)
dakrone
left a comment
There was a problem hiding this comment.
This generally looks good to me! I left a couple of comments, but I'll defer the final review to Andrei et al.
server/src/main/java/org/elasticsearch/health/node/selection/HealthNodeTaskExecutor.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/node/selection/HealthNodeTaskExecutor.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/node/selection/HealthNodeTaskExecutor.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/node/selection/HealthNodeTaskParams.java
Show resolved
Hide resolved
test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java
Outdated
Show resolved
Hide resolved
|
@elasticmachine update branch |
andreidan
left a comment
There was a problem hiding this comment.
LGTM, thanks for iterating on this Mary
The health API, has the requirement to be as quick and as cheap as possible. To achieve this we want to have one node responsible to keep the health state of all the node of the cluster without querying all the nodes in the spot. We do not want to burden a dedicating master with this, so we are going to use a node that can contain data for this responsibility.
Health role
We discussed introducing a health role, but it doesn't seem necessary since the only requirement is that the node responsible for collecting the health states is that it contains data.
The selection of the health node happens via the persistent task framework. We introduce a new persistent task and whichever node is assigned to it, is the selected health node. If the selected health node is marked for shutdown via the Shutdown API we release the task so the master can assign it to a new node.
This PR is introducing the persistent task behind a feature flag.
Relates to #84811