
[HealthAPI] Diagnosis: report typed affected resources #90653

Merged
andreidan merged 11 commits into elastic:main from andreidan:typed_affected_resources
Oct 5, 2022

Conversation

@andreidan
Contributor

The health API reports the affected resources when a deployment is unhealthy. Until now, every indicator reported a single type of resource per diagnosis (index, ILM policy, or snapshot repository).

With the introduction of the disk indicator, we now have an indicator that reports multiple types of resources under the same diagnosis (i.e. nodes and indices).

This changes the structure of the affected_resources field to accommodate multiple types of resources:

"affected_resources": {
  "nodes": [
    {
      "id": "e1af6F5rTcmgpExkdOMzCg",
      "name": "hot"
    },
    {
      "id": "u_wBVl4ZRne4uZq_ziLsuw",
      "name": "warm"
    }
  ],
  "indices": [
    ".geoip_databases",
    "test_index"
  ]
}
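
A minimal sketch of how typed resources could be grouped into this shape, under a simplified model. The `Type` enum, `Resource` record, and `toAffectedResources` helper are illustrative names, not the actual Elasticsearch classes, and nodes are reduced to plain name strings rather than `id`/`name` objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; not the Elasticsearch implementation.
class AffectedResourcesSketch {

    // Each resource type maps to the key it is reported under.
    enum Type {
        INDEX("indices"),
        NODE("nodes");

        final String displayValue;

        Type(String displayValue) {
            this.displayValue = displayValue;
        }
    }

    // One typed group of resources attached to a diagnosis.
    record Resource(Type type, List<String> values) {}

    // Groups resources under their type's key, preserving insertion order.
    static Map<String, List<String>> toAffectedResources(List<Resource> resources) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Resource resource : resources) {
            result.computeIfAbsent(resource.type().displayValue, k -> new ArrayList<>())
                .addAll(resource.values());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Resource> resources = List.of(
            new Resource(Type.NODE, List.of("hot", "warm")),
            new Resource(Type.INDEX, List.of(".geoip_databases", "test_index"))
        );
        System.out.println(toAffectedResources(resources));
    }
}
```

A single diagnosis can then carry both a `nodes` and an `indices` group, which is the case the disk indicator needs.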

Fixes #90219

@andreidan added the >bug, :Distributed/Health, v8.5.1, and v8.6.0 labels on Oct 4, 2022
@andreidan requested a review from gmarouli on October 4, 2022 17:21
@elasticsearchmachine added the Team:Data Management label on Oct 4, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

return false;
}
Diagnosis diagnosis = (Diagnosis) o;
return definition.equals(diagnosis.definition) && Arrays.equals(affectedResources, diagnosis.affectedResources);
Contributor Author

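This Diagnosis needs an explicit equals/hashCode, as otherwise no two records would ever be equal (unless the same explicit array is passed in the constructor), because a record's default equality compares an array component by reference.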
Not sure varargs are worth all this though?

Contributor Author

Removed them with 279ed2a
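
For context on the equality concern raised in this thread, here is a small sketch of the behaviour, assuming two toy records (`DefaultEquality` and `ContentEquality` are hypothetical names, not the PR's `Diagnosis`). A record with a varargs (array) component compares that component by reference unless `equals`/`hashCode` are overridden.

```java
import java.util.Arrays;

// Hypothetical toy records illustrating array components in records.
class RecordArrayEqualitySketch {

    // Default record equality compares the array component by reference.
    record DefaultEquality(String definition, String... affectedResources) {}

    // Overrides equals/hashCode so the array is compared by content.
    record ContentEquality(String definition, String... affectedResources) {
        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (o == null || getClass() != o.getClass()) {
                return false;
            }
            ContentEquality other = (ContentEquality) o;
            return definition.equals(other.definition)
                && Arrays.equals(affectedResources, other.affectedResources);
        }

        @Override
        public int hashCode() {
            return 31 * definition.hashCode() + Arrays.hashCode(affectedResources);
        }
    }

    public static void main(String[] args) {
        // Each varargs call allocates a fresh array, so the default comparison fails.
        System.out.println(new DefaultEquality("disk", "a", "b").equals(new DefaultEquality("disk", "a", "b")));
        System.out.println(new ContentEquality("disk", "a", "b").equals(new ContentEquality("disk", "a", "b")));
    }
}
```

Switching the component to a `List` avoids the overrides entirely, since lists already compare by content.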

@andreidan
Contributor Author

@elasticmachine update branch

@andreidan
Contributor Author

Failure seems transient

org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':test:fixtures:s3-fixture:composeDown'.

@andreidan
Contributor Author

@elasticmachine run elasticsearch-ci/part-2

Contributor

@gmarouli left a comment

LGTM, thank you for picking this up! Some minor comments only :)

for (DiscoveryNode node : nodes) {
builder.startObject();
builder.field(ID_FIELD, node.getId());
builder.field(NAME_FIELD, node.getName());
Contributor

Should we check here if name is null to omit it?

Contributor Author

Totally, great catch
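
The null check suggested above could look roughly like the following sketch. It uses a plain `Map` instead of the builder from the diff so that it stays self-contained; the `Node` record and `toFields` helper are hypothetical names for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the node serialization loop in the diff.
class NodeFieldsSketch {

    record Node(String id, String name) {}

    // Always emits the id; emits the name only when one is set.
    static Map<String, String> toFields(Node node) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", node.id());
        if (node.name() != null) {
            fields.put("name", node.name());
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(toFields(new Node("e1af6F5rTcmgpExkdOMzCg", "hot")));
        System.out.println(toFields(new Node("u_wBVl4ZRne4uZq_ziLsuw", null)));
    }
}
```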

}
if (masterNodes.containsKey(HealthStatus.YELLOW)) {
diagnosisList.add(createNonDataNodeDiagnosis(HealthStatus.YELLOW, masterNodes.get(HealthStatus.YELLOW), true));
List<DiscoveryNode> yellowMasterNodes = masterNodes.get(HealthStatus.YELLOW)
Contributor

I am wondering if we should change the Set<DiscoveryNode> to a List<DiscoveryNode> and sort it in the constructor. I do not think the set is giving us anything now that I am thinking about it because we only encounter each node once in that for loop. What do you think?

Contributor Author

@andreidan Oct 5, 2022

Ah yes, this is a great idea. Coming up
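
A rough sketch of the "sort it in the constructor" idea, assuming nodes are ordered by name and then by id. The thread does not state the sort key, so that ordering is an assumption, as are the `Node` and `NodeResource` names. The sketch also assumes names are non-null.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical types; the ordering (name, then id) is an assumption.
class SortedNodesSketch {

    record Node(String id, String name) {}

    static final Comparator<Node> BY_NAME_THEN_ID =
        Comparator.comparing(Node::name).thenComparing(Node::id);

    // Holds the nodes as a list that is sorted once, at construction time.
    record NodeResource(List<Node> nodes) {
        NodeResource {
            List<Node> sorted = new ArrayList<>(nodes);
            sorted.sort(BY_NAME_THEN_ID);
            nodes = List.copyOf(sorted); // immutable, deterministic order
        }
    }

    public static void main(String[] args) {
        NodeResource resource = new NodeResource(List.of(
            new Node("u_wBVl4ZRne4uZq_ziLsuw", "warm"),
            new Node("e1af6F5rTcmgpExkdOMzCg", "hot")
        ));
        System.out.println(resource.nodes());
    }
}
```

Sorting once in the constructor gives stable output and makes equality between two resources independent of the order in which nodes were collected.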

List.of(
new Diagnosis(
DIAGNOSIS_WAIT_FOR_OR_FIX_DELAYED_SHARDS,
List.of(new Diagnosis.Resource(INDEX, List.of("restarting" + "-index")))
Contributor

I am guessing this wasn't intentional :).

Contributor Author

Hm, what exactly? The formatting? I ran spotless

List.of(
new Diagnosis(
DIAGNOSIS_WAIT_FOR_OR_FIX_DELAYED_SHARDS,
List.of(new Diagnosis.Resource(INDEX, List.of("restarting" + "-index")))
Contributor

Same here?

@andreidan
Contributor Author

Test failure is #90668

@andreidan
Contributor Author

@elasticmachine run elasticsearch-ci/part-1

@andreidan
Contributor Author

Test failure is transient: Could not download ml-cpp-8.5.0-SNAPSHOT-deps.zip

@andreidan
Contributor Author

@elasticmachine update branch

@andreidan added the auto-backport label on Oct 5, 2022
@andreidan merged commit 5d97f0e into elastic:main on Oct 5, 2022
@elasticsearchmachine
Collaborator

💚 Backport successful

Branch: 8.5

andreidan added a commit to andreidan/elasticsearch that referenced this pull request Oct 5, 2022
andreidan added a commit that referenced this pull request Oct 5, 2022
…#90678)

andreidan added a commit to andreidan/elasticsearch that referenced this pull request Oct 6, 2022
@csoulios added the v8.5.0 label and removed the v8.5.1 label on Nov 8, 2022

Labels

auto-backport, >bug, :Distributed/Health, Team:Data Management, v8.5.0, v8.6.0

Development

Successfully merging this pull request may close these issues.

[HealthAPI] Disk indicator is not reporting affected indices

5 participants