Add disk thresholds in the cluster state by gmarouli · Pull Request #88175 · elastic/elasticsearch

gmarouli · 2022-06-29T12:18:03Z

Problem statement
For a data node, we use the watermarks to determine whether a node's disk usage is healthy. The watermarks can be configured in different ways, and each node may have a different watermark configuration. This is not desirable: we want to use the same thresholds for all nodes, specifically the ones that the master is using.
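Here is what "configured in different ways" means in practice. A watermark can be given as a percentage of used disk space or as an absolute amount of free space. The following is a minimal Java sketch built on a hypothetical `WatermarkValue` type. It is not the Elasticsearch `RelativeByteSizeValue` class, and it handles only a small subset of the formats Elasticsearch accepts:

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not an Elasticsearch class): a watermark is either a
 * percentage of disk space *used* ("90%") or an absolute amount of disk space
 * that must remain *free* ("20gb").
 */
record WatermarkValue(Double usedPercent, Long freeBytes) {

    // Two-letter units come before "b" so that "20gb" is not matched as bytes.
    private static final String[] UNITS = {"tb", "gb", "mb", "kb", "b"};
    private static final long[] FACTORS = {1L << 40, 1L << 30, 1L << 20, 1L << 10, 1L};

    static WatermarkValue parse(String raw) {
        String s = raw.trim().toLowerCase(Locale.ROOT);
        if (s.endsWith("%")) {
            double pct = Double.parseDouble(s.substring(0, s.length() - 1));
            return new WatermarkValue(pct, null);
        }
        for (int i = 0; i < UNITS.length; i++) {
            if (s.endsWith(UNITS[i])) {
                // Only integer sizes are supported in this sketch.
                String number = s.substring(0, s.length() - UNITS[i].length());
                return new WatermarkValue(null, Long.parseLong(number) * FACTORS[i]);
            }
        }
        throw new IllegalArgumentException("unsupported watermark: " + raw);
    }

    /** Returns true if a disk with these totals has crossed this watermark. */
    boolean isBreached(long totalBytes, long availableBytes) {
        if (usedPercent != null) {
            double used = 100.0 * (totalBytes - availableBytes) / totalBytes;
            return used > usedPercent;
        }
        return availableBytes < freeBytes;
    }
}
```

If two nodes hold different strings for the same setting, they can reach different verdicts about the same disk. That inconsistency is what this PR sets out to remove.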

Proposal
When a node is elected as master, it will add custom metadata to the cluster state that describes these thresholds. For example:

    "health": {
      "disk": {
        "low_watermark": "85%",
        "high_watermark": "90%",
        "flood_stage_watermark": "95%",
        "frozen_flood_stage_watermark": "95%",
        "frozen_flood_stage_max_headroom": "20gb"
      }
    }
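The example above could map onto a Java value type along the lines of this minimal sketch. `DiskThresholds` is a hypothetical name. Values are kept as raw setting strings and are not escaped, so the sketch shows the shape of the data rather than the actual `HealthMetadata` serialization:

```java
/**
 * Hypothetical, simplified model of the "health" -> "disk" metadata from the
 * example. Field values are the raw setting strings (e.g. "90%", "20gb").
 */
record DiskThresholds(
        String lowWatermark,
        String highWatermark,
        String floodStageWatermark,
        String frozenFloodStageWatermark,
        String frozenFloodStageMaxHeadroom) {

    /** Renders the same nested structure as the JSON example (no escaping). */
    String toJson() {
        return String.format(
            "{\"health\":{\"disk\":{"
                + "\"low_watermark\":\"%s\","
                + "\"high_watermark\":\"%s\","
                + "\"flood_stage_watermark\":\"%s\","
                + "\"frozen_flood_stage_watermark\":\"%s\","
                + "\"frozen_flood_stage_max_headroom\":\"%s\"}}}",
            lowWatermark, highWatermark, floodStageWatermark,
            frozenFloodStageWatermark, frozenFloodStageMaxHeadroom);
    }
}
```

A record gives value-based `equals`. That makes it easy to decide whether the cluster state actually needs an update when settings change.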

In this PR, we introduce the health metadata and we wire the existing disk thresholds to update the health metadata in the cluster state.
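One way to picture this wiring is that the elected master listens for changes to the disk threshold settings. It writes new values to the cluster state only when they differ from what it last published. The sketch relies on two assumptions. `HealthMetadataPublisher` is a hypothetical name. The `publish` callback stands in for submitting a cluster state update task. Neither one is an Elasticsearch API:

```java
import java.util.Objects;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: only the elected master publishes thresholds, and it
 * skips the publish when the value has not changed since the last one.
 */
final class HealthMetadataPublisher<T> {
    private final Consumer<T> publish; // stand-in for a cluster state update
    private boolean isElectedMaster;
    private T published;               // last value this node published
    private T latest;                  // latest locally configured value

    HealthMetadataPublisher(T initial, Consumer<T> publish) {
        this.latest = initial;
        this.publish = publish;
    }

    /** Called when this node becomes, or stops being, the elected master. */
    void onMasterChanged(boolean elected) {
        isElectedMaster = elected;
        if (!elected) {
            published = null; // another master may overwrite the metadata
        }
        maybePublish();
    }

    /** Called by a settings-update listener with the new thresholds. */
    void onSettingsChanged(T updated) {
        latest = updated;
        maybePublish();
    }

    private void maybePublish() {
        if (isElectedMaster && !Objects.equals(published, latest)) {
            published = latest;
            publish.accept(latest);
        }
    }
}
```

A real implementation would more likely compare against the health metadata already in the cluster state rather than a local field, because a previous master may have written it.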

Part of #84811

gmarouli · 2022-06-29T13:12:02Z

@elasticmachine run elasticsearch-ci/part-2

gmarouli · 2022-06-29T13:24:56Z

@dakrone referring to #87975 (review)

I think there is a misunderstanding. The health metadata is not supposed to be node-specific; we intend it to be the same for all the nodes of the cluster and determined by the master node.

Effectively, that's what already happens when it comes to allocation. All nodes have disk threshold settings (potentially different ones), but the elected master node uses its own in the allocation decider. In a similar way, we want the master node to propagate its thresholds so that every node can check its disk usage and report back using the same thresholds.

Is this clearer now?
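The next sketch illustrates how every node would use the master's thresholds. Each node evaluates its own disk against values received from the master instead of its local settings. `DiskHealthCheck`, `MasterThresholds`, and the GREEN/YELLOW/RED mapping are assumptions for illustration, not the disk health indicator's actual rules. The frozen branch assumes one specific rule for the flood stage on dedicated frozen nodes. Under that rule, the node must keep free whichever is smaller: the percentage-based headroom or `frozen_flood_stage_max_headroom`.

```java
/**
 * Hypothetical sketch: a node judges its own disk using thresholds published
 * by the elected master, so every node applies the same limits.
 */
final class DiskHealthCheck {

    enum Status { GREEN, YELLOW, RED }

    /** Thresholds as received from the master; percentages mean disk used. */
    record MasterThresholds(double highPercent, double floodStagePercent,
                            double frozenFloodStagePercent, long frozenMaxHeadroomBytes) {}

    static Status evaluate(MasterThresholds t, long totalBytes, long freeBytes,
                           boolean dedicatedFrozen) {
        if (dedicatedFrozen) {
            // Required free space: the smaller of the percentage headroom and the max headroom.
            long byPercent = (long) (totalBytes * (100.0 - t.frozenFloodStagePercent()) / 100.0);
            long requiredFree = Math.min(byPercent, t.frozenMaxHeadroomBytes());
            return freeBytes < requiredFree ? Status.RED : Status.GREEN;
        }
        double usedPercent = 100.0 * (totalBytes - freeBytes) / totalBytes;
        if (usedPercent >= t.floodStagePercent()) {
            return Status.RED;
        }
        if (usedPercent >= t.highPercent()) {
            return Status.YELLOW;
        }
        return Status.GREEN;
    }
}
```

Take a 1000 GB frozen disk under this assumption. The 95% threshold on its own would require 50 GB free, but the 20 GB max headroom caps the requirement at 20 GB, so a node with 30 GB free stays healthy.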

server/src/main/java/org/elasticsearch/cluster/routing/allocation/DiskThresholdSettings.java

andreidan

Thanks for working on this, Mary.

I've left a few suggestions and questions.

server/src/main/java/org/elasticsearch/cluster/routing/allocation/DiskThresholdSettings.java

server/src/main/java/org/elasticsearch/health/metadata/HealthMetadataService.java

server/src/internalClusterTest/java/org/elasticsearch/health/HealthMetadataServiceIT.java

gmarouli · 2022-06-30T09:20:23Z

@elasticmachine update branch

server/src/main/java/org/elasticsearch/health/metadata/HealthMetadataService.java

gmarouli · 2022-07-04T07:29:50Z

@elasticmachine update branch

andreidan

LGTM, thanks for iterating on this, Mary.

server/src/main/java/org/elasticsearch/health/metadata/HealthMetadataService.java

gmarouli · 2022-07-07T07:14:49Z

@elasticmachine update branch

gmarouli · 2022-07-07T11:20:53Z

@elasticmachine update branch

gmarouli · 2022-07-07T13:27:56Z

Dotting the i's and crossing the t's:

Removed the low watermark from the HealthMetadata; we do not use it in the disk health calculation, so there is no need to store it there.
Reverted the DiskThresholdSettingParser: in the course of this PR it became unnecessary (at least for now) to decouple the settings from their parsing.

server/src/main/java/org/elasticsearch/common/unit/RelativeByteSizeValue.java

gmarouli · 2022-07-07T14:41:38Z

@elasticmachine update branch

Add disk thresholds in the cluster state

2c1d1b6

elasticsearchmachine added the v8.4.0 label Jun 29, 2022

gmarouli mentioned this pull request Jun 29, 2022

[Health API] Propagate the disk health thresholds via the cluster state #87975

Closed

gmarouli marked this pull request as ready for review June 29, 2022 13:15

gmarouli requested review from andreidan and dakrone June 29, 2022 13:15

gmarouli mentioned this pull request Jun 29, 2022

Disk Usage health indicator #84811

Closed

9 tasks

gmarouli added >non-issue :Distributed/Health labels Jun 29, 2022

elasticmachine added the Team:Data Management (obsolete) label Jun 29, 2022

gmarouli added 2 commits June 29, 2022 15:31

Reformat comment

ccf8a38

Rename listener

6b1d3b4

gmarouli commented Jun 29, 2022

server/src/main/java/org/elasticsearch/cluster/routing/allocation/DiskThresholdSettings.java

andreidan reviewed Jun 29, 2022

gmarouli added 2 commits June 30, 2022 10:44

Listen to the disk threshold settings directly

47d1f32

Refactor UpdateHealthMetadataTask

e8b5a63

Merge branch 'master' into health-disk-metadata-in-cluster-state

dba5129

andreidan reviewed Jun 30, 2022

server/src/main/java/org/elasticsearch/health/metadata/HealthMetadataService.java (outdated)

gmarouli added 3 commits June 30, 2022 13:26

Refactor HealthMetadata

5e6c310

Refactor triggering a HealthMetadata update

c4b3b60

Listen to the settings changes directly

ec55f5d

Merge branch 'master' into health-disk-metadata-in-cluster-state

0eabd1d

gmarouli requested a review from andreidan July 4, 2022 08:53

Add javadoc to Threshold in HealthMetadata

1384730

andreidan approved these changes Jul 4, 2022

Test that reproduces the bug in cluster update

d78fbb3

gmarouli added 3 commits July 4, 2022 20:29

Apply all the updates sequentially

3b8801f

Update all watermarks per tier in the test

cdeeb8a

Remove redundant 'public' modifier

0c17e50

andreidan reviewed Jul 5, 2022

server/src/main/java/org/elasticsearch/health/metadata/HealthMetadataService.java (outdated)

elasticmachine and others added 5 commits July 7, 2022 16:44

Merge branch 'master' into health-disk-metadata-in-cluster-state

09bb060

UpsertHealthMetadataTask processes clusterState

bf1589b

Use RelativeByteSizeValue in HealthMetadata

ec47c20

Change the equals of the Disk thresholds

4efc305

Revert extraction of diskThresholdSettings parsing

dc2d5ab

elasticmachine and others added 4 commits July 7, 2022 20:50

Merge branch 'master' into health-disk-metadata-in-cluster-state

e22a524

Remove low watermark from HealthMetadata

2ea069a

Fix HealthMetadata XContent parser

837defd

Fix limits in random threshold generator

9ca9cbc

andreidan reviewed Jul 7, 2022

server/src/main/java/org/elasticsearch/common/unit/RelativeByteSizeValue.java (outdated)

Remove unnecessary equals & hash code

95111b9

Merge branch 'master' into health-disk-metadata-in-cluster-state

a733226

gmarouli merged commit 4834965 into elastic:master Jul 7, 2022

gmarouli deleted the health-disk-metadata-in-cluster-state branch July 7, 2022 16:00

kingherc mentioned this pull request Jul 13, 2022

Revisit default disk watermark on different data tiers #81406

Closed

gmarouli mentioned this pull request Jul 22, 2022

Change HealthMetadata to ClusterState.Custom #88736

Merged

4 participants

gmarouli commented Jun 29, 2022 •

edited

Loading