Disk Usage health indicator

Create a disk usage indicator that reports to the user when their cluster is running out of disk space and how this affects its operation. We propose the following health statuses and their interpretations:

Status | Meaning | Implementation
-- | -- | --
RED | The disk is running out of space on at least one node, or writes are blocked because of limited disk space. | At least one node is above the flood stage watermark, or at least one index is blocked by READ_ONLY_ALLOW_DELETE_BLOCK.
YELLOW | Disk usage is elevated on at least one node. | At least one data node is above the high watermark with no relocating shards, or a non-data node is above the high watermark.*
GREEN | All good, nothing Elasticsearch cannot handle :) | None of the above apply.
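
As a rough illustration of how the table could translate into code, here is a minimal sketch. All names (`HealthStatus`, `NodeDiskInfo`, `DiskIndicatorSketch`) are hypothetical and not Elasticsearch classes, the watermarks are assumed to be plain percentages, and "no relocating shards" is read as "the node has no shards currently relocating", which is an interpretation rather than something this issue specifies.

```java
import java.util.List;
import java.util.Set;

// Hypothetical types for illustration only; not part of Elasticsearch.
enum HealthStatus { GREEN, YELLOW, RED }

record NodeDiskInfo(String nodeId, boolean isDataNode, double usedPercent, boolean hasRelocatingShards) {}

final class DiskIndicatorSketch {
    private DiskIndicatorSketch() {}

    static HealthStatus compute(List<NodeDiskInfo> nodes,
                                Set<String> blockedIndices,
                                double highWatermarkPercent,
                                double floodStagePercent) {
        // RED: at least one index carries the read-only-allow-delete block.
        if (!blockedIndices.isEmpty()) {
            return HealthStatus.RED;
        }
        boolean yellow = false;
        for (NodeDiskInfo node : nodes) {
            // RED: any node (data or not) is above the flood stage watermark.
            if (node.usedPercent() > floodStagePercent) {
                return HealthStatus.RED;
            }
            if (node.usedPercent() > highWatermarkPercent) {
                // YELLOW: a non-data node above the high watermark always counts;
                // a data node counts only when it has no relocating shards (assumed reading).
                if (!node.isDataNode() || !node.hasRelocatingShards()) {
                    yellow = true;
                }
            }
        }
        return yellow ? HealthStatus.YELLOW : HealthStatus.GREEN;
    }
}
```

Checking RED conditions first keeps the evaluation order aligned with the table: the most severe matching row wins, and GREEN is only the fallback.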

**Implementation details**
The data should be collected using the persistent tasks framework.

Nodes will listen to cluster state changes for the allocation of the "health persistent task" and push their initial status. After this initialization, the nodes will only push changes to their state (i.e., when they change from RED to YELLOW).
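
The push-on-change behaviour described above could look roughly like the sketch below. `LocalHealthReporter` and its methods are hypothetical names, and the `sent` list merely stands in for whatever transport action would carry the update to the node hosting the health task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch; not the Elasticsearch implementation.
final class LocalHealthReporter {
    enum Status { GREEN, YELLOW, RED }

    private String healthNodeId;   // node currently hosting the health task, null if unknown
    private Status lastReported;   // last status pushed to that node, null if none yet
    final List<String> sent = new ArrayList<>();  // stand-in for the transport layer

    // Called on cluster state changes: a newly allocated health task
    // has no data yet, so it must receive the full initial status.
    void onHealthNodeChanged(String newHealthNodeId, Status current) {
        if (!Objects.equals(healthNodeId, newHealthNodeId)) {
            healthNodeId = newHealthNodeId;
            lastReported = null;
            onLocalStatus(current);
        }
    }

    // Called whenever the local disk check runs; pushes only when the status changed.
    void onLocalStatus(Status current) {
        if (healthNodeId == null || current == lastReported) {
            return;
        }
        sent.add(healthNodeId + ":" + current);
        lastReported = current;
    }
}
```

Resetting `lastReported` when the task moves is what turns the next push into the "initial status" for the new health node, even if the local status itself has not changed.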

The allocated persistent task should be prepared to delay an initial health request that arrives before it has had a chance to receive the statuses from the nodes.
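
One possible way to delay early requests is to gate them on a future that completes once every expected node has reported. The sketch below assumes the task knows the set of expected node IDs up front; `HealthInfoCacheSketch` and its methods are hypothetical, and statuses are plain strings for brevity.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the Elasticsearch implementation.
final class HealthInfoCacheSketch {
    private final Set<String> expectedNodes;
    private final Map<String, String> statusByNode = new ConcurrentHashMap<>();
    private final CompletableFuture<Void> initialized = new CompletableFuture<>();

    HealthInfoCacheSketch(Set<String> expectedNodes) {
        this.expectedNodes = Set.copyOf(expectedNodes);
        if (this.expectedNodes.isEmpty()) {
            initialized.complete(null);
        }
    }

    // Called when a node pushes its status to the health task.
    void onNodeReport(String nodeId, String status) {
        statusByNode.put(nodeId, status);
        if (statusByNode.keySet().containsAll(expectedNodes)) {
            initialized.complete(null);  // no-op if already completed
        }
    }

    // Requests arriving before initialization wait on the future
    // instead of seeing a partial view.
    CompletableFuture<Map<String, String>> getHealth() {
        return initialized.thenApply(ignored -> Map.copyOf(statusByNode));
    }
}
```

A real implementation would likely also want a timeout (for example via `CompletableFuture.orTimeout`) so that a node that never reports cannot block health requests indefinitely.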

- [x] Introduce the persistent task (https://github.com/elastic/elasticsearch/pull/86131)
- [x] Propagate disk usage thresholds and watermarks to all nodes (https://github.com/elastic/elasticsearch/pull/88175)
- [x] Introduce thresholds for non-data nodes (parked for now; we want to see whether reusing the flood stage and high watermarks is good enough)
- [x] Monitor a node's disk usage health (https://github.com/elastic/elasticsearch/pull/88390)
- [x] [Health node] Cache each node's disk usage health (https://github.com/elastic/elasticsearch/pull/89275)
- [x] The coordinating node retrieves the health info from the health node (https://github.com/elastic/elasticsearch/pull/89820, https://github.com/elastic/elasticsearch/pull/89947)
- [x] Use the retrieved disk usage health info and the blocked indices from the cluster state (if they exist) to compute the indicator (https://github.com/elastic/elasticsearch/pull/90041)
- [x] Remove the feature flag (https://github.com/elastic/elasticsearch/pull/90085)
- [x] Write troubleshooting doc & document the new settings (https://github.com/elastic/elasticsearch/pull/90504)

Disk Usage health indicator #84811