Adding impacts block to the health info API response#84899
Adding impacts block to the health info API response#84899masseyke merged 30 commits intoelastic:masterfrom
Conversation
|
Pinging @elastic/es-data-management (Team:Data Management) |
|
Hi @masseyke, I've created a changelog YAML for you. |
server/src/main/java/org/elasticsearch/health/SimpleHealthIndicatorImpact.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/cluster/routing/allocation/ShardsAvailabilityHealthIndicatorService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/HealthIndicatorImpact.java
Outdated
Show resolved
Hide resolved
…e/elasticsearch into feature/health-api-impacts-block
|
Here's an example response. You can see empty Here's the impact if the primaries and replicas are unavailable: And here is the impact if primaries are unavailable but replicas are: In all of the above, it will show at most 10 indices. If there are more, it will include And if there are corrupted repositories: If ILM is expected to be running but is not: And if SLM is expected to be running but is not: |
|
@elasticmachine update branch |
andreidan
left a comment
There was a problem hiding this comment.
Thanks Keith.
I just had a couple of comments.
In the interest of keeping scope to a minimum can you please remove the impacts for all indicators except the shards_availability one? I think it'd be easier to have separate PRs for the other impacts and merge them independently The cluster coordination one for e.g. needs a lot of work to be able to safely indicate the master is unstable, the snapshot one might want to indicate which repository is corrupted etc
What do you think?
...a/org/elasticsearch/cluster/routing/allocation/ShardsAvailabilityHealthIndicatorService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/HealthIndicatorImpact.java
Outdated
Show resolved
Hide resolved
I'll change them to "TODO". With this change, we have to have some impact so I figured I'd at least put a placeholder in for all of them. The snapshot health indicator already reports the repository name in the summary so I was avoiding duplicating that in the impact. |
andreidan
left a comment
There was a problem hiding this comment.
Thanks for iterating on this Keith.
On the TODO aspect, I think this would make it harder for the UI team to work out if they should display the impact or not.
How about we don't include the impact section at all for the indicators for which we don't have the impacts yet? (so for this PR only the shards_availability indicator will report an impact section)
...a/org/elasticsearch/cluster/routing/allocation/ShardsAvailabilityHealthIndicatorService.java
Show resolved
Hide resolved
I assumed the UI team was just showing whatever string we gave them, and that it would be better to see a placeholder than nothing. I'll change it to an empty impact (there's no such thing as no impact any more). |
Thanks for iterating on this Keith. Whilst they're testing the first iterations we might not have impacts for all indicators. I think our APIs generally don't report empty blocks e.g. https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/action/admin/cluster/allocation/ClusterAllocationExplanation.java#L171 , https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/IndexLifecycleExplainResponse.java#L471 |
server/src/main/java/org/elasticsearch/health/SimpleHealthIndicatorImpact.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/cluster/routing/allocation/ShardsAvailabilityHealthIndicatorService.java
Outdated
Show resolved
Hide resolved
andreidan
left a comment
There was a problem hiding this comment.
Thanks for iterating on this Keith. I think this is nearly done.
I've left one question.
...a/org/elasticsearch/cluster/routing/allocation/ShardsAvailabilityHealthIndicatorService.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/cluster/routing/allocation/ShardsAvailabilityHealthIndicatorService.java
Outdated
Show resolved
Hide resolved
| if (indicesWithUnavailablePrimariesAndReplicas.isEmpty() == false) { | ||
| String impactDescription = String.format( | ||
| Locale.ROOT, | ||
| "%s [%s] %s unavailable. You are at risk of losing the data in %s.", |
There was a problem hiding this comment.
The current impacts are:
Primaries available, replicas unassigned:
"impact": {
"description": "Redundancy for 3 indices [my-index-000001, my-index-000003, my-index-000002] is currently disrupted. Fault tolerance and search scalability are reduced.",
"severity": 3
}
edit: update category to "Primaries unassigned"
Primaries unassigned:
"impact": {
"description": "Cannot add data to 1 index [my-index-000001]. Searches might return incomplete results.",
"severity": 2
}
Both primaries and replicas unavailable:
"impact": {
"description": "Indices [my-index-000001, my-index-000003, my-index-000002] are unavailable. You are at risk of losing the data in these indices.",
"severity": 1
}
@cristina-eleonora @shubhaat @Leaf-Lin Can you please provide some feedback on these messages?
There was a problem hiding this comment.
Thanks for the ping @andreidan !
Primaries available, replicas unassigned:
"impact": { "description": "Redundancy for 3 indices [my-index-000001, my-index-000003, my-index-000002] is currently disrupted. Fault tolerance and search scalability are reduced.", "severity": 3 }
Could we rephrase the first half to something like: Data redundancy is affected on 3 indices [my-index-000001, my-index-000003, my-index-000002]. ? Are there any impacts on search/ingest? And could we phrase Fault tolerance and search scalability are reduced. in a way that a non technical / minimally technical person would understand? Perhaps some of the writers could help with this.
Primaries unassigned, but replicas available:
"impact": { "description": "Cannot add data to 1 index [my-index-000001]. Searches might return incomplete results.", "severity": 2 }
Looks great 👍🏻
Both primaries and replicas unavailable:
"impact": { "description": "Indices [my-index-000001, my-index-000003, my-index-000002] are unavailable. You are at risk of losing the data in these indices.", "severity": 1 }
Does this mean that those indices cannot be accessed anymore? Or that the data on those indices cannot be accessed? How will that impact search/ingest?
There was a problem hiding this comment.
I thought Primaries unassigned, but replicas available would not happen because replica will be promoted?
There was a problem hiding this comment.
Based on @Leaf-Lin comment above we decided to simplify the impacts into the following categories:
- primaries unavailable for indices:
Cannot add data to 1 index [my-index-000001]. Searches will return incomplete results.
- replicas unavailable for indices:
Data redundancy is affected on 3 indices [my-index-000001, my-index-000003, my-index-000002]. Fault tolerance and search scalability are reduced.
Thanks for your thoughts on the messages @cristina-eleonora
I think with the simplified cases we have the Fault tolerance and search scalability are reduced. bit to rephrase somehow.
When talking about replicas in our docs we say:
Replicas provide redundant copies of your data to protect against hardware failure and increase capacity to serve read requests like searching or retrieving a document
Would impact number 2 make more sense if it was: (or something along those lines?)
Data redundancy is affected on 3 indices [my-index-000001, my-index-000003, my-index-000002]. The capacity to serve search requests is reduced and you could be susceptible to data loss in case of hardware failure.
There was a problem hiding this comment.
OK I have the wording in from Andre's last comment. I can easily change it if we need to.
There was a problem hiding this comment.
For missing replicas, how about just:
Searches might return slower than expected. Fewer redundant copies of the data exist than expected on 3 indices [my-index-000001, my-index-000003, my-index-000002].
Or maybe the slightly scarier:
Searches might return slower than expected, and you are at higher risk of the cluster becoming unstable in the future. Fewer redundant copies of the data exist than expected on 3 indices [my-index-000001, my-index-000003, my-index-000002].
Or the even scarier:
Searches might return slower than expected, and you are at higher risk of losing data. Fewer redundant copies of the data exist than expected on 3 indices [my-index-000001, my-index-000003, my-index-000002].
There was a problem hiding this comment.
I personally like the 1st and the 3rd 🙂 The third is more explicit on the importance of replicas, but may scare the users. If they should be scared, let's keep it, if not, perhaps use the 1st.
There was a problem hiding this comment.
Pinging @alaudazzi , our UX writer.
There was a problem hiding this comment.
I like 1 too, with the only mentions that "than expected ..." is repeated and in the 2nd sentence it should come earlier (?)
I suggest:
Searches might return slower than usual. Fewer than expected redundant copies of the data exist on 3 indices [my-index-000001, my-index-000003, my-index-000002].
Or maybe even drop "than expected" in the 2nd sentence altogether (it reads very strange to me)
Searches might return slower than usual. Fewer redundant copies of the data exist on 3 indices [my-index-000001, my-index-000003, my-index-000002].
There was a problem hiding this comment.
I went with Andre's last wording, but can update at any point later if we work out something better.
andreidan
left a comment
There was a problem hiding this comment.
LGTM, thanks for iterating on this Keith
Left 2 very minor comments, but I think we can merge this soon as the messaging seems to be getting some consensus.
server/src/main/java/org/elasticsearch/health/HealthIndicatorResult.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/cluster/routing/allocation/ShardsAvailabilityHealthIndicatorService.java
Show resolved
Hide resolved
|
@elasticmachine run CLA |
The ability to add impacts to indicators was added in #84899, but impacts for all indicators other than shards availability were left empty. This commit adds potential impacts for the other indicators.
This change introduces the notion of an impact to a health API indicator, and adds impacts to all currently-existing
indicators.
Closes #84773