(Doc+) Recovery ".kibana*" indices write block #189999
stefnestor wants to merge 3 commits into main
👋🏽 howdy, team! Adds guidance on recovering from KB index write blocks, which users commonly raise with Support while recovering from either [flood-stage watermark](https://www.elastic.co/guide/en/elasticsearch/reference/current/fix-watermark-errors.html) or [max shards open](https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html#_this_action_would_add_x_total_shards_but_this_cluster_currently_has_yz_maximum_shards_open) Elasticsearch errors.
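For context on the max shards open error linked above, here is a minimal Python sketch of the check as the Elasticsearch docs describe it: the cluster-wide limit is `cluster.max_shards_per_node` (default 1000) multiplied by the number of non-frozen data nodes, and shards of closed indices do not count. The function name is hypothetical, and frozen-tier limits are ignored; treat it as an illustration, not Elasticsearch's actual code.

```python
def would_exceed_shard_limit(open_shards: int, shards_to_add: int, data_nodes: int,
                             max_shards_per_node: int = 1000) -> bool:
    """Return True if adding `shards_to_add` would push the cluster past its open-shard limit.

    Mirrors the message "this action would add [x] total shards, but this cluster
    currently has [y]/[z] maximum shards open": x = shards_to_add, y = open_shards,
    z = max_shards_per_node * data_nodes (non-frozen data nodes only, by assumption).
    """
    limit = max_shards_per_node * data_nodes
    return open_shards + shards_to_add > limit


if __name__ == "__main__":
    # 3 data nodes with the default per-node limit gives a cluster limit of 3000.
    print(would_exceed_shard_limit(open_shards=2999, shards_to_add=2, data_nodes=3))  # True
    print(would_exceed_shard_limit(open_shards=2998, shards_to_add=2, data_nodes=3))  # False
```

A Kibana migration that needs to create a new index can be rejected this way, which is why freeing shards (or raising the limit) comes before retrying the migration.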
Pinging @elastic/kibana-docs (Team:Docs)
👋 @rudolf, I've heard through the grapevine that you might have concerns with this recovery outline: that it may rest on a sunk-cost belief, and that Dev would actually recommend avoiding it across all versions. Could you kindly confirm if so? Context I know:
I believe a separate Support+Dev meeting may be scheduled to discuss, longer term, how to better avoid or answer this class of issue, but I'm bumping this here so the PR can be merged or closed unmerged rather than continue sitting in WoDev. TIA! 🙏
@stefnestor Yeah, what happens is that flood-stage watermarks, max shards open, or other temporary ES problems (such as continuously hitting circuit breakers) cause a migration to fail repeatedly. The solution is always to fix the underlying ES problem. Once this is done, Kibana will automatically reattempt the migration without further intervention by the user. Users might choose to revert to the previous version if Kibana availability is important and they believe an ES fix might take prohibitively long.
The problem arises on ECH, where a failed Kibana upgrade is automatically rolled back without properly following the rollback instructions. This usually leaves Kibana version N unable to upgrade, because a partial Kibana version N+1 migration has already been started (the cause of the write block). The recommended fix is to follow our documentation to correctly roll back to the previous version by restoring the Kibana feature state from before the upgrade. Perhaps what we can do to help users is document the write block under https://www.elastic.co/guide/en/kibana/current/resolve-migrations-failures.html and explain that it is most likely caused by an incorrect rollback, recommending that users follow our rollback instructions. WDYT?
@rudolf Sounds like a game plan to me. Just to confirm: the guide you linked also recommends manually removing the write block, rather than pointing to the Rollback Kibana guide. Was that intentional? Sorry if I've misunderstood. EDIT: Noting for Sev1 01756480, which has a workaround KB outlining what we think we don't want it to say: restoring from snapshot didn't work, because the feature state restored from the last successful snapshot was already write-blocked. 😬
@stefnestor https://www.elastic.co/guide/en/kibana/current/resolve-migrations-failures.html documents how to fix migrations that fail due to corrupt documents. Since these corrupt documents exist in the existing index, they are also present in any feature state snapshots. So in this case, restoring a feature state snapshot would not resolve the problem.
Sorry for the delay; I was out on vacation. I think I'm stuck on the Y of an XY problem.
My belief is that KB migrations will error because of ES, but that after fixing ES, KB will keep erroring that its indices are write-blocked, so the migration will not progress even after restarting KB. That's why some of my teammates thought they needed to manually remove the write block: to push the state back one step so KB restarts the migration without erroring, even though the first thing the migration does is re-establish the write block. Does that line up with your expectations, or does it sound unexpected to you?
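As a minimal sketch of how one might confirm that state once ES is healthy again, the Python below assumes the settings were fetched with something like `GET .kibana*/_settings?expand_wildcards=all` and returned in the default nested form, where Elasticsearch reports setting values as strings. It lists the indices that still have `index.blocks.write` (set during a Kibana migration) or `index.blocks.read_only_allow_delete` (applied by Elasticsearch at the flood-stage watermark) enabled. The function name and the sample index names are hypothetical.

```python
import json

# Block settings that stop Kibana from writing to its indices (assumed relevant set).
BLOCK_KEYS = ("write", "read_only_allow_delete")


def find_blocked_indices(settings_response: dict) -> dict:
    """Map each index name to the block settings that are enabled on it.

    `settings_response` is the parsed JSON body of a get-settings call, shaped like
    {index_name: {"settings": {"index": {"blocks": {...}, ...}}}}.
    """
    blocked = {}
    for index_name, body in settings_response.items():
        blocks = body.get("settings", {}).get("index", {}).get("blocks", {})
        # Values usually arrive as strings ("true"); str() also handles real booleans.
        active = [key for key in BLOCK_KEYS
                  if str(blocks.get(key, "false")).lower() == "true"]
        if active:
            blocked[index_name] = active
    return blocked


if __name__ == "__main__":
    # Hypothetical, trimmed response; index names and values are illustrative only.
    sample = json.loads("""
    {
      ".kibana_8.15.0_001": {"settings": {"index": {"blocks": {"write": "true"}}}},
      ".kibana_task_manager_8.15.0_001": {"settings": {"index": {"number_of_shards": "1"}}}
    }
    """)
    print(find_blocked_indices(sample))
```

An empty result after the ES fix would suggest the migration is stuck for some other reason than a lingering block.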
Hi @stefnestor, what's the status of this PR? We're migrating to a new docs format and repo this week, which will make this PR invalid.
Summary
Adds guidance on recovering from KB index write blocks, which users commonly raise with Support while recovering from either flood-stage watermark or max shards open Elasticsearch errors. Adding this now that elastic/elasticsearch#111315 has merged.
Recovery is either to manually remove the write block (it is automatically reapplied if the underlying issue is still ongoing) or to roll back to the earlier Kibana version.
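For the first option, the underlying Elasticsearch call is an update-index-settings request setting `index.blocks.write` to `false`. Per the discussion above, this should only be done after the underlying ES problem is fixed, and rolling back by restoring the pre-upgrade feature state remains the recommended path for incorrect ECH rollbacks. The minimal Python sketch below only builds the request with the standard library; it assumes explicit index names (for example, those found blocked via `GET .kibana*/_settings`), an optional API key, and a hypothetical function name. Access rules for system indices vary by version and deployment, so check the Elasticsearch docs for yours before sending it.

```python
import json
import urllib.parse
import urllib.request


def build_clear_write_block_request(es_url: str, indices: list,
                                    api_key: str = "") -> urllib.request.Request:
    """Build (but do not send) PUT <indices>/_settings with index.blocks.write = false."""
    if not indices:
        raise ValueError("pass explicit index names, e.g. those found blocked earlier")
    # Comma-separated names; commas stay literal, anything unusual is percent-encoded.
    path = urllib.parse.quote(",".join(indices), safe=",")
    body = json.dumps({"index.blocks.write": False}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"ApiKey {api_key}"
    return urllib.request.Request(
        f"{es_url.rstrip('/')}/{path}/_settings", data=body, headers=headers, method="PUT"
    )


if __name__ == "__main__":
    req = build_clear_write_block_request("https://localhost:9200", [".kibana_8.15.0_001"])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it (only after the Elasticsearch problem is resolved):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to review the exact index list and body before touching a production cluster.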