(Doc+) Recovery ".kibana*" indices write block by stefnestor · Pull Request #189999 · elastic/kibana

stefnestor · 2024-08-06T17:43:26Z

👋🏽 howdy, team!

Summary

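Adds instructions for recovering from write blocks on Kibana (`.kibana*`) indices. Users commonly raise this with Support as a recovery step after hitting either [flood-stage watermark](https://www.elastic.co/guide/en/elasticsearch/reference/current/fix-watermark-errors.html) or [max shards open](https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html#_this_action_would_add_x_total_shards_but_this_cluster_currently_has_yz_maximum_shards_open) Elasticsearch errors. Adding this now that elastic/elasticsearch#111315 has merged.

Recovery is either manually removing the write block (which is automatically reapplied if the issue is detected as ongoing) or rolling back to the earlier Kibana version.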
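
As a rough illustration of the first option, here is a minimal Python sketch that only builds (and does not send) a settings update clearing the write block. It assumes the block in question is the standard Elasticsearch `index.blocks.write` index setting; the function name `build_remove_write_block_request` and the returned dictionary shape are hypothetical and exist only for this example. Whether removing the block manually is advisable at all is what this thread goes on to debate.

```python
import json


def build_remove_write_block_request(index_pattern: str = ".kibana*") -> dict:
    """Describe an update-settings call that clears the write block.

    Hypothetical helper for illustration only: it returns a description
    of the request and never contacts a cluster.
    """
    return {
        "method": "PUT",
        "path": f"/{index_pattern}/_settings",
        # Assumption: the block is the `index.blocks.write` index setting.
        "body": {"index.blocks.write": False},
    }


if __name__ == "__main__":
    request = build_remove_write_block_request()
    print(request["method"], request["path"])
    print(json.dumps(request["body"], indent=2))
```

The printed method, path, and body can be pasted into Kibana Dev Tools or sent with any HTTP client once the underlying Elasticsearch problem has been addressed.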


elasticmachine · 2024-08-06T17:43:29Z

Pinging @elastic/kibana-docs (Team:Docs)

github-actions · 2024-08-06T17:43:39Z

A documentation preview will be available soon.

🔨 Buildkite builds
📚 HTML diff
📙 Preview page

Request a new doc build by commenting

Rebuild this PR: `run docs-build`
Rebuild this PR and all Elastic docs: `run docs-build rebuild`

`run docs-build` is much faster than `run docs-build rebuild`. A rebuild should only be needed in rare situations.

If your PR continues to fail for an unknown reason, the doc build pipeline may be broken. Elastic employees can check the pipeline status here.

stefnestor · 2024-09-11T16:34:28Z

👋 @rudolf ,

I've heard through the grapevine that you might have concerns with this recovery outline: that it may be a sunk-cost belief which Dev would actually recommend avoiding across all versions. Could you kindly confirm if so?

Context I know:

- kibana#158733 lists that these steps can cause silent data loss specifically for v8.8.0 but not other versions
- internal link lists that this workaround, or a close variation, is viable for v7.17.9

I believe a separate meeting may be scheduled between Support and Dev on how to better avoid or answer this class of issue long term, but I'm bumping here so this PR can either merge or close unmerged rather than continue sitting WoDev. TIA! 🙏

rudolf · 2024-09-17T09:56:48Z

@stefnestor Yeah, what happens is that flood-stage watermarks, max shards open, or other temporary ES problems like continuously hitting circuit breakers cause a migration to fail continuously.

The solution is always to fix the underlying ES problem. Once this is done, Kibana will automatically reattempt the migration without further intervention by a user. Users might choose to revert to the previous version if Kibana availability is important and they believe an ES fix might take prohibitively long.

The problem arises on ECH, where a failed Kibana upgrade is automatically rolled back without properly following the rollback instructions. This usually results in Kibana version N being unable to upgrade because a partial Kibana version N+1 migration has already been started (the cause of the write block). The recommended fix is to follow our documentation and correctly roll back to the previous version by restoring the Kibana feature state from before the upgrade.

Perhaps what we can do to help users better is to document the write block under https://www.elastic.co/guide/en/kibana/current/resolve-migrations-failures.html, explain that this is most likely a case of an incorrect rollback, and recommend that users follow our rollback instructions. WDYT?

stefnestor · 2024-09-27T15:11:34Z

@rudolf sounds like a game plan to me. If I may confirm: you linked to a guide that also recommends manually removing the write block, rather than to the Rollback Kibana guide. Was that intentional? I may have misunderstood, sorry.

EDIT: Noting that for Sev1 01756480, where the workaround KB article outlines what we think we don't want it to say, restoring from snapshot didn't work because the feature state restored from the last successful snapshot was already write-blocked. 😬

rudolf · 2024-09-30T13:49:43Z

@stefnestor https://www.elastic.co/guide/en/kibana/current/resolve-migrations-failures.html documents how to fix migrations that fail due to corrupt documents. Since these corrupt documents exist in the existing index, they are also present in any feature state snapshots. So in this case a feature state snapshot would not be able to resolve the problem.

stefnestor · 2024-10-09T18:58:34Z

Sorry for the delay, I was out on vacation. I think I'm caught on a Y of an XY problem.

> The solution is always to fix the underlying ES problem. Once this is done Kibana will automatically reattempt the migration without further intervention by a user.

My belief is that Kibana migrations will error because of ES; but after fixing ES, Kibana will continue erroring that its indices are write blocked, so the migration will not progress even after restarting Kibana. That's why some of my teammates have thought they needed to manually remove the write block (to push the state one step back so that Kibana starts the migration without erroring, even though the first thing it does is re-establish the write block). Does that line up with your expectations, or does it sound unexpected to you?
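
To check whether this is the state a cluster is in, one option is to inspect which block settings are still set once the Elasticsearch problem has been fixed. The sketch below is a minimal, offline Python example that assumes the input is the parsed JSON from `GET /.kibana*/_settings?flat_settings=true`; the function name `find_blocked_indices` and the sample index names are hypothetical and not taken from any real cluster.

```python
def find_blocked_indices(settings_response: dict) -> dict:
    """Map each index name to the `index.blocks.*` settings that are enabled.

    Assumes the flat-settings response shape:
    {"<index>": {"settings": {"index.blocks.write": "true", ...}}}
    """
    blocked = {}
    for index_name, entry in settings_response.items():
        settings = entry.get("settings", {})
        active = sorted(
            key
            for key, value in settings.items()
            # Elasticsearch returns setting values as strings.
            if key.startswith("index.blocks.") and str(value).lower() == "true"
        )
        if active:
            blocked[index_name] = active
    return blocked


# Illustrative data only; these index names are made up for the example.
sample_response = {
    ".kibana_8.14.0_001": {"settings": {"index.blocks.write": "true"}},
    ".kibana_task_manager_8.14.0_001": {
        "settings": {"index.number_of_replicas": "1"}
    },
}

if __name__ == "__main__":
    print(find_blocked_indices(sample_response))
```

An empty result would suggest no block remains on the matched indices, while a non-empty one identifies the indices whose blocks are still in place.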

florent-leborgne · 2025-01-29T09:31:52Z

Hi @stefnestor, what's the status of this PR? We're going to be migrating to a new docs format and repo this week, making this PR invalid.
If the information contained here is still relevant to add, let me know and I will recreate the PR in the new repo & format.
Thanks!

stefnestor added the Team:Docs, enhancement, and docs labels Aug 6, 2024

stefnestor mentioned this pull request Aug 6, 2024

Add link to flood-stage watermark exception message elastic/elasticsearch#111315

Merged

feedback

f70c554

stefnestor added the release_note:enhancement label Aug 6, 2024

Merge branch 'main' into stefnestor-patch-3

4feb4c2

stefnestor closed this Jan 30, 2025

jbudz deleted the stefnestor-patch-3 branch February 19, 2025 22:00

stefnestor wants to merge 3 commits into main from stefnestor-patch-3


4 participants
