Leave an audit trail in the logs when performing loss-of-quorum replica recovery operations #73281
Description
Replica recovery operations are destructive and could cause data loss. If the cluster is kept as is after recovery, and its data is not migrated to a healthy cluster afterwards, it could exhibit unexpected behaviours stemming from corrupted data. Subsequent investigations could be hard because normal consensus and group membership logic would not apply.
Provide information about the applied recovery updates as part of debug.zip so that it is easy to find during investigations.
To make sure this info is preserved, we may need to store it in a store-local key and then move it to the server log whenever the cluster is restarted.
As the first stage of this task, preserve this data as StructuredEvents. They can be preserved separately from normal server logs, and since these events are rare, they should be treated similarly to cluster lifecycle events.
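The store-local key idea above could be sketched roughly as follows. This is only an illustration: `RecoveryRecord`, `LocalStore`, `RecordRecovery`, and `FlushOnStartup` are hypothetical names, the in-memory map merely stands in for a store-local key space, and none of this reflects actual CockroachDB APIs or the StructuredEvents schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// RecoveryRecord is a hypothetical audit entry describing one replica
// that was rewritten by a loss-of-quorum recovery operation.
type RecoveryRecord struct {
	RangeID            int64 `json:"range_id"`
	StoreID            int32 `json:"store_id"`
	ReplicaID          int32 `json:"replica_id"`
	AppliedAtUnixNanos int64 `json:"applied_at_unix_nanos"`
}

// LocalStore stands in for a store-local key space (assumption: the real
// storage would be the store's own engine, not an in-memory map).
type LocalStore struct {
	pending map[string][]byte
}

func NewLocalStore() *LocalStore {
	return &LocalStore{pending: map[string][]byte{}}
}

// RecordRecovery persists the record at recovery time, when the node is
// offline and cannot write to the regular server log.
func (s *LocalStore) RecordRecovery(r RecoveryRecord) error {
	b, err := json.Marshal(r)
	if err != nil {
		return err
	}
	// Zero-padded timestamp first so keys sort in application order.
	key := fmt.Sprintf("recovery/%020d/%d", r.AppliedAtUnixNanos, r.RangeID)
	s.pending[key] = b
	return nil
}

// FlushOnStartup emits every pending record as a structured event and
// deletes it, so each record is logged once. It returns the number of
// records emitted.
func (s *LocalStore) FlushOnStartup(emit func(event string)) int {
	keys := make([]string, 0, len(s.pending))
	for k := range s.pending {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		emit(string(s.pending[k]))
		delete(s.pending, k)
	}
	return len(keys)
}

func main() {
	store := NewLocalStore()
	rec := RecoveryRecord{RangeID: 7, StoreID: 2, ReplicaID: 3, AppliedAtUnixNanos: 1}
	if err := store.RecordRecovery(rec); err != nil {
		panic(err)
	}
	// On the next node start, move the records into the log.
	store.FlushOnStartup(func(event string) { fmt.Println(event) })
}
```

Deleting each record after emitting it keeps the log free of duplicates across restarts. A real implementation would also need to decide what happens if the node crashes between emitting and deleting.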