When performing loss of quorum replica recovery operations leave an audit trail in the logs. #73281

@aliher1911

Description

Replica recovery operations are destructive and can cause data loss. If the cluster is kept as is after recovery, and its data is not migrated to a healthy cluster afterwards, it may exhibit unexpected behavior stemming from corrupted data. Subsequent investigations can be difficult because the normal consensus and group membership logic would not apply.

Include information about the applied recovery updates in debug.zip so that it is easily discoverable during investigations.

To make this information durable, we may need to store it under a store-local key and then move it to the server log whenever the cluster is restarted.

As the first stage of this task, preserve this data as StructuredEvents. They can be preserved separately from normal server logs, and since these events are rare, they should be treated similarly to cluster lifecycle events.

Metadata

Labels

A-kv-replication: Relating to Raft, consensus, and coordination.

C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
