stability: investigate recovery from permanent unavailability #17186

@hxiaodon

Hi,
I raised this issue on Gitter last weekend. It would be helpful if a disaster recovery tool were provided :)

Hi all. I started a local CockroachDB cluster with 3 instances (node1, node2, node3), with node2 and node3 joining node1. Everything works fine, even when one node crashes. When I crash node1 and node2 (clean their data folders and restart them), node3 cannot provide service (which is expected), but I also cannot dump data from node3. Consider this scenario: 2 nodes are permanently dead and the last daily backup was taken a long time ago, while the remaining healthy node holds data that is more recent than the backup (up to the point the service became unavailable). In this situation I cannot use the healthy node's data (via dump/backup) together with 2 new nodes to restore service. Does that make sense? Thanks!
What I mean is: it makes sense that the cluster does not provide service when only a minority of nodes is left, but it does not make sense to block dumping data (the Raft-committed data) when the remaining minority holds a complete copy (replication factor equal to the number of nodes in the cluster).
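
To make the availability argument concrete, here is a minimal sketch of the majority rule that Raft-style replication relies on. It is not CockroachDB code; the function names `quorum` and `has_quorum` are made up for illustration, and it assumes every range is replicated on all 3 nodes, as in the scenario above.

```python
def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def has_quorum(replicas: int, live: int) -> bool:
    """True if the live replicas are enough to elect a leader and commit writes."""
    return live >= quorum(replicas)


if __name__ == "__main__":
    # 3-way replication, with 3, 2, and then 1 surviving node.
    for live in (3, 2, 1):
        status = "available" if has_quorum(3, live) else "unavailable"
        print(f"replicas=3 live={live} needed={quorum(3)} -> {status}")
```

With 3 replicas the majority is 2, so a single survivor cannot serve requests even though, in this scenario, it still holds a full copy of the committed data. That gap is what a recovery tool would need to bridge.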

Andrei Matei's reply
@hxiaodon indeed, we want to provide some type of data recovery tools for this situation, but we haven’t gotten to it yet

Jira issue: CRDB-6362


Labels

A-kv-recovery
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-community: Originated from the community
