stability: investigate recovery from permanent unavailability #17186
Description
Hi,
I raised this issue on Gitter last weekend. It would be helpful if a disaster recovery tool were provided :)
Hi all. I started a local CockroachDB cluster with 3 instances (node1, node2, node3), with node2 and node3 joining node1. Everything works fine even when one node crashes. When I crash node1 and node2 (clean the data folder and restart), node3 cannot provide service (this is expected), but I also cannot dump data from node3. Suppose 2 nodes are permanently dead and the last daily backup was taken a long time ago, while the remaining healthy node holds data that is newer than the backup (everything up to the point the service became unavailable). In that situation I cannot use the healthy node's data (via dump/backup) and add 2 new nodes to restore service. Does that make sense? Thanks!
What I mean is: it makes sense that the cluster does not provide service when only a minority of nodes is left, but it does not make sense to block dumping data (the Raft-committed data) when the remaining minority holds a complete copy (that is, when the replication factor equals the number of nodes in the cluster).
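To make the scenario concrete, here is a minimal sketch of the majority arithmetic involved. It is an illustration only, not CockroachDB code: the function names `quorum`, `has_quorum`, and `every_node_has_replica` are hypothetical, and the last one assumes each range places its replicas on distinct nodes.

```python
def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def has_quorum(replicas: int, live: int) -> bool:
    """True if enough replicas are alive to form a majority."""
    return live >= quorum(replicas)


def every_node_has_replica(replication_factor: int, cluster_size: int) -> bool:
    """Assumption: replicas of a range live on distinct nodes, so when the
    replication factor covers the whole cluster, every node holds a replica
    of every range."""
    return replication_factor >= cluster_size


if __name__ == "__main__":
    # The 3-node cluster from the report, with 3 replicas per range.
    print(quorum(3))                     # majority size for 3 replicas
    print(has_quorum(3, live=2))         # one node down
    print(has_quorum(3, live=1))         # node1 and node2 down, only node3 left
    print(every_node_has_replica(3, 3))  # node3 still holds a replica of every range
```

With one node left out of three, the majority check fails, which is why node3 stops serving. The last check is the basis of the request: node3 still holds a replica of every range, even though it cannot form a majority on its own.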
Andrei Matei's reply
@hxiaodon indeed, we want to provide some type of data recovery tools for this situation, but we haven’t gotten to it yet
Jira issue: CRDB-6362