Forked from #15026:
We have another issue. After running a cluster for 10 days, two nodes died over the weekend. It has been a struggle to get them up again. We see lots of errors like the ones below:
My questions are: Should I open a separate issue for this? Are we doing something wrong? How do we debug this?
We really love Cockroach, but a lot of things are going wrong for us at the moment.
I170502 10:01:52.180016 76 storage/replica_raftstorage.go:413 [replicate,n3,s3,r29/3:/System/tsd/cr.node.sys.cpu.s…,@c4204ffb00] generated preemptive snapshot 40410405 at index 53
I170502 10:01:52.183406 76 storage/replicate_queue.go:231 [replicate,n3,s3,r29/3:/System/tsd/cr.node.sys.cpu.s…,@c4204ffb00] snapshot failed: r29: remote declined snapshot: reservation rejected
AND
I170502 09:59:21.546561 31 server/status/runtime.go:227 [n1] runtime stats: 99 MiB RSS, 209 goroutines, 17 MiB/8.3 MiB/32 MiB GO alloc/idle/total, 13 MiB/19 MiB CGO alloc/total, 68.10cgo/sec, 0.01/0.00 %(u/s)time, 0.00 %gc (0x)
I170502 09:59:22.121205 252 vendor/google.golang.org/grpc/clientconn.go:806 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.135.15.39:26257: getsockopt: connection refused"; Reconnecting to {10.135.15.39:26257 <nil>}
E170502 09:59:22.704451 36 gossip/gossip.go:972 [n1] unable to get address for node 3: unable to look up descriptor for node 3
E170502 09:59:22.704851 36 gossip/gossip.go:972 [n1] unable to get address for node 3: unable to look up descriptor for node 3
I170502 09:59:22.955127 252 vendor/google.golang.org/grpc/clientconn.go:806 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.135.15.39:26257: getsockopt: connection refused"; Reconnecting to {10.135.15.39:26257 <nil>}