stability: cluster unable to recover after 2 node outage #15591

@petermattis

Forked from #15026:

We have another issue. After running a cluster for 10 days, two nodes died during the weekend. It's a struggle to get them up again. We see lots of errors like the ones below:

My question is: Should I make this another issue? Is this us doing things wrong? How do we debug this?

We really love Cockroach, but there are a lot of things going wrong for us at the moment.

I170502 10:01:52.180016 76 storage/replica_raftstorage.go:413  [replicate,n3,s3,r29/3:/System/tsd/cr.node.sys.cpu.s…,@c4204ffb00] generated preemptive snapshot 40410405 at index 53
I170502 10:01:52.183406 76 storage/replicate_queue.go:231  [replicate,n3,s3,r29/3:/System/tsd/cr.node.sys.cpu.s…,@c4204ffb00] snapshot failed: r29: remote declined snapshot: reservation rejected

AND

I170502 09:59:21.546561 31 server/status/runtime.go:227  [n1] runtime stats: 99 MiB RSS, 209 goroutines, 17 MiB/8.3 MiB/32 MiB GO alloc/idle/total, 13 MiB/19 MiB CGO alloc/total, 68.10cgo/sec, 0.01/0.00 %(u/s)time, 0.00 %gc (0x)
I170502 09:59:22.121205 252 vendor/google.golang.org/grpc/clientconn.go:806  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.135.15.39:26257: getsockopt: connection refused"; Reconnecting to {10.135.15.39:26257 <nil>}
E170502 09:59:22.704451 36 gossip/gossip.go:972  [n1] unable to get address for node 3: unable to look up descriptor for node 3
E170502 09:59:22.704851 36 gossip/gossip.go:972  [n1] unable to get address for node 3: unable to look up descriptor for node 3
I170502 09:59:22.955127 252 vendor/google.golang.org/grpc/clientconn.go:806  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.135.15.39:26257: getsockopt: connection refused"; Reconnecting to {10.135.15.39:26257 <nil>}
