stability: crash at startup: panic: [group 903] entries[1:12) is unavailable from storage #7389

@mberhault

Description

Trying to upgrade rho to the beta candidate SHA f447eb0.

On restart, all nodes crashed with:

I160622 14:16:15.777437 cli/start.go:317  CockroachDB beta-20160609-386-gf447eb0 (linux amd64, built 2016/06/22 13:18:56, go1.6.2)
I160622 14:16:15.777587 cli/start.go:148  writing go and jemalloc memory profiles to /home/ubuntu/logs every 10s
I160622 14:16:15.777782 server/context.go:265  1 storage engine initialized
I160622 14:16:15.777909 cli/start.go:334  starting cockroach node
W160622 14:16:15.791920 gossip/gossip.go:897  not connected to cluster; use --join to specify a connected node
I160622 14:16:15.805285 gossip/gossip.go:923  starting client to 104.196.42.152:26257
I160622 14:16:15.806082 storage/engine/rocksdb.go:129  opening rocksdb instance at "/mnt/data"
I160622 14:16:16.805855 gossip/gossip.go:923  starting client to 104.196.105.82:26257
I160622 14:16:16.806195 /go/src/google.golang.org/grpc/clientconn.go:499  grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 104.196.42.152:26257: getsockopt: connection refused"; Reconnecting to "104.196.42.152:26257"
I160622 14:16:18.221311 server/node.go:414  initialized store store=2:2 ([]=/mnt/data): {Capacity:396201193472 Available:340016451584 RangeCount:0}
I160622 14:16:18.221748 server/node.go:302  node ID 2 initialized
I160622 14:16:18.222765 storage/stores.go:292  read 3 node addresses from persistent storage
I160622 14:16:18.223194 server/node.go:535  connecting to gossip network to verify cluster ID...
I160622 14:16:18.228667 gossip/gossip.go:923  starting client to cockroach-rho-3:26257
E160622 14:16:18.466050 raft/log.go:308  [group 903] entries[1:12) is unavailable from storage
panic: [group 903] entries[1:12) is unavailable from storage

goroutine 44 [running]:
panic(0x1a75440, 0xc8203826d0)
        /usr/local/go/src/runtime/panic.go:481 +0x3ff
github.com/cockroachdb/cockroach/storage.(*raftLogger).Panicf(0xc82039def8, 0x2039a40, 0x2a, 0xc82041a340, 0x2, 0x2)
        /go/src/github.com/cockroachdb/cockroach/storage/raft.go:117 +0x1cb
github.com/coreos/etcd/raft.(*raftLog).slice(0xc8201ced90, 0x1, 0xc, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/coreos/etcd/raft/log.go:308 +0x581
github.com/coreos/etcd/raft.(*raftLog).nextEnts(0xc8201ced90, 0x0, 0x0, 0x0)
        /go/src/github.com/coreos/etcd/raft/log.go:142 +0x107
github.com/coreos/etcd/raft.newReady(0xc820c76b60, 0xc8203825b0, 0x6, 0x1, 0xb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/coreos/etcd/raft/node.go:475 +0xe1
github.com/coreos/etcd/raft.(*RawNode).newReady(0xc820c6af00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/coreos/etcd/raft/rawnode.go:40 +0xbb
github.com/coreos/etcd/raft.(*RawNode).Ready(0xc820c6af00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/coreos/etcd/raft/rawnode.go:182 +0x58
github.com/cockroachdb/cockroach/storage.(*Replica).handleRaftReady.func1(0xc820c6af00, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:1352 +0xc8
github.com/cockroachdb/cockroach/storage.(*Replica).withRaftGroupLocked(0xc820425180, 0xc82086d4d0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:280 +0x60d
github.com/cockroachdb/cockroach/storage.(*Replica).handleRaftReady(0xc820425180, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:1355 +0x19f
github.com/cockroachdb/cockroach/storage.(*Store).processRaft.func1()
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2106 +0x548
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc820398f50, 0xc820824c90)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:139 +0x60
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:140 +0x70

Here are the per-node panic messages:

104.196.42.152
E160622 14:16:14.654867 raft/log.go:308  [group 903] entries[1:632) is unavailable from storage
panic: [group 903] entries[1:632) is unavailable from storage

104.196.105.82
E160622 14:16:18.466050 raft/log.go:308  [group 903] entries[1:12) is unavailable from storage
panic: [group 903] entries[1:12) is unavailable from storage

104.196.108.90
E160622 14:16:21.794437 raft/log.go:308  [group 901] entries[1:57) is unavailable from storage
panic: [group 901] entries[1:57) is unavailable from storage

104.196.19.81
E160622 14:16:43.618335 raft/log.go:308  [group 901] entries[1:57) is unavailable from storage
panic: [group 901] entries[1:57) is unavailable from storage
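For context, the panic fires in etcd/raft's `raftLog.slice` when the underlying `Storage` reports that part of the requested index range `[lo, hi)` is not present — here, indexes starting at 1 that the store no longer (or never) held. The following is a toy stand-in for that bounds check, not etcd's actual implementation; the type and field names are hypothetical and entries are modeled as bare indexes:

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable mimics the error etcd/raft surfaces when a requested
// log range is not backed by storage (hypothetical re-implementation).
var errUnavailable = errors.New("requested entries are unavailable")

// memStorage is a toy raft log store holding entries in the index range
// [firstIndex, firstIndex+len(entries)).
type memStorage struct {
	firstIndex uint64
	entries    []uint64 // stand-in for raftpb.Entry; values are indexes
}

// Entries returns the log slice [lo, hi). If any part of the range falls
// outside what the store holds, it fails -- the condition raftLog.slice
// turns into the panic seen in the logs above.
func (s *memStorage) Entries(lo, hi uint64) ([]uint64, error) {
	last := s.firstIndex + uint64(len(s.entries))
	if lo < s.firstIndex || hi > last {
		return nil, errUnavailable
	}
	return s.entries[lo-s.firstIndex : hi-s.firstIndex], nil
}

func main() {
	// A store whose first index is 12: entries [1, 12) are simply not
	// there, matching the shape of the reported failure.
	s := &memStorage{firstIndex: 12, entries: []uint64{12, 13, 14}}
	if _, err := s.Entries(1, 12); err != nil {
		fmt.Printf("[group 903] entries[1:12) is unavailable from storage: %v\n", err)
	}
}
```

Under this reading, the interesting question is why the storage's first index is ahead of what raft expects on restart — i.e. why raft believes entries starting at index 1 should still be readable.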
