Fatal consistency check failed with 1 inconsistent replicas #35424

@awoods187

Describe the problem
Hit a fatal error that killed a node while importing TPC-C 10k (10,000 warehouses) on a 7-node cluster of 72-CPU machines.

F190305 14:52:51.825973 686646 storage/replica_consistency.go:165  [n3,consistencyChecker,s3,r336/3:/Table/56/1/"\x04{\x02…-W\"e…}] consistency check failed with 1 inconsistent replicas
goroutine 686646 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc000521100, 0xc0005211a0, 0x5374600, 0x1e)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1018 +0xd4
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x5b160a0, 0xc000000004, 0x5374685, 0x1e, 0xa5, 0xc0184c80a0, 0x7a)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:874 +0x95a
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x3a20040, 0xc009930d20, 0x4, 0x2, 0x32ecb3f, 0x36, 0xc0087407d0, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:85 +0x2d5
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x3a20040, 0xc009930d20, 0x1, 0xc000000004, 0x32ecb3f, 0x36, 0xc0087407d0, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:71 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(0x3a20040, 0xc009930d20, 0x32ecb3f, 0x36, 0xc0087407d0, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:182 +0x7e
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).CheckConsistency(0xc04529f680, 0x3a20040, 0xc009930d20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_consistency.go:165 +0x863
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).CheckConsistency(0xc04529f680, 0x3a20040, 0xc009930d20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_consistency.go:174 +0x9ce
github.com/cockroachdb/cockroach/pkg/storage.(*consistencyQueue).process(0xc001124480, 0x3a20040, 0xc009930d20, 0xc04529f680, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/consistency_queue.go:117 +0x1d8
github.com/cockroachdb/cockroach/pkg/storage.(*baseQueue).processReplica.func1(0x3a20040, 0xc009930d20, 0xdf8475800, 0x3a20040)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/queue.go:753 +0x24e
github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout(0x3a20040, 0xc009930d20, 0xc0348426f0, 0x2c, 0xdf8475800, 0xc00117ee90, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:90 +0xb5
github.com/cockroachdb/cockroach/pkg/storage.(*baseQueue).processReplica(0xc000756d00, 0x3a20080, 0xc024f38540, 0xc04529f680, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/queue.go:719 +0x215
github.com/cockroachdb/cockroach/pkg/storage.(*baseQueue).processLoop.func1.2(0x3a20080, 0xc01d9bdef0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/queue.go:646 +0xb8
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc00055a7e0, 0x3a20080, 0xc01d9bdef0, 0xc00b4524c0, 0x36, 0x0, 0x0, 0xc01b5d09e0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:325 +0xe6
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:320 +0x134

To Reproduce
roachprod create $CLUSTER -n 8 --clouds=aws --aws-machine-type-ssd=c5d.18xlarge
roachprod run $CLUSTER -- "DEV=$(mount | grep /mnt/data1 | awk '{print $1}'); sudo umount /mnt/data1; sudo mount -o discard,defaults,nobarrier ${DEV} /mnt/data1/; mount | grep /mnt/data1"
roachprod stage $CLUSTER:1-7 cockroach
roachprod stage $CLUSTER:8 workload
roachprod start $CLUSTER:1-7 -e COCKROACH_ENGINE_MAX_SYNC_DURATION=24h
roachprod adminurl --open $CLUSTER:1
roachprod run $CLUSTER:1 -- "./cockroach workload fixtures import tpcc --warehouses=10000 --db=tpcc"

Environment:
v19.1.0-beta.20190304-135-gd8f7e85
cockroach.log

Metadata

Labels

C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
S-1-stability: Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting.
