storage: consistency check failure during import #36861

@nvb

To do / understand


This looks very similar to #35424, so it's possible that that issue wasn't fully resolved. I was most of the way through a TPC-C 4k import when a node died due to a consistency check failure.

F190416 01:58:51.634989 172922 storage/replica_consistency.go:220  [n5,consistencyChecker,s5,r590/1:/Table/68/1/{29/4/2…-31/4/1…}] consistency check failed with 1 inconsistent replicas
goroutine 172922 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc000056301, 0xc000056300, 0x5449800, 0x1e)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1020 +0xd4
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x5bdd700, 0xc000000004, 0x5449860, 0x1e, 0xdc, 0xc008bfa5a0, 0x79)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:878 +0x93d
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x3aa1620, 0xc0071899e0, 0x4, 0x2, 0x33b2862, 0x36, 0xc01f06cce0, 0x1, 0x1)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:85 +0x2d8
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x3aa1620, 0xc0071899e0, 0x1, 0xc000000004, 0x33b2862, 0x36, 0xc01f06cce0, 0x1, 0x1)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:71 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(0x3aa1620, 0xc0071899e0, 0x33b2862, 0x36, 0xc01f06cce0, 0x1, 0x1)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:182 +0x7e
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).CheckConsistency(0xc023eeb400, 0x3aa1620, 0xc0071899e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_consistency.go:220 +0x6ce
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).CheckConsistency(0xc023eeb400, 0x3aa1620, 0xc0071899e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_consistency.go:229 +0x81b
github.com/cockroachdb/cockroach/pkg/storage.(*consistencyQueue).process(0xc0003de2a0, 0x3aa1620, 0xc0071899e0, 0xc023eeb400, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/consistency_queue.go:125 +0x210

Cockroach SHA: 3ebed10

Notes:

Cluster: nathan-tpcc-geo (stopped, extended for 48h)
Cockroach nodes: 1,2,4,5,7,8,10,11
Inconsistent range: r590
Replicas: nathan-tpcc-geo:2/n2/r3, nathan-tpcc-geo:5/n4/r4, and nathan-tpcc-geo:7/n5/r1
Inconsistent replica: nathan-tpcc-geo:7/n5/r1
Replicas in zones: europe-west2-b, europe-west4-b, and asia-northeast1-b respectively

Initial Investigation

Unlike in the later reproductions of #35424, replica 1's Raft log is an exact prefix of the logs of replicas 3 and 4, so this doesn't look like the same problem we saw later in that issue.
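
For reference, the "exact prefix" property above can be sketched in a few lines of Go. This is only an illustration, not the code or tooling used in this investigation: `logEntry` and `isLogPrefix` are hypothetical names, the entry values in `main` are made up, and a real comparison would also need to cover each entry's command payload rather than just its index and term.

```go
package main

import "fmt"

// logEntry is a hypothetical, simplified stand-in for a Raft log entry.
// A real entry also carries the proposed command, which is omitted here.
type logEntry struct {
	Index uint64
	Term  uint64
}

// isLogPrefix reports whether short is an exact prefix of long: every
// entry in short must equal the entry at the same position in long.
func isLogPrefix(short, long []logEntry) bool {
	if len(short) > len(long) {
		return false
	}
	for i, e := range short {
		if long[i] != e {
			return false
		}
	}
	return true
}

func main() {
	// Made-up logs: behind is simply shorter, diverged disagrees on a term.
	behind := []logEntry{{10, 6}, {11, 6}}
	ahead := []logEntry{{10, 6}, {11, 6}, {12, 6}}
	diverged := []logEntry{{10, 6}, {11, 7}}

	fmt.Println(isLogPrefix(behind, ahead))   // prints "true"
	fmt.Println(isLogPrefix(diverged, ahead)) // prints "false"
}
```

A replica whose log passes this check is merely behind the others; one that fails it has diverged in the log itself, which is the situation this report rules out for r590.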

I haven't looked at much else yet.

r590 Range _ Debug _ Cockroach Console.pdf

Metadata

Labels

C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
S-1: High impact: many users impacted, serious risk of high unavailability or data loss
