stability: rho failed on inconsistency #12130

@mberhault

sha: 677f6f1, with race detection enabled.

All rho nodes died within 24h.
Timeline:

0: cockroach@104.196.18.63
cockroach                        EXITED     Dec 06 01:45 AM
1: cockroach@104.196.102.49
cockroach                        EXITED     Dec 05 08:24 PM
2: cockroach@104.196.147.189
cockroach                        EXITED     Dec 05 07:58 PM
3: cockroach@104.196.164.29
cockroach                        EXITED     Dec 05 05:11 PM

Three of the nodes died with the previously seen "panic during panic", which is unique to race builds.
However, the fourth node (104.196.147.189) died on a failed consistency check:

I161205 19:58:51.854693 58 storage/replica_proposal.go:404  [n3,s3,r2059/2:/Table/51/1/4{399621…-404032…},@c42cf38480] range [n3,s3,r2059/2:/Table/51/1/4{399621…-404032…}]: transferring raft leadership to replica ID 4
E161205 19:58:52.125690 65010180 storage/replica_command.go:1877  [n3,s3,r1618/4:/Table/51/1/352{09637…-54308…},@c42c1ffb00] replica {2 2 2} is inconsistent: expected checksum c1b4ffa6c5b3078da8b8a2632ab40cdf8e7362f2eac201dbb122100e569ad644374b964ab1ce71e9dad90614a863366c24d7d3762591d2b8e4bfec1e45ad05fb, got c86ca26b9f66abe6c4b92c21f0fb37b4cd55d96832577745715f648f8d1fd17d67d7f96bbd7f96206e6c3055308d59e7c33767261f1fcc518ee66f35e2143a33
--- leaseholder
+++ follower
-0.000000000,0 /Local/RangeID/1618/r/RaftTombstone
-  ts:<zero>
-  value:^R^D^H^@^P^@^X^@ ^@(^@2^Gúü 7^C^H^E
-  raw_key:/Local/RangeID/1618/r/RaftTombstone raw_value:1204080010001800200028003207fafca037030805
E161205 19:58:52.126742 65010179 storage/replica_command.go:1877  [n3,s3,r1618/4:/Table/51/1/352{09637…-54308…},@c42c1ffb00] replica {1 1 1} is inconsistent: expected checksum c1b4ffa6c5b3078da8b8a2632ab40cdf8e7362f2eac201dbb122100e569ad644374b964ab1ce71e9dad90614a863366c24d7d3762591d2b8e4bfec1e45ad05fb, got c86ca26b9f66abe6c4b92c21f0fb37b4cd55d96832577745715f648f8d1fd17d67d7f96bbd7f96206e6c3055308d59e7c33767261f1fcc518ee66f35e2143a33
--- leaseholder
+++ follower
-0.000000000,0 /Local/RangeID/1618/r/RaftTombstone
-  ts:<zero>
-  value:^R^D^H^@^P^@^X^@ ^@(^@2^Gúü 7^C^H^E
-  raw_key:/Local/RangeID/1618/r/RaftTombstone raw_value:1204080010001800200028003207fafca037030805
F161205 19:58:52.127938 65009868 storage/replica_command.go:1893  [n3,s3,r1618/4:/Table/51/1/352{09637…-54308…},@c42c1ffb00] consistency check failed with 2 inconsistent replicas
goroutine 65009868 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0x3d04a01, 0x7a030ec, 0x332fc60, 0x10b39ba)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:849 +0xc2
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x3331260, 0xc400000004, 0x2b252d2, 0x1a, 0x765, 0xc42ce3c700, 0x74)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:714 +0x99a
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x2e74900, 0xc4287704b0, 0x4, 0x2, 0x2056512, 0x36, 0xc4203f4000, 0x1, 0x1)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:140 +0x331
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x2e74900, 0xc4287704b0, 0x1, 0x4, 0x2056512, 0x36, 0xc4203f4000, 0x1, 0x1)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:88 +0x9a
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(0x2e74900, 0xc4287704b0, 0x2056512, 0x36, 0xc4203f4000, 0x1, 0x1)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:172 +0x90
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).CheckConsistency(0xc42c1ffb00, 0x2e74900, 0xc4287704b0, 0xc426cf98c0, 0x36, 0x40, 0xc426cf9b40, 0x36, 0x40, 0x1, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_command.go:1893 +0xc8c
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).addAdminCmd(0xc42c1ffb00, 0x2e74900, 0xc4287704b0, 0x148d747b3c31d98f, 0x0, 0x300000003, 0x4, 0x652, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:1635 +0x4a5
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).Send(0xc42c1ffb00, 0x2e74900, 0xc4287704b0, 0x148d747b3c31d98f, 0x0, 0x300000003, 0x4, 0x652, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:1221 +0x7a5
github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send(0xc420191880, 0x2e74900, 0xc4287704b0, 0x148d747b3c31d98f, 0x0, 0x300000003, 0x4, 0x652, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2446 +0x913
github.com/cockroachdb/cockroach/pkg/storage.(*Stores).Send(0xc4202a0cc0, 0x2e74900, 0xc4287704b0, 0x0, 0x0, 0x300000003, 0x4, 0x652, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/stores.go:187 +0x24b
github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal.func1(0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:818 +0x433
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr(0xc4203ac000, 0xc42bb67cc0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:245 +0x10e
github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal(0xc420210500, 0x2e74840, 0xc42cb94340, 0xc423bbcac8, 0xc42cb94340, 0xc42a7b4880, 0xc420d9cf00)
        /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:829 +0x210
github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch(0xc420210500, 0x2e74840, 0xc42cb94340, 0xc423bbcac8, 0x2e7dba0, 0xc42a67da80, 0x11)
        /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:851 +0xbf
github.com/cockroachdb/cockroach/pkg/kv.(*grpcTransport).SendNext.func1(0x2e5cbc0, 0xc420210500, 0xc42b31ebe0, 0xc423bbcaa0, 0xc4298dc4e0)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/transport.go:185 +0x86
created by github.com/cockroachdb/cockroach/pkg/kv.(*grpcTransport).SendNext
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/transport.go:188 +0x3c3
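
The failure above comes from comparing per-replica checksums and, on mismatch, printing a key-level diff between the leaseholder and each follower. The 128-hex-digit checksums in the log are 512 bits wide, consistent with SHA-512. What follows is a minimal Go sketch of that idea, not CockroachDB's implementation: it assumes each replica's state can be modeled as a list of string key/value pairs, and the `kv` type and the `checksum` and `diff` helpers are hypothetical names chosen for illustration.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"sort"
)

// kv is a hypothetical stand-in for one key/value entry of a replica's state.
type kv struct {
	Key, Value string
}

// checksum hashes a replica's entries in key order with SHA-512, so two
// replicas holding the same data produce the same hex digest regardless of
// the order the entries were supplied in. Keys and values are
// length-prefixed so that different splits of the same bytes cannot collide.
func checksum(entries []kv) string {
	sorted := append([]kv(nil), entries...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Key < sorted[j].Key })
	h := sha512.New()
	for _, e := range sorted {
		fmt.Fprintf(h, "%d:%s%d:%s", len(e.Key), e.Key, len(e.Value), e.Value)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// diff lists keys that differ between the two replicas, marking the
// leaseholder's side with "-" and the follower's side with "+", mirroring
// the "--- leaseholder / +++ follower" layout of the log above.
func diff(leaseholder, follower []kv) []string {
	lh := make(map[string]string, len(leaseholder))
	for _, e := range leaseholder {
		lh[e.Key] = e.Value
	}
	fl := make(map[string]string, len(follower))
	for _, e := range follower {
		fl[e.Key] = e.Value
	}
	// Collect the union of keys in sorted order for deterministic output.
	var keys []string
	for k := range lh {
		keys = append(keys, k)
	}
	for k := range fl {
		if _, ok := lh[k]; !ok {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)

	var out []string
	for _, k := range keys {
		lv, inL := lh[k]
		fv, inF := fl[k]
		switch {
		case inL && !inF:
			out = append(out, "-"+k) // only on the leaseholder
		case !inL && inF:
			out = append(out, "+"+k) // only on the follower
		case lv != fv:
			out = append(out, "-"+k, "+"+k) // present on both, values differ
		}
	}
	return out
}

func main() {
	// Hypothetical data: the follower lacks the tombstone key that the
	// leaseholder holds, as in the r1618 diff in the log.
	follower := []kv{{"/Table/51/1/example", "row"}}
	leaseholder := append([]kv{{"/Local/RangeID/1618/r/RaftTombstone", "tombstone"}}, follower...)

	if checksum(leaseholder) != checksum(follower) {
		fmt.Println("replica is inconsistent:")
		for _, line := range diff(leaseholder, follower) {
			fmt.Println(line)
		}
	}
}
```

Running it prints `replica is inconsistent:` followed by `-/Local/RangeID/1618/r/RaftTombstone`, i.e. a key present on the leaseholder only, which is the same shape as the r1618 failure.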

I'm not sure how much effort we want to put into debugging rho; race detection causes some odd behavior.
Silencing alerts for a bit.