storage: splitPostApply can see tombstone for RHS #40470

@knz

Describe the problem

The node crashes with the following fatal error:

F190904 14:33:35.276986 138 storage/store.go:2148  [n1,s1,r313/4:/Table/58/1/{2525/9…-3750/1…}] split trigger found right-hand side with tombstone {NextReplicaID:5}: [n1,s1,r316/?:{-}]
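When comparing this crash across nodes or runs, it can help to pull the identifying fields out of the fatal message mechanically. The following is only a minimal sketch in Go, assuming the message keeps exactly the shape shown above. The names `tombstoneFatal`, `fatalRE`, and `parseTombstoneFatal` are hypothetical helpers written for this illustration and are not part of CockroachDB.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// tombstoneFatal holds the fields of one "split trigger found right-hand
// side with tombstone" message. Hypothetical type, for illustration only.
type tombstoneFatal struct {
	NodeID        int // n<ID> from the log tag
	StoreID       int // s<ID> from the log tag
	RangeID       int // range that logged the message (r313 above)
	ReplicaID     int // replica ID of that range (4 above)
	NextReplicaID int // value reported inside the tombstone
	RHSRangeID    int // right-hand side range (r316 above)
}

// fatalRE is an assumption about the message format, derived only from the
// single log line quoted in this issue.
var fatalRE = regexp.MustCompile(
	`\[n(\d+),s(\d+),r(\d+)/(\d+):[^\]]*\] split trigger found right-hand side ` +
		`with tombstone \{NextReplicaID:(\d+)\}: \[n\d+,s\d+,r(\d+)/`)

// parseTombstoneFatal extracts the fields from a log line. The second
// result is false if the line does not match the assumed format.
func parseTombstoneFatal(line string) (tombstoneFatal, bool) {
	m := fatalRE.FindStringSubmatch(line)
	if m == nil {
		return tombstoneFatal{}, false
	}
	n := make([]int, len(m))
	for i := 1; i < len(m); i++ {
		v, err := strconv.Atoi(m[i])
		if err != nil {
			return tombstoneFatal{}, false
		}
		n[i] = v
	}
	return tombstoneFatal{
		NodeID:        n[1],
		StoreID:       n[2],
		RangeID:       n[3],
		ReplicaID:     n[4],
		NextReplicaID: n[5],
		RHSRangeID:    n[6],
	}, true
}

func main() {
	line := "F190904 14:33:35.276986 138 storage/store.go:2148  " +
		"[n1,s1,r313/4:/Table/58/1/{2525/9…-3750/1…}] split trigger found " +
		"right-hand side with tombstone {NextReplicaID:5}: [n1,s1,r316/?:{-}]"
	if f, ok := parseTombstoneFatal(line); ok {
		fmt.Printf("%+v\n", f)
	}
}
```

Lines that do not match the assumed format are reported as non-matching rather than guessed at, so the helper can be run over a whole log file.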

To Reproduce

  1. roachprod create knz-gce -u knz -c gce --geo -n 10
  2. roachprod start $CLUSTER:1,4,7-8
  3. roachprod run $CLUSTER:1 -- "./cockroach workload fixtures import tpcc --warehouses=5000 --db=tpcc --experimental-direct-ingestion"

The import fails within 1-2 minutes.

Relevant log lines:

I190904 14:33:33.261425 180 server/status/runtime.go:498  [n1] runtime stats: 4.7 GiB RSS, 407 goroutines, 2.6 GiB/718 MiB/3.4 GiB GO alloc/idle/total, 1.1 GiB/1.3 GiB CGO alloc/total, 210805.4 CGO/sec, 381.5/16.5 %(u/s)time, 0.0 %gc (3x), 223 MiB/85 MiB (r/w)net
I190904 14:33:33.964062 11702 storage/replica_raft.go:291  [n1,s1,r261/1:/Table/56{-/1}] proposing REMOVE_REPLICA[(n4,s4):3]: after=[(n1,s1):1 (n3,s3):2 (n2,s2):5] next=6
W190904 14:33:34.137913 11847 storage/replica_raft.go:105  [n1,s1,r323/1:/Table/60/1/2{633/2/…-766/4/…}] context canceled before proposing: 1 HeartbeatTxn
I190904 14:33:34.489669 12200 storage/replica_command.go:1521  [n1,replicate,s1,r247/1:/{Table/61/3-Max}] change replicas (add [] remove [(n3,s3):2]): existing descriptor r247:/{Table/61/3-Max} [(n1,s1):1, (n3,s3):2, (n2,s2):3, (n4,s4):5, next=6, gen=20]
I190904 14:33:34.669612 11610 storage/replica_raftstorage.go:793  [n1,s1,r313/4:{-}] applying LEARNER snapshot [id=faf20096 index=15]
I190904 14:33:34.944105 11610 storage/replica_raftstorage.go:814  [n1,s1,r313/4:/Table/58/1/{2525/9…-3750/1…}] applied LEARNER snapshot [total=274ms ingestion=4@217ms id=faf20096 index=15]
I190904 14:33:35.053208 12088 storage/split_queue.go:149  [n1,split,s1,r307/1:/Table/54/1/125{3/119…-5/522…}] split saw concurrent descriptor modification; maybe retrying
W190904 14:33:35.202730 12090 storage/replica_raft.go:105  [n1,s1,r304/1:/Table/53/1/250{3/7/-…-5/3/-…}] context canceled before proposing: 1 HeartbeatTxn
I190904 14:33:35.232699 12190 storage/replica_command.go:395  [n1,split,s1,r307/1:/Table/54/1/125{3/119…-5/522…}] initiating a split of this range at key /Table/54/1/1254/11837 [r329] (77 MiB above threshold size 64 MiB)
F190904 14:33:35.276986 138 storage/store.go:2148  [n1,s1,r313/4:/Table/58/1/{2525/9…-3750/1…}] split trigger found right-hand side with tombstone {NextReplicaID:5}: [n1,s1,r316/?:{-}]
goroutine 138 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc000448301, 0xc000448360, 0x0, 0x7c662a)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1016 +0xb1
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x7c04a40, 0xc000000004, 0x73d2595, 0x10, 0x864, 0xc0006e43c0, 0x89)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:874 +0x93e
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x4e5e460, 0xc03ccc99e0, 0x4, 0x2, 0x4563081, 0x3a, 0xc003394730, 0x2, 0x2)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x2cc
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x4e5e460, 0xc03ccc99e0, 0x1, 0xc000000004, 0x4563081, 0x3a, 0xc003394730, 0x2, 0x2)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:69 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:180
github.com/cockroachdb/cockroach/pkg/storage.splitPostApply(0x4e5e460, 0xc03ccc99e0, 0x0, 0x15c142ce810fb293, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2148 +0xb3e
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleSplitResult(...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_result.go:233
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).handleNonTrivialReplicatedEvalResult(0xc003e298c0, 0x4e5e460, 0xc03ccc99e0, 0x0, 0x0, 0xc0003e1e40, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:943 +0x852
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).ApplySideEffects(0xc003e298c0, 0x4eacee0, 0xc063842008, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:856 +0x72d
github.com/cockroachdb/cockroach/pkg/storage/apply.mapCheckedCmdIter(0x7f378dc4b0c8, 0xc003e29ad8, 0xc0033953d8, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/cmd.go:182 +0x11b
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).applyOneBatch(0xc003395800, 0x4e5e460, 0xc03ccc99e0, 0x4eacfa0, 0xc003e29a78, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:276 +0x228
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).ApplyCommittedEntries(0xc003395800, 0x4e5e460, 0xc03ccc99e0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:242 +0xcf
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc003e29800, 0x4e5e460, 0xc03ccc99e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:759 +0xd87
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue.func1(0x4e5e460, 0xc03ccc99e0, 0xc003e29800, 0x4e5e460)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3599 +0x131
github.com/cockroachdb/cockroach/pkg/storage.(*Store).withReplicaForRequest(0xc000adc000, 0x4e5e460, 0xc03ccc99e0, 0xc009c02200, 0xc074b31e98, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3352 +0x150

Expected behavior

The import succeeds.

Context

kena@knz-gce-0001:~$ ./cockroach  version
Build Tag:    v19.2.0-alpha.20190606-2012-g58d0fc3
Build Time:   2019/09/04 11:31:21
Distribution: CCL
Platform:     linux amd64 (x86_64-unknown-linux-gnu)
Go Version:   go1.12.5
C Compiler:   gcc 6.3.0
Build SHA-1:  58d0fc3676726c7fa3ebaf41e99f54f305f25fa0
Build Type:   release

Labels

A-disaster-recovery
S-2-temp-unavailability: Temp crashes or other availability problems. Can be worked around or resolved by restarting.
