
kvserver: default raft scheduler concurrency can cause cascading failures on beefy machines #56851

@tbg

Description


The default number of worker goroutines in the Raft scheduler is 8*runtime.NumCPU(). We have observed that, at least on v20.1, this can cause pathological behavior, which is most likely to occur when both the CPU count and the range count are "high" (32 CPUs and 55k ranges did it in one recent example).

The pathological behavior entails a full breakdown of the system. The UI and all ranges stop working. It becomes nearly impossible to extract debugging information from the system.

From a goroutine dump (via kill -ABRT), we see many of the worker goroutines with the following stack:

sync.(*Mutex).Lock(...)
    /usr/local/go/src/sync/mutex.go:81
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).enqueueN(0xc0011ea900, 0x8, 0xc041aea000, 0x844d, 0x9800, 0xc038b821c0)

(enqueue1 similarly shows up). These are contending on a mutex, which is thought to be the root cause of the pathological behavior. This all looks like golang/go#33747, which was fixed in go1.14. CRDB v20.1 and v20.2 are both built with go1.13, which makes them susceptible to this bug. v21.1 will be built with go1.15, which has the fix.
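The contention pattern can be sketched as below (hypothetical names and a deliberately simplified shape, not the actual kvserver implementation): all enqueuers and all worker goroutines serialize on a single mutex guarding a shared queue, so with 8 workers per CPU the lock is extremely hot, and under the go1.13 mutex bug contended unlocks behave pathologically.

```go
package main

import (
	"fmt"
	"sync"
)

// raftSchedulerSketch is a toy model of the contended structure: one
// mutex protecting one shared queue of pending range IDs.
type raftSchedulerSketch struct {
	mu    sync.Mutex
	queue []int64
}

// enqueueN takes the same single lock that every enqueue1/enqueueN
// call in the goroutine dump is blocked on.
func (s *raftSchedulerSketch) enqueueN(ids ...int64) {
	s.mu.Lock()
	s.queue = append(s.queue, ids...)
	s.mu.Unlock()
}

// dequeue: the worker goroutines contend on the same lock.
func (s *raftSchedulerSketch) dequeue() (int64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.queue) == 0 {
		return 0, false
	}
	id := s.queue[0]
	s.queue = s.queue[1:]
	return id, true
}

func main() {
	s := &raftSchedulerSketch{}
	var wg sync.WaitGroup
	// 256 goroutines hammering one mutex, as 8*NumCPU workers would
	// on a 32-CPU machine.
	for g := 0; g < 256; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				s.enqueueN(int64(g*100 + i))
				s.dequeue()
			}
		}(g)
	}
	wg.Wait()
	fmt.Println("done: all work funneled through a single mutex")
}
```

The sketch is only meant to show why the worker count multiplies the contention on one lock; the real scheduler's queue and locking are more involved.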

Following the contention in the scheduler, we see outgoing raft message streams that are backed up because the recipient’s raft scheduler is unable to keep up. These have been seen stuck for dozens of minutes ([select, 17 minutes] etc):

goroutine 2707 [select]:
github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/internal/transport.(*writeQuota).get(0xc014642600, 0xc000000052, 0x4d, 0x5)
        /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/internal/transport/flowcontrol.go:59 +0xaa
github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/internal/transport.(*http2Client).Write(0xc0120461c0, 0xc0042dab00, 0xc02dbaa480, 0x5, 0x60, 0xc01dd44a80, 0x4d, 0xbb, 0xc03e70ecf5, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/internal/transport/http2_client.go:840 +0x1ae
github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*csAttempt).sendMsg(0xc013e8cd80, 0x416e3e0, 0xc014642700, 0xc03e70ecf0, 0x5, 0x5, 0xc01dd44a80, 0x4d, 0xbb, 0xc00bf3ccc0, ...)
        /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/stream.go:828 +0x128
github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*clientStream).SendMsg.func2(0xc013e8cd80, 0x4d, 0xbb)
        /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/stream.go:693 +0xb3
github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*clientStream).withRetry(0xc000139e60, 0xc018462c30, 0xc019b036f0, 0xc000263840, 0x0)
        /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/stream.go:573 +0x360
github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*clientStream).SendMsg(0xc000139e60, 0x416e3e0, 0xc014642700, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/stream.go:699 +0x399

To resolve the gridlock, we added the environment variable COCKROACH_SCHEDULER_CONCURRENCY=64 to all nodes in the cluster and restarted.
We verified that the problem was solved by letting the cluster come together, watching the metrics until Raft leaders were elected on all ranges, gradually adding load back to the cluster, and continuing to monitor.
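The per-node workaround amounts to setting the variable in the node's environment before restart (a sketch; the actual deployment tooling and start flags vary by cluster):

```shell
# Cap the Raft scheduler worker pool at 64, the value used in this
# incident, instead of the 8*NumCPU default.
export COCKROACH_SCHEDULER_CONCURRENCY=64
# The cockroach process must then be restarted so the variable is
# picked up; verify the variable is visible to the shell that starts it.
echo "$COCKROACH_SCHEDULER_CONCURRENCY"
```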

We need to set better defaults for the Raft scheduler worker pool. Additionally, we should understand whether the extent of the degradation was expected given the misconfiguration or whether there are more improvements to resilience we need to make. This will likely entail reproducing the problem locally.

gz#8824

Labels

A-kv-replication: Relating to Raft, consensus, and coordination.
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
