storage: more quota pool deadlocks #17826
Closed
I can reproduce a deadlock in the quota pool fairly reliably while performing a ten-node, 2TB restore. It looks very much like #17524, except that #17796 doesn't seem to fix the problem. On the node coordinating the restore, we see a hung goroutine:
```
goroutine 3312 [select, 22 minutes]:
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendToReplicas(0xc4204c0000, 0x7fb93f5f7438, 0xc4242309c0, 0xc4204c0048, 0x3f1, 0xc426483f80, 0x4, 0x4, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:1153 +0x1417
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendRPC(0xc4204c0000, 0x7fb93f5f7438, 0xc4242309c0, 0x3f1, 0xc426483f80, 0x4, 0x4, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:391 +0x2db
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendSingleRange(0xc4204c0000, 0x7fb93f6504d0, 0xc45309e7b0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:455 +0x17b
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendPartialBatch(0xc4204c0000, 0x7fb93f6504d0, 0xc45309e7b0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:939 +0x447
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).divideAndSendBatchToRanges(0xc4204c0000, 0x7fb93f6504d0, 0xc45309e7b0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:804 +0xb46
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).Send(0xc4204c0000, 0x7fb93f6504d0, 0xc45309e7b0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:606 +0x344
github.com/cockroachdb/cockroach/pkg/kv.(*TxnCoordSender).Send(0xc420630000, 0x7fb93f6504d0, 0xc45309e780, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_coord_sender.go:435 +0x1f1
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).send(0xc420415b40, 0x7fb93f5f7438, 0xc421e2bbc0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:554 +0x1ff
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).(github.com/cockroachdb/cockroach/pkg/internal/client.send)-fm(0x7fb93f5f7438, 0xc421e2bbc0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:491 +0x83
github.com/cockroachdb/cockroach/pkg/internal/client.sendAndFill(0x7fb93f5f7438, 0xc421e2bbc0, 0xc422d217d0, 0xc428eb7800, 0xc424230300, 0xc428ed8320)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:463 +0x103
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).Run(0xc420415b40, 0x7fb93f5f7438, 0xc421e2bbc0, 0xc428eb7800, 0xc428ebc900, 0xc422d219e8)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:491 +0x9d
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).AdminSplit(0xc420415b40, 0x7fb93f5f7438, 0xc421e2bbc0, 0x1bbbca0, 0xc428ebc8e0, 0x1bbbca0, 0xc428ebc900, 0x2, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:383 +0x98
github.com/cockroachdb/cockroach/pkg/ccl/sqlccl.restore.func2(0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/ccl/sqlccl/restore.go:735 +0x56b
github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1(0xc421e2bc00, 0xc421e31730)
	/go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:58 +0x57
created by github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go
	/go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:66 +0x66
```
On the node processing that split, there are many hung goroutines, plus a large number of goroutines blocked in quotaPool.acquire: https://gist.github.com/benesch/c747301192eb984686fb60fccf24c506
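Tallying goroutines in a dump like the one in the gist can be scripted. The sketch below assumes the text format that Go's `runtime/pprof` writes for the goroutine profile with `debug=2` (the same format a Go program prints when it dies from an unrecovered panic), and it assumes the frame of interest contains the substring `(*quotaPool).acquire`, i.e. that `acquire` is a method on a `*quotaPool` receiver; adjust the substring to the real symbol. The function name `countBlocked` and the package path in the sample dump are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countBlocked scans a goroutine dump and, for every goroutine whose stack
// contains fn, counts it under its wait state (e.g. "select").
func countBlocked(dump, fn string) map[string]int {
	counts := make(map[string]int)
	var state string
	var matched bool
	flush := func() {
		if state != "" && matched {
			counts[state]++
		}
		state, matched = "", false
	}
	sc := bufio.NewScanner(strings.NewReader(dump))
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20) // allow long frame lines
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "goroutine ") && strings.HasSuffix(line, ":") {
			flush()
			// Header looks like: goroutine 3312 [select, 22 minutes]:
			i, j := strings.Index(line, "["), strings.Index(line, "]")
			if i >= 0 && j > i {
				st := line[i+1 : j]
				if k := strings.Index(st, ","); k >= 0 {
					st = st[:k] // drop the ", 22 minutes" wait duration
				}
				state = st
			}
			continue
		}
		if strings.Contains(line, fn) {
			matched = true
		}
	}
	flush()
	// Sketch simplification: a scan error (a line over 1 MiB) is not reported.
	return counts
}

func main() {
	dump := "goroutine 1 [select, 22 minutes]:\n" +
		"example.com/storage.(*quotaPool).acquire(...)\n" +
		"\t/path/quota_pool.go:10\n" +
		"\n" +
		"goroutine 2 [chan receive]:\n" +
		"example.com/storage.(*quotaPool).acquire(...)\n" +
		"\t/path/quota_pool.go:10\n" +
		"\n" +
		"goroutine 3 [IO wait]:\n" +
		"net.(*netFD).Read(...)\n"
	fmt.Println(countBlocked(dump, "(*quotaPool).acquire")) // map[chan receive:1 select:1]
}
```

A live process can produce this format with `pprof.Lookup("goroutine").WriteTo(w, 2)`; feeding that output to `countBlocked` gives a quick per-state count of goroutines stuck in the frame you care about.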
I'm running a relatively recent revision of #17796, though not exactly the version that was merged: https://github.com/benesch/cockroach/tree/quota-pool-deadlock