lots of table creates lock up cluster. Contention on the SystemConfigSpan? #23254

@andreimatei

I'm playing around trying to repro the badness in #22933 and I can get something interesting to happen: I'm creating 100 goroutines that all create new table after new table on a 3-node TestCluster. At some point, the cluster seems to just lock up: RPCs don't seem to get responses any more.
To repro, take this commit: andreimatei@b68f602

and run `make test PKG=./pkg/sql TESTS='TestParallelDropCreateTables' TESTFLAGS="-v --vmodule=conn_executor=2" TESTTIMEOUT=120s`.
After a while, you notice that all logging stops, and then you start seeing stuff like:

W180228 22:20:56.766977 111392 kv/dist_sender.go:1305 [s2,n2] have been waiting 1m0s sending RPC to r7 (currently pending: [(n2,s2):3]) for batch: PushTxn [/Table/SystemConfigSpan/Start,/Min).

Happens reliably to me in 30s or so.

Attaching the timeout stack traces:
stacks.txt

@bdarnell, @petermattis, @nvanbenschoten would any of you be interested in investigating this?

Labels

C-investigation: Further steps needed to qualify. C-label will change.
