lots of table creates lock up cluster. Contention on the SystemConfigSpan? #23254
Description
I'm playing around trying to repro the badness in #22933 and I can get something interesting to happen: I'm running 100 goroutines that each create one new table after another on a 3-node TestCluster. At some point, the cluster seems to just lock up: RPCs don't seem to get responses any more.
To repro, take this commit: andreimatei@b68f602
and run make test PKG=./pkg/sql TESTS='TestParallelDropCreateTables' TESTFLAGS="-v --vmodule=conn_executor=2" TESTTIMEOUT=120s.
After a while, you notice that all logging stops, and then you start seeing stuff like:
W180228 22:20:56.766977 111392 kv/dist_sender.go:1305 [s2,n2] have been waiting 1m0s sending RPC to r7 (currently pending: [(n2,s2):3]) for batch: PushTxn [/Table/SystemConfigSpan/Start,/Min).
This happens reliably for me within 30s or so.
Attaching the timeout stack traces:
stacks.txt
@bdarnell, @petermattis, @nvanbenschoten would any of you be interested in investigating this?