workload: Rate-limit connection warmup #36745
Description
cockroach/pkg/workload/pgx_helpers.go
Lines 87 to 104 in bf399d2
```go
// "Warm up" the pools so we don't have to establish connections later (which
// would affect the observed latencies of the first requests, especially when
// prepared statements are used). We do this by
// acquiring connections (in parallel), then releasing them back to the
// pool.
var g errgroup.Group
for i, p := range m.Pools {
	p := p
	conns := warmupConns[i]
	for j := range conns {
		j := j
		g.Go(func() error {
			var err error
			conns[j], err = p.Acquire()
			return err
		})
	}
}
```
`workload` eagerly initializes all the connections in its pools, and it does so by spawning a goroutine per connection so that all connections are established in parallel. For large numbers of connections, this is enough to (a) trigger SYN cookies DoS protection on the cockroach node and (b) sometimes cause a connection to fail with a "connection reset by peer" error, even though the server is up throughout this process (I think this is also a consequence of SYN cookies, but I'm not 100% sure). An error at this stage kills the workload regardless of the `tolerate-errors` flag.
We should add some sort of rate limiting here so we don't overload the server. We may also want to be more tolerant of errors at this stage (at least when the `tolerate-errors` flag is set). We should also verify the number of connections required for optimal performance in TPCC; we currently use 2 connections per warehouse, which seems very high to me.
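One way to bound the warmup burst is a counting semaphore around the acquire goroutines, so at most a fixed number of connection attempts are in flight at once. A minimal stdlib-only sketch of the idea; `acquireConn`, `warmup`, and `maxInFlight` are hypothetical stand-ins, not the real pool API:

```go
package main

import (
	"fmt"
	"sync"
)

// acquireConn is a stand-in for p.Acquire() on the real pool.
func acquireConn(id int) (int, error) {
	return id, nil
}

// warmup acquires n connections with at most maxInFlight concurrent
// acquisitions, so the server never sees an unbounded burst of dials.
func warmup(n, maxInFlight int) ([]int, error) {
	conns := make([]int, n)
	sem := make(chan struct{}, maxInFlight) // counting semaphore
	var wg sync.WaitGroup
	var mu sync.Mutex
	var firstErr error
	for i := 0; i < n; i++ {
		i := i
		wg.Add(1)
		sem <- struct{}{} // blocks while maxInFlight acquisitions are pending
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			c, err := acquireConn(i)
			if err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
				return
			}
			conns[i] = c
		}()
	}
	wg.Wait()
	return conns, firstErr
}

func main() {
	conns, err := warmup(10, 3)
	fmt.Println(len(conns), err)
}
```

Since the real code already uses `errgroup`, an alternative would be to keep the existing structure and just gate each `g.Go` closure on the same kind of semaphore channel.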
This issue manifests as a test failure like:

```
stdout:
Error: read tcp 10.142.0.43:37862->10.142.0.112:26257: read: connection reset by peer
Error: exit status 1
```
early in a test run (no other output). It appears to be common in roachtests using tpcc-1000, including #35985, #35986, and #36094.