pgwire: remove readTimeoutConn in favor of a channel #124373
craig[bot] merged 2 commits into cockroachdb:master
Conversation
Force-pushed ce2e425 to 17e388a
yuzefovich
left a comment
Thanks for looking into this! The change overall seems good to me.
I'm not too familiar with this code, so I want to double check my understanding. IIUC currently for each connection we have two goroutines:
- the "reader" goroutine that continuously loops in serveImpl and sees what the client is requesting us to do, and
- the "processor" goroutine for the connExecutor that executes commands added into stmtBuf by the "reader" goroutine.
This PR introduces a third "watchdog" goroutine that monitors the state for cancellation and draining. In other words, we make a trade-off between 1) periodically forcing net.Conn to wake up in the "reader" due to a read deadline to see whether the "reader" should quit and 2) a separate goroutine that only monitors things with no read deadline set for the "reader". In practice, this "watchdog" goroutine should be idle most of the time, so it should remove some system calls. Does this sound right?
@@ -945,6 +942,35 @@ func (s *Server) serveImpl(
	authOpt authOptions,
	sessionID clusterunique.ID,
) {
nit: the last part of the comment for serveImpl needs an update.
pkg/sql/pgwire/server.go (Outdated)
	tcpConn, _ = underConn.(*net.TCPConn)
}
if tcpConn == nil {
	_ = c.conn.Close()
We didn't close the connection in both directions before this patch. Based on the comment to serveImpl, it seems that it's the connExecutor's job to do so. Why add this here?
hm good point. it seems safer to only close the read side.
this matches the previous behavior, where if the context was cancelled, then the readTimeoutConn would start returning errors when Read is called:
cockroach/pkg/sql/pgwire/server.go, line 1039 at ceb19c5
that should cause the main loop of the reader goroutine to exit here:
cockroach/pkg/sql/pgwire/server.go, lines 1216 to 1225 at ceb19c5
although, thinking about it more, the risk is that if the type switch no longer works as expected (like for example, there's a new implementation of net.Conn to deal with later), then not having a default case would make the connection hang forever.
maybe it's safer to keep the Close as a fallback? WDYT?
// DrainRequest. This will make the processor quit whenever it finds a
// good time.
_ /* err */ = c.stmtBuf.Push(ctx, sql.DrainRequest{})
<-ctx.Done()
nit: a comment (that eventually the drain request will be processed by the server and all connections will be closed, which is achieved by cancelling the context) would be helpful.
rafiss
left a comment
This PR introduces a third "watchdog" goroutine that monitors the state for cancellation and draining. In other words, we make a trade-off between 1) periodically forcing net.Conn to wake up in the "reader" due to a read deadline to see whether the "reader" should quit and 2) a separate goroutine that only monitors things with no read deadline set for the "reader". In practice, this "watchdog" goroutine should be idle most of the time, so it should remove some system calls. Does this sound right?
yes sounds right. it's worth noting that a goroutine that does nothing apart from wait on a channel is very efficient in go. the scheduler should never try to run it.
yuzefovich
left a comment
Reviewed 4 of 4 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
pkg/sql/pgwire/server.go line 968 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
although, thinking about it more, the risk is that if the type switch no longer works as expected (like for example, there's a new implementation of net.Conn to deal with later), then not having a default case would make the connection hang forever.
maybe it's safer to keep the Close as a fallback? WDYT?
Sounds reasonable to me.
I traced where Close is called currently, and I think it's a few layers up in the defer in the async task in netutil.TCPServer.ServeWith, so the comment (that "serveImpl always closes the network connection before returning") is misleading before your patch but will actually become closer to reality after the patch. It might be worth adjusting the comment.
yuzefovich
left a comment
Looks like the updates haven't been pushed yet?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
rafiss
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/pgwire/server.go line 968 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Sounds reasonable to me.
I traced where
Closeis called currently, and I think it's a few layers up in thedeferin in the async task innetutil.TCPServer.ServeWith, so the comment (that "serveImpl always closes the network connection before returning") is misleading before your patch but will actually become closer to reality after the patch. It might be worth adjusting the comment.
I was about to push the change, then realized that another fallback option is to call conn.SetReadDeadline(now()), which might be less disruptive. I'll do that instead.
Force-pushed 17e388a to d7150f9
yuzefovich
left a comment
Reviewed 2 of 2 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rafiss)
pkg/sql/pgwire/server.go line 968 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
I was about to push the change, then realized that another fallback option is to call conn.SetReadDeadline(now()), which might be less disruptive. I'll do that instead.
Nice, I like it.
Rather than using a connection that polls for the context being done every second, we now spin up an additional goroutine that blocks until the connection context is done, or the drain signal was received. Release note: None
This benchmark was only useful for capturing a CPU profile; it doesn't measure anything else. Release note: None
Force-pushed d7150f9 to 1351385
tftr! bors r=yuzefovich

Build failed:

bors r=yuzefovich
Rather than using a connection that polls for the context being done every second, we now spin up an additional goroutine that blocks until the connection context is done, or the drain signal was received.
I wrote a simple benchmark of the idle connection loop to generate CPU profiles.
With the old readTimeoutConn:
(CPU profile screenshot)
With the new goroutine approach:
(CPU profile screenshot)
There's definitely less noise and overhead.
fixes #25585
Release note: None