pgwire: deflake TestParseClientProvidedSessionParameters #93538
Conversation
andreimatei
left a comment
please explain in the commit message why reusing the server wasn't good
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @renatolabs)
Force-pushed 762c135 to 01d85f1
bors r=knz
andreimatei
left a comment
Hold up, I don't really understand what's going on here. Are you sure this is the right fix? Looking superficially, it seems to me that these tests ought to be able to share a server.
Is the goroutine doing pgx.Connect(ctx, url) outliving the subtest? If that has something to do with the issue, let's fix that.
Do you mind spelling out for me what's up with these out-of-order messages? Doesn't each subtest operate on a different session?
andreimatei
left a comment
bors r-
while we chat
Canceled.
andreimatei
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @renatolabs)
pkg/sql/pgwire/conn_test.go line 1554 at r2 (raw file):
```go
for _, tc := range testCases {
	t.Run(tc.desc, func(t *testing.T) {
```
Resuming from where we left off offline:
I had misunderstood what is now being done in every subtest: it's opening a listener, not a crdb node, so I no longer care about doing it repeatedly; in fact I think that's the right move.
But have we actually solved the problem exposed by the failures in a robust way? If the problem is that pgx opens a connection asynchronously (in order to cancel the original connection), then isn't it still possible for that async connection to happen after the next subtest has started and trip it up, if the listener for the next subtest is assigned the same port?
I think we need to either find a way to drain each subtest, or to make the listener ignore these connections made by pgx. I think you've noticed that the async cancel is related to us setting a connection timeout. Is that timeout necessary?
Alternatively, could we use lib/pq instead?
My read of https://github.com/jackc/pgconn/blob/96b31c42141f1ab3145ad5f72e17c810ea682801/pgconn.go#L535-L540 is that the async cancel is only sent for non-timeout errors; in this case, it's the context-deadline-exceeded error that's causing this. So let me just try removing the context deadline. Maybe we could also change things so the listener is not closed until after all the subtests are done.
I don't think we can make the listener ignore those connections. It can only identify a cancel request after reading the first few bytes of the request, and by then isn't it already too late?
andreimatei
left a comment
pkg/sql/pgwire/conn_test.go line 1554 at r2 (raw file):
Moving your answer back to the right thread.

> If the problem is that pgx opens a connection async (in order to cancel the original connection), then isn't it still possible for that async connection to happen after the next subtest started and trip it up if the listen for the next subtest is assigned the same port?
>
> my read of https://github.com/jackc/pgconn/blob/96b31c42141f1ab3145ad5f72e17c810ea682801/pgconn.go#L535-L540 is that the async cancel is only sent for non-timeout errors. in this case, it's the context deadline exceeded error that's causing this. so let me just try removing the context deadline.

👍

> maybe we could also change things so the listener is not closed until after all the subtests are done.

Yeah, that's one way to ensure that the port is not reused.

> I think we need to either find a way to drain each subtest, or to make the listener ignore these connections done by pgx.
>
> i don't think we can make the listener ignore those connections. it can only identify a cancel request after reading the first few bytes of the request, and by then isn't it already too late?

Too late how? I was thinking the listener would recognize these packets and not hand the connection to getSessionArgs.
pgx sends cancel requests asynchronously (and over a different connection), so this was confusing the simple listener used in this test. This also adds synchronization to make sure the pgx connect function is fully complete before the test ends.

Release note: None
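The synchronization the commit message refers to can be sketched roughly as follows (hypothetical helper names; in the real test the goroutine calls pgx.Connect(ctx, url)): the subtest blocks on a sync.WaitGroup until the connect goroutine has fully returned, so no connection it spawned can outlive the subtest.

```go
package main

import (
	"fmt"
	"sync"
)

// runSubtest runs the client's connect attempt in a goroutine and blocks
// until it has fully returned before the subtest is allowed to finish.
func runSubtest(connect func()) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		connect() // e.g. pgx.Connect(ctx, url) in the real test
	}()
	// ... subtest body: accept the connection, inspect session args ...
	wg.Wait() // ensure the connect goroutine finished before returning
}

func main() {
	runSubtest(func() { fmt.Println("connected") })
	fmt.Println("subtest done")
}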
Force-pushed 01d85f1 to 0b655eb
rafiss
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @renatolabs)
pkg/sql/pgwire/conn_test.go line 1554 at r2 (raw file):
> Too late how? I was thinking the listener would recognize these packets and not hand the connection to getSessionArgs.
Ah, I understand now. I did that and it seems to have worked. And the fake server can still be shared.
andreimatei
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @renatolabs)
tftr! bors r=knz,andreimatei
Build succeeded.
fixes #93469
Release note: None