Skip to content

pgwire: deflake TestParseClientProvidedSessionParameters#93538

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
rafiss:fix-TestParseClientProvidedSessionParameters
Dec 15, 2022
Merged

pgwire: deflake TestParseClientProvidedSessionParameters#93538
craig[bot] merged 1 commit intocockroachdb:masterfrom
rafiss:fix-TestParseClientProvidedSessionParameters

Conversation

@rafiss
Copy link
Copy Markdown
Collaborator

@rafiss rafiss commented Dec 13, 2022

fixes #93469

Release note: None

@rafiss rafiss requested review from a team, andreimatei, knz and renatolabs December 13, 2022 19:37
@rafiss rafiss requested a review from a team as a code owner December 13, 2022 19:37
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please explain in the commit message why reusing the server wasn't good

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs)

@rafiss rafiss force-pushed the fix-TestParseClientProvidedSessionParameters branch from 762c135 to 01d85f1 Compare December 13, 2022 20:05
@rafiss
Copy link
Copy Markdown
Collaborator Author

rafiss commented Dec 13, 2022

bors r=knz

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold up, I don't really understand what's going on here. You sure this is the right fix? Looking superficially, it seems to me that these tests ought to be able to share a server.
Is the goroutine doing pgx.Connect(ctx, url) outliving the subtest? If that has something to do with the issue, let's fix that.

Do you mind spelling out to me what's up with these out of order messages? Doesn't each subtest operate on a different session?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs)

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r-
while we chat

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs)

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 13, 2022

Canceled.

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @renatolabs)


pkg/sql/pgwire/conn_test.go line 1554 at r2 (raw file):

	for _, tc := range testCases {
		t.Run(tc.desc, func(t *testing.T) {

Resuming from where we left offline:
I had misunderstood what is now being done in every subtest: it's opening a listener, not a crdb node, so I no longer care about doing it repeatedly - in fact I think that's the right move.
But, have we actually solved the problem exposed by the failures in a robust way. If the problem is that pgx opens a connection async (in order to cancel the original connection), then isn't it still possible for that async connection to happen after the next subtest started and trip it up if the listen for the next subtest is assigned the same port?
I think we need to either find a way to drain each subtest, or to make the listener ignore these connections done by pgx. I think you've noticed that the async cancel is related to us putting a connection timeout. Is that timeout necessary?
Alternatively, could we use lib/pq instead?

@rafiss
Copy link
Copy Markdown
Collaborator Author

rafiss commented Dec 14, 2022

If the problem is that pgx opens a connection async (in order to cancel the original connection), then isn't it still possible for that async connection to happen after the next subtest started and trip it up if the listen for the next subtest is assigned the same port?

my read of https://github.com/jackc/pgconn/blob/96b31c42141f1ab3145ad5f72e17c810ea682801/pgconn.go#L535-L540 is that the async cancel is only sent for non-timeout errors. in this case, it's the context deadline exceeded error that's causing this. so let me just try removing the context deadline.

maybe we could also change things so the listener is not closed until after all the subtests are done.

I think we need to either find a way to drain each subtest, or to make the listener ignore these connections done by pgx.

i don't think we can make the listener ignore those connections. it can only identify a cancel request after reading the first few bytes of the request, and by then isn't it already too late?

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @renatolabs)


pkg/sql/pgwire/conn_test.go line 1554 at r2 (raw file):
Moving your answer back to the right thread.

If the problem is that pgx opens a connection async (in order to cancel the original connection), then isn't it still possible for that async connection to happen after the next subtest started and trip it up if the listen for the next subtest is assigned the same port?

my read of https://github.com/jackc/pgconn/blob/96b31c42141f1ab3145ad5f72e17c810ea682801/pgconn.go#L535-L540 is that the async cancel is only sent for non-timeout errors. in this case, it's the context deadline exceeded error that's causing this. so let me just try removing the context deadline.

👍

maybe we could also change things so the listener is not closed until after all the subtests are done.

Yeah, that's one way to ensure that the port is not reused.

I think we need to either find a way to drain each subtest, or to make the listener ignore these connections done by pgx.

i don't think we can make the listener ignore those connections. it can only identify a cancel request after reading the first few bytes of the request, and by then isn't it already too late?

Too late how? I was thinking the listener would recognize these packets and not hand the connection to getSessionArgs.

pgx sends cancel requests asynchronously (and over a different
connection), so this was confusing the simple listener used in this
test. This also adds synchronization to make sure the pgx connect
function is fully complete before the test ends.

Release note: None
@rafiss rafiss force-pushed the fix-TestParseClientProvidedSessionParameters branch from 01d85f1 to 0b655eb Compare December 14, 2022 19:17
Copy link
Copy Markdown
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @renatolabs)


pkg/sql/pgwire/conn_test.go line 1554 at r2 (raw file):

Too late how? I was thinking the listener would recognize these packets and not hand the connection to getSessionArgs.

ah i understand now. i did that and it seems to have done it. and the fake server can still be shared.

@rafiss rafiss requested a review from andreimatei December 14, 2022 19:18
Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @renatolabs)

@rafiss
Copy link
Copy Markdown
Collaborator Author

rafiss commented Dec 15, 2022

tftr!

bors r=knz,andreimatei

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 15, 2022

Build succeeded:

@craig craig bot merged commit d3a9bfa into cockroachdb:master Dec 15, 2022
@rafiss rafiss deleted the fix-TestParseClientProvidedSessionParameters branch December 15, 2022 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

pgwire: flake in TestParseClientProvidedSessionParameters

4 participants