Connection hang during initialization / enableSSL part deux #2683

@jamey-clari

Describe the issue
Earlier this year I filed #2472, which was kindly addressed by @davecramer. Unfortunately there still appears to be a problem in this area of the code, as we still see the issue in our application.

I believe the timeout that @davecramer added is probably working, BUT the way the VisibleBufferedInputStream.readMore method behaves when called from ConnectionFactoryImpl.enableSSL line 553 (int beresp = pgStream.receiveChar()) means it enters a never-ending loop reading the socket stream.

The current fix calls pgStream.getSocket().setSoTimeout(sslTimeout) before attempting to read the control character. However, unlike a call to PgStream.setNetworkTimeout(int), this never sets the timeoutRequested flag on the VisibleBufferedInputStream. Because of this, when the SocketTimeoutException occurs on the wrapped.read call in VisibleBufferedInputStream.readMore (line 161), no action is taken in the catch block. Both block and timeoutRequested are false, so the code falls out of the catch block and returns true after a no-op update to endIndex. This keeps it in the while loop in VisibleBufferedInputStream.ensureBytes (line 127), retrying forever.

Such is my theory, anyway. The alternative is that calling setSoTimeout on the socket isn't having any effect, which would be pretty odd.
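To make the suspected loop easier to follow, here is a minimal sketch of the theory above. It is a simplified stand-in, not the pgjdbc source: the class names ReadLoopSketch, AlwaysTimesOut and BufferModel are hypothetical, the real readMore takes more parameters, and the maxAttempts cap exists only so the demo terminates (the point of the issue is that the real loop has no such cap).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

/** Simplified, hypothetical model of the suspected loop. NOT the pgjdbc source. */
class ReadLoopSketch {

  /** Stands in for a socket stream whose SO_TIMEOUT expires on every read. */
  static class AlwaysTimesOut extends InputStream {
    @Override
    public int read() throws IOException {
      throw new SocketTimeoutException("Read timed out");
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      throw new SocketTimeoutException("Read timed out");
    }
  }

  /** Hypothetical stand-in for the buffered stream described in the issue. */
  static class BufferModel {
    private final InputStream wrapped;
    private final byte[] buffer = new byte[16];
    private int index = 0;
    private int endIndex = 0;
    boolean timeoutRequested = false; // only set via a setNetworkTimeout-style path

    BufferModel(InputStream wrapped) {
      this.wrapped = wrapped;
    }

    boolean readMore() throws IOException {
      int read = 0;
      try {
        read = wrapped.read(buffer, endIndex, buffer.length - endIndex);
      } catch (SocketTimeoutException e) {
        if (timeoutRequested) {
          throw e; // the timeout only surfaces when the flag is set
        }
        // Flag not set: the exception is swallowed and read stays 0.
      }
      if (read < 0) {
        return false;
      }
      endIndex += read; // no-op when read == 0
      return true;      // caller believes progress was made
    }

    /** Returns the number of read attempts; the cap is for the demo only. */
    int ensureBytes(int n, int maxAttempts) throws IOException {
      int attempts = 0;
      while (endIndex - index < n && attempts < maxAttempts) {
        attempts++;
        if (!readMore()) {
          break;
        }
      }
      return attempts;
    }
  }

  /** Flag unset: the loop spins until the artificial cap is reached. */
  static int attemptsWithoutFlag(int cap) {
    try {
      return new BufferModel(new AlwaysTimesOut()).ensureBytes(1, cap);
    } catch (IOException e) {
      return -1;
    }
  }

  /** Flag set: the first timeout propagates to the caller. */
  static boolean throwsWithFlag() {
    BufferModel model = new BufferModel(new AlwaysTimesOut());
    model.timeoutRequested = true;
    try {
      model.ensureBytes(1, 1000);
      return false;
    } catch (SocketTimeoutException e) {
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("attempts without flag: " + attemptsWithoutFlag(1000));
    System.out.println("throws with flag: " + throwsWithFlag());
  }
}
```

If the theory holds, the fix would presumably need either to set the flag alongside setSoTimeout or to let the exception propagate in this path; which one is appropriate is for the maintainers to judge.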

Thanks for your consideration!

Driver Version?
42.4.2

Java Version?
openjdk version "1.8.0_312"
OpenJDK Runtime Environment Corretto-8.312.07.1 (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM Corretto-8.312.07.1 (build 25.312-b07, mixed mode)

OS Version?
Ubuntu 20.04.3 LTS

PostgreSQL Version?
AWS Aurora 12.11

To Reproduce
Steps to reproduce the behaviour:
Challenging, as described in #2472 - intermittently occurs when master node of cluster changes and our connection pools reset.

Expected behaviour
We expect the SSL negotiation to time out after 5 seconds (by default) and the connection attempt to throw once the timeout is reached. Currently the connection attempt blocks indefinitely and never returns.
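For reference, this is the plain-JDK behaviour the expectation rests on: with SO_TIMEOUT set, a blocking read against a silent peer fails with SocketTimeoutException. The sketch below uses only java.net on the loopback interface and does not touch the driver; SoTimeoutSketch and readTimesOut are hypothetical names, and the 200 ms timeout is arbitrary.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Plain-JDK check that SO_TIMEOUT interrupts a blocking read. Not driver code. */
class SoTimeoutSketch {

  /** Returns true if read() against a peer that never writes times out. */
  static boolean readTimesOut(int timeoutMillis) {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    try (ServerSocket server = new ServerSocket(0, 1, loopback);
         Socket client = new Socket(loopback, server.getLocalPort());
         Socket accepted = server.accept()) {
      client.setSoTimeout(timeoutMillis);
      try {
        client.getInputStream().read(); // the accepted side stays silent
        return false;
      } catch (SocketTimeoutException e) {
        return true; // the timeout surfaced as an exception
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("read timed out: " + readTimesOut(200));
  }
}
```

If this holds in our environment as well, setSoTimeout itself is unlikely to be the culprit, which points back at how the exception is handled after it is thrown.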
