Skip to content

Extend default probe connect/handshake timeouts#68059

Merged
DaveCTurner merged 1 commit intoelastic:masterfrom
DaveCTurner:2021-01-27-longer-handshake-timeout
Jan 27, 2021
Merged

Extend default probe connect/handshake timeouts#68059
DaveCTurner merged 1 commit intoelastic:masterfrom
DaveCTurner:2021-01-27-longer-handshake-timeout

Conversation

@DaveCTurner
Copy link
Copy Markdown
Member

Today the discovery phase has a short 1-second timeout for handshaking
with a remote node after connecting, which allows it to quickly move on
and retry in the case of connecting to something that doesn't respond
straight away (e.g. it isn't an Elasticsearch node).

This short timeout was necessary when the component was first developed
because each connection attempt would block a thread. Since #42636 the
connection attempt is now nonblocking so we can apply a more relaxed
timeout.

If transport security is enabled then our handshake timeout applies to
the TLS handshake followed by the Elasticsearch handshake. If the TLS
handshake alone takes over a second then the whole handshake times out
with a ConnectTransportException, but this does not tell us which of
the two individual handshakes took so long.

TLS handshakes have their own 10-second timeout, which if reached yields
a SslHandshakeTimeoutException that allows us to distinguish a problem
at the TLS level from one at the Elasticsearch level. Therefore this
commit extends the discovery probe timeouts.

Today the discovery phase has a short 1-second timeout for handshaking
with a remote node after connecting, which allows it to quickly move on
and retry in the case of connecting to something that doesn't respond
straight away (e.g. it isn't an Elasticsearch node).

This short timeout was necessary when the component was first developed
because each connection attempt would block a thread. Since elastic#42636 the
connection attempt is now nonblocking so we can apply a more relaxed
timeout.

If transport security is enabled then our handshake timeout applies to
the TLS handshake followed by the Elasticsearch handshake. If the TLS
handshake alone takes over a second then the whole handshake times out
with a `ConnectTransportException`, but this does not tell us which of
the two individual handshakes took so long.

TLS handshakes have their own 10-second timeout, which if reached yields
a `SslHandshakeTimeoutException` that allows us to distinguish a problem
at the TLS level from one at the Elasticsearch level. Therefore this
commit extends the discovery probe timeouts.
@DaveCTurner DaveCTurner added >enhancement :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.0.0 v7.12.0 labels Jan 27, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team. label Jan 27, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Copy Markdown
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@DaveCTurner DaveCTurner merged commit 9c100cd into elastic:master Jan 27, 2021
@DaveCTurner DaveCTurner deleted the 2021-01-27-longer-handshake-timeout branch January 27, 2021 16:41
DaveCTurner added a commit that referenced this pull request Jan 27, 2021
Today the discovery phase has a short 1-second timeout for handshaking
with a remote node after connecting, which allows it to quickly move on
and retry in the case of connecting to something that doesn't respond
straight away (e.g. it isn't an Elasticsearch node).

This short timeout was necessary when the component was first developed
because each connection attempt would block a thread. Since #42636 the
connection attempt is now nonblocking so we can apply a more relaxed
timeout.

If transport security is enabled then our handshake timeout applies to
the TLS handshake followed by the Elasticsearch handshake. If the TLS
handshake alone takes over a second then the whole handshake times out
with a `ConnectTransportException`, but this does not tell us which of
the two individual handshakes took so long.

TLS handshakes have their own 10-second timeout, which if reached yields
a `SslHandshakeTimeoutException` that allows us to distinguish a problem
at the TLS level from one at the Elasticsearch level. Therefore this
commit extends the discovery probe timeouts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement Team:Distributed Meta label for distributed team. v7.12.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants