Allow adjustment of transport TLS handshake timeout #130909
DaveCTurner merged 8 commits into elastic:main
The default 10s TLS handshake timeout may be too short if there is some bug causing event-loop latency, and this has more serious consequences than the underlying performance issue (e.g. it prevents the cluster from scaling up to work around the problem). With this commit we expose a setting that allows the timeout to be configured, providing a workaround in such cases.
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
Hi @DaveCTurner, I've created a changelog YAML for you.
...ain/java/org/elasticsearch/xpack/core/security/transport/netty4/SecurityNetty4Transport.java
private static final Setting<TimeValue> TRANSPORT_TLS_HANDSHAKE_TIMEOUT_SETTING = Setting.positiveTimeSetting(
    "xpack.security.transport.ssl.handshake_timeout",
    TimeValue.timeValueSeconds(10),
    Setting.Property.NodeScope
);
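For readers unfamiliar with this kind of setting, here is a minimal, self-contained sketch of the behaviour the definition above suggests: use 10s when the key is absent, otherwise parse the configured value. It is not Elasticsearch code. The class `HandshakeTimeoutSketch`, the plain `Map` standing in for node settings, the simplified unit parser, and the choice to reject only negative values are all assumptions made for illustration; the exact bounds and parsing rules Elasticsearch applies are not shown in this conversation.

```java
import java.time.Duration;
import java.util.Map;

// Hypothetical sketch, not Elasticsearch code: mimics a time setting with a
// 10s default, using a plain Map in place of Elasticsearch's Settings class.
final class HandshakeTimeoutSketch {
    // Setting name as it appears in the diff under review.
    static final String KEY = "xpack.security.transport.ssl.handshake_timeout";
    static final Duration DEFAULT = Duration.ofSeconds(10);

    // Parses values such as "500ms", "30s" or "2m". This is a simplified
    // stand-in for real time-value parsing and supports only these units.
    static Duration parse(String raw) {
        String v = raw.trim();
        long unitMillis;
        String digits;
        if (v.endsWith("ms")) {            // check "ms" before "s"
            unitMillis = 1L;
            digits = v.substring(0, v.length() - 2);
        } else if (v.endsWith("s")) {
            unitMillis = 1_000L;
            digits = v.substring(0, v.length() - 1);
        } else if (v.endsWith("m")) {
            unitMillis = 60_000L;
            digits = v.substring(0, v.length() - 1);
        } else {
            throw new IllegalArgumentException("missing time unit in [" + raw + "]");
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        long amount = Long.parseLong(digits.trim());
        return Duration.ofMillis(Math.multiplyExact(amount, unitMillis));
    }

    // Returns the configured timeout, or the 10s default if the key is unset.
    // Rejecting negative values is an assumption of this sketch.
    static Duration resolve(Map<String, String> settings) {
        String raw = settings.get(KEY);
        if (raw == null) {
            return DEFAULT;
        }
        Duration timeout = parse(raw);
        if (timeout.isNegative()) {
            throw new IllegalArgumentException(KEY + " must not be negative, got [" + raw + "]");
        }
        return timeout;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of()));           // falls back to the default
        System.out.println(resolve(Map.of(KEY, "30s"))); // operator-supplied override
    }
}
```

In this sketch an operator hitting handshake timeouts caused by event-loop latency would supply a larger value such as `30s` for the key, which is the workaround the PR description has in mind.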
IIUC, this will affect more than transport connections. It should at least also apply to the RCS 2.0 remote cluster client, and likely to security realms that initiate outbound TLS connections, e.g. the OIDC realm.
Most existing SSL settings are affix settings that apply to different contexts. The transport is one of the contexts. Defining these settings is a somewhat involved process via SSLConfigurationSettings to support contexts.
I think we should either:
- Support this new setting for different contexts as well.
- Drop the "transport" part from the setting name, i.e. xpack.security.ssl.handshake_timeout, and update the docs to indicate it applies more broadly.
What do you think?
I think it only affects transport connections, i.e. those which go via SecurityNetty4Transport. That does indeed include remote-cluster connections, but not other outbound TLS connections like the HTTPS ones involved in OIDC. I hadn't noticed that we count RCS2.0 transport connections as distinct from other transport connections in terms of this kind of configuration.
It's a bit tricky tho: I don't really want to have to add support for this setting to all the different contexts in which we do TLS handshakes. At least not today, progress over perfection and all that. If we called it xpack.security.ssl.handshake_timeout then that'd imply it worked everywhere. I'd rather keep it transport-specific, but I think I can see a way to add this to the RCS2.0 settings too.
Yeah, you are right that this does not apply to realms.