Skip to content

Fix bug where SslHandler may stall after TLSv1.3 handshake with delegate tasks#14411

Merged
chrisvest merged 1 commit intonetty:4.1from
chrisvest:4.1-ssl-handler-tls13-stall-fix
Oct 23, 2024
Merged

Fix bug where SslHandler may stall after TLSv1.3 handshake with delegate tasks#14411
chrisvest merged 1 commit intonetty:4.1from
chrisvest:4.1-ssl-handler-tls13-stall-fix

Conversation

@chrisvest
Copy link
Copy Markdown
Member

Motivation:
When doing a TLSv1.3 handshake with delegate task execution, clients may complete the handshake in a delegate task wrap that does not any data written during the handshake. A subsequent unwrap of server response data is then supposed to let the pending user data flow through to wrapping - the wrapLater flag is tracking this - however, this does not happen because the unwrap notices that the handshake promise has already been completed by the earlier wrap call. The unwrap then assumes, incorrectly, that not only has downstream been notified, but user data has also been wrapped.

Modification:
The unwrap call should always attempt to wrap later, if the handshake is finished or not handshaking, and we have pending user data to wrap.

Result:
No more data processing stalls on the client-side, after a TLSv1.3 handshake with task delegation. This also fixes the frequent test timeouts from SSLEngineTest.mustCallResumeTrustedOnSessionResumption, which was running into this exact scenario.

…ate tasks

Motivation:
When doing a TLSv1.3 handshake with delegate task execution, clients may complete the handshake in a delegate task wrap that does _not_ any data written during the handshake.
A subsequent unwrap of server response data is then supposed to let the pending user data flow through to wrapping - the `wrapLater` flag is tracking this - however, this does not happen because the unwrap notices that the handshake promise has already been completed by the earlier wrap call.
The unwrap then assumes, incorrectly, that not only has downstream been notified, but user data has also been wrapped.

Modification:
The unwrap call should always attempt to wrap later, if the handshake is finished or not handshaking, and we have pending user data to wrap.

Result:
No more data processing stalls on the client-side, after a TLSv1.3 handshake with task delegation.
This also fixes the frequent test timeouts from `SSLEngineTest.mustCallResumeTrustedOnSessionResumption`, which was running into this exact scenario.
@chrisvest chrisvest added this to the 4.1.115.Final milestone Oct 22, 2024
@chrisvest chrisvest merged commit 047bdfe into netty:4.1 Oct 23, 2024
@chrisvest chrisvest deleted the 4.1-ssl-handler-tls13-stall-fix branch October 23, 2024 15:08
chrisvest added a commit that referenced this pull request Oct 23, 2024
…ate tasks (#14411)

Motivation:
When doing a TLSv1.3 handshake with delegate task execution, clients may
complete the handshake in a delegate task wrap that does _not_ any data
written during the handshake. A subsequent unwrap of server response
data is then supposed to let the pending user data flow through to
wrapping - the `wrapLater` flag is tracking this - however, this does
not happen because the unwrap notices that the handshake promise has
already been completed by the earlier wrap call. The unwrap then
assumes, incorrectly, that not only has downstream been notified, but
user data has also been wrapped.

Modification:
The unwrap call should always attempt to wrap later, if the handshake is
finished or not handshaking, and we have pending user data to wrap.

Result:
No more data processing stalls on the client-side, after a TLSv1.3
handshake with task delegation. This also fixes the frequent test
timeouts from `SSLEngineTest.mustCallResumeTrustedOnSessionResumption`,
which was running into this exact scenario.
chrisvest added a commit to chrisvest/netty that referenced this pull request Oct 23, 2024
…ate tasks (netty#14411)

Motivation:
When doing a TLSv1.3 handshake with delegate task execution, clients may
complete the handshake in a delegate task wrap that does _not_ any data
written during the handshake. A subsequent unwrap of server response
data is then supposed to let the pending user data flow through to
wrapping - the `wrapLater` flag is tracking this - however, this does
not happen because the unwrap notices that the handshake promise has
already been completed by the earlier wrap call. The unwrap then
assumes, incorrectly, that not only has downstream been notified, but
user data has also been wrapped.

Modification:
The unwrap call should always attempt to wrap later, if the handshake is
finished or not handshaking, and we have pending user data to wrap.

Result:
No more data processing stalls on the client-side, after a TLSv1.3
handshake with task delegation. This also fixes the frequent test
timeouts from `SSLEngineTest.mustCallResumeTrustedOnSessionResumption`,
which was running into this exact scenario.
chrisvest added a commit that referenced this pull request Oct 23, 2024
Motivation:
Some trust manager implementations produce custom user principals and
store them in the SSLSession. Resumed TLS sessions, however, don't
always recover the stored contents. For instance, with TLSv1.3 stateless
session resumption, our TLS implementations only store the peer
certificate chain (or even just the leaf cert) in the session ticket. In
such a case, the trust manager would like to be notified of the
resumption, so that the peer principal can be recreated and stored in
the session once again.

Modification:
Add a ResumableX509ExtendedTrustManager interface, with callbacks for
resumed client and server sessions. Add infrastructure (the
ResumptionController) to detect session resumption in an SSLEngine
implementation agnostic way; if a TLS handshake completed without any
calls to the TrustManager, then we've made a "stateless resumption"
(hand-wave details) and should call the appropriate resume method. The
JDK APIs don't provide a method to reliably discern if a session has
been resumed, so we instead detect if the trust manager was called
during the handshake. We do this by wrapping user-supplied trust
managers that implement the ResumableX509ExtendedTrustManager interface.
The wrapper will monitor calls to the trust manager and register the
relevant SSLEngine in the controller — the engine is the object that the
trust manager and the SslHandler both have access to, which reliably
identifies the specific TLS session. Before finished the handshake, the
SslHandler queries the ResumptionController to see if the trust manager
has already been called, or if it needs to be notified of resumption.
Since it's possible that the peer certificates have expired since the
initial session, the trust manager gets another change to throw
CertificateException.

Result:
Handlers that expect to find user principals in the authenticated
SSLSessions no longer find empty sessions, provided they use a trust
manager that correctly implements ResumableX509ExtendedTrustManager.

This is a forward port of #14358,
#14404, and
#14411
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants