fix: refactor negotiate loop to fix issue with async callback by maddeleine · Pull Request #5641 · aws/s2n-tls

maddeleine · 2025-11-25T19:18:12Z

Goal

Fix an issue in our async callback code that breaks the handshake when multiple handshake messages are sent in the same record. The root cause is that we have never needed to exit the negotiate loop with handshake data still waiting to be read. Now that we have new async callbacks that trigger between reading handshake messages, we exit the loop between handshake messages and the unread handshake messages are wiped.

Why

Our async callback code that triggers on reading messages will cause the handshake to fail if multiple handshake messages are sent in the same record. There are three issues that you run into:

Our s2n_connection_apply_error_blinding is supposed to do nothing if we fail with an async blocked error. Instead it wipes the conn->in stuffer. This wipes the remaining TLS messages in the stuffer if there are more, leading to a handshake failure.
The handle_retry_state function also wipes the conn->in stuffer after the message handler succeeds. This means that again, the remaining TLS messages in the stuffer will be wiped, and the handshake will fail.
After fixing both of the above issues, you run into a third issue where the data in the conn->in stuffer is presumed to be application data, instead of handshake messages.
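
To make the first two failure modes concrete, here is a minimal toy sketch. It is not s2n-tls code: `toy_buffer`, `toy_wipe`, `toy_fill`, `toy_read_message`, and `toy_count_messages` are hypothetical names standing in for the conn->in stuffer and the code that reads from it. The only real-world detail assumed is the TLS handshake framing (1-byte type, 3-byte big-endian length, then the body). It shows that wiping the buffer after the first message silently drops any later message in the same record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a read buffer such as conn->in.
 * This is NOT the s2n-tls stuffer API. */
struct toy_buffer {
    uint8_t data[64];
    size_t read_pos;
    size_t write_pos;
};

static size_t toy_available(const struct toy_buffer *buf)
{
    return buf->write_pos - buf->read_pos;
}

/* Discard everything in the buffer, read or unread. */
static void toy_wipe(struct toy_buffer *buf)
{
    memset(buf, 0, sizeof(*buf));
}

/* Load one record's payload into the buffer. */
static void toy_fill(struct toy_buffer *buf, const uint8_t *record, size_t len)
{
    assert(len <= sizeof(buf->data));
    toy_wipe(buf);
    memcpy(buf->data, record, len);
    buf->write_pos = len;
}

/* Consume one handshake message: 1-byte type, 3-byte big-endian length, body.
 * Returns the message type, or -1 if no complete message is buffered. */
static int toy_read_message(struct toy_buffer *buf)
{
    if (toy_available(buf) < 4) {
        return -1;
    }
    const uint8_t *p = buf->data + buf->read_pos;
    size_t body_len = ((size_t) p[1] << 16) | ((size_t) p[2] << 8) | (size_t) p[3];
    if (toy_available(buf) < 4 + body_len) {
        return -1;
    }
    buf->read_pos += 4 + body_len;
    return p[0];
}

/* Process a record and return how many messages were seen.
 * If wipe_after_first is set, the buffer is wiped after the first message,
 * which models the buggy behavior described above. */
static int toy_count_messages(const uint8_t *record, size_t len, int wipe_after_first)
{
    struct toy_buffer buf;
    toy_fill(&buf, record, len);
    int count = 0;
    while (toy_read_message(&buf) >= 0) {
        count++;
        if (wipe_after_first && count == 1) {
            toy_wipe(&buf);
        }
    }
    return count;
}
```

Given a record holding two messages, calling `toy_count_messages` with `wipe_after_first` set to 0 sees both, while setting it to 1 sees only the first. The second message is gone before anything tries to read it.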

How

Now the handle_retry_state function will process the remaining messages in the conn->in stuffer before going back to the normal negotiate loop.
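
As a rough illustration of the idea (not the actual handle_retry_state implementation), the sketch below keeps unread handshake data in place when a handler blocks, then drains every remaining message in the buffer on retry. All names (`toy_conn`, `toy_status`, `toy_handle_message`, `toy_drain`) are hypothetical, and the rule that message type 0x0B blocks until `async_done` is set is an assumption made purely for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_status { TOY_OK = 0, TOY_BLOCKED = 1 };

/* Hypothetical connection state; `in` stands in for conn->in. */
struct toy_conn {
    const uint8_t *in;
    size_t in_len;
    size_t in_pos;
    int async_done; /* set by the application once its callback completes */
    int handled;    /* number of handshake messages fully handled */
};

/* Toy handler: message type 0x0B blocks until the async work is done. */
static enum toy_status toy_handle_message(struct toy_conn *conn, uint8_t type)
{
    if (type == 0x0B && !conn->async_done) {
        return TOY_BLOCKED;
    }
    conn->handled++;
    return TOY_OK;
}

/* Handle every complete handshake message left in the buffer.
 * On TOY_BLOCKED the read position is left untouched, so the blocked
 * message and everything after it are still there on the next call. */
static enum toy_status toy_drain(struct toy_conn *conn)
{
    assert(conn->in_pos <= conn->in_len);
    while (conn->in_len - conn->in_pos >= 4) {
        const uint8_t *p = conn->in + conn->in_pos;
        /* Handshake framing: 1-byte type, 3-byte big-endian length, body. */
        size_t body_len = ((size_t) p[1] << 16) | ((size_t) p[2] << 8) | (size_t) p[3];
        if (conn->in_len - conn->in_pos < 4 + body_len) {
            break; /* incomplete message: wait for more data */
        }
        if (toy_handle_message(conn, p[0]) == TOY_BLOCKED) {
            return TOY_BLOCKED;
        }
        conn->in_pos += 4 + body_len;
    }
    return TOY_OK;
}
```

In this model the caller invokes `toy_drain` once, gets `TOY_BLOCKED` at the second message, sets `async_done` when its callback finishes, and calls `toy_drain` again. The retry handles the blocked message and then the rest of the record, rather than discarding the buffer and returning to a loop that expects a fresh record.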

Callouts

I think it's probably not good that our code has duplicate handshake read/write logic for our async handling. Presumably there should be a way to get the async code to follow the normal negotiate read/write path. That would be a bigger refactor, and it's hard to justify that big of a code change to fix this bug. My fix does move us closer to more code sharing between these two paths though.

Testing

Includes a unit test showing that the previously failing case now succeeds. I removed the #[ignore] tag on the integ tests from #5638, which shows that my change fixes the issue.

                   if (s2n_connection_is_quic_enabled(conn)) {
                       record_type = TLS_HANDSHAKE;
+                      uint8_t message_type = 0;
                       POSIX_GUARD_RESULT(s2n_quic_read_handshake_message(conn, &message_type));

maddeleine merged 12 commits into aws:main from maddeleine:fix_multi_message


3 participants
