Skip to content

fix: refactor negotiate loop to fix issue with async callback#5641

Merged
maddeleine merged 12 commits intoaws:mainfrom
maddeleine:fix_multi_message
Dec 8, 2025
Merged

fix: refactor negotiate loop to fix issue with async callback#5641
maddeleine merged 12 commits intoaws:mainfrom
maddeleine:fix_multi_message

Conversation

@maddeleine
Copy link
Copy Markdown
Contributor

@maddeleine maddeleine commented Nov 25, 2025

Goal

Fix an issue in our async callback code that breaks when multiple handshake messages are sent in the same record. The root cause of this is essentially that we've never needed to exit the negotiate loop with handshake data still waiting to be read. Now that we have new async callbacks that trigger in between reading handshake messages, we are exiting the loop in between handshake messages and the unread handshake messages are being wiped.

Why

Our async callback code that triggers on reading messages will cause the handshake to fail if multiple handshake messages messages are sent in the same record. There are three issues that you run into:

  1. Our s2n_connection_apply_error_blinding is supposed to do nothing if we fail with an async blocked error. Instead it wipes the conn->in stuffer. This wipes the remaining TLS messages in the stuffer if there are more, leading to a handshake failure.
  2. The handle_retry_state function also wipes the conn->in stuffer after the message handler succeeds. This means that again, the remaining TLS messages in the stuffer will be wiped, and the handshake will fail.
  3. After fixing both of the above issues, you run into a third issue where the data in the conn->in stuffer is presumed to be application data, instead of handshake messages.

How

Now the handle_retry_state function will process the remaining messages in the conn->in stuffer before going back to the normal negotiate loop.

Callouts

I think its probably not good that our code has duplicate handshake read/write logic for our async handling. Presumably there should be a way to get the async code to follow the normal negotiate read/write path. That would be a bigger refactor, and it's hard to justify that big of a code change to fix this bug. My fix does move us closer to more code sharing between these two paths though.

Testing

Includes a unit test to show that the situation which failed now succeeds. I removed the #[ignore] tag on the integ tests from #5638 which proves that my change fixes the issue.

Related

release summary: The TLS handshake now succeeds when the async cert callback is configured and peers sent multiple TLS handshake messages per record.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

if (s2n_connection_is_quic_enabled(conn)) {
record_type = TLS_HANDSHAKE;
uint8_t message_type = 0;
POSIX_GUARD_RESULT(s2n_quic_read_handshake_message(conn, &message_type));
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, something I discovered while doing this refactor is that the message_type that gets parsed in this function is never used? It's essentially overwritten later in s2n_read_full_handshake_message. So that's weird...

@maddeleine maddeleine requested a review from jmayclin November 26, 2025 22:45
@maddeleine maddeleine added this pull request to the merge queue Dec 8, 2025
Merged via the queue into aws:main with commit 1c98447 Dec 8, 2025
54 of 55 checks passed
@maddeleine maddeleine deleted the fix_multi_message branch December 8, 2025 18:06
kaukabrizvi pushed a commit to kaukabrizvi/s2n-tls that referenced this pull request Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants